Tek-Tips is the largest IT community on the Internet today!

Help with parsing a text file into multiple text files

Status
Not open for further replies.

compchem

Hello,
I'm brand new to Perl and am learning to automate certain tasks with it.
I feel my question is simple, but I'm having a difficult time finding answers in books and on Google, so I've joined this forum.

I have a large text file that looks like this:

MOLECULE:name_of_molecule.ZMAT ................................
.....
.........
.............
.................
.....................


END


MOLECULE:name_of_molecule2.ZMAT
.
.
.
END
.
.
.


Basically, there are 148 separate files I'm trying to extract from this.
Each file needs to be named after its molecule: name_of_molecule.ZMAT.
Within each new file, I need everything from MOLECULE up to (but not including) END to be copied into that specific name_of_molecule.ZMAT.

All the name_of_molecule values are different, and all the /MOLECULE:(.*)END/ blocks are different.

Here is what I've done with this so far:
#!/usr/bin/perl -w

open (GEOS, "G2_ZMATS") or die "Can't open GEOS: $!\n";
while (<GEOS>) {
    chop;
    if(/MOLECULE:((.*)\.ZMAT)./){
        $name=$1;
        $outfile="$name\n";
        open (RESULTS, ">$outfile") or die "Can't open outfile: $!\n";
        if(/(MOLECULE:(.*)END)/s){
            $geo=$1;
            print RESULTS "$geo";
        }
    }
    close (RESULTS);
}
close (GEOS);

This code does make 148 files that are named perfectly, but the files are empty.

I appreciate any help on this matter, as I am a new chem grad student just learning how to eventually make life easier.

Thanks,
Tom
 
You were close. You just needed to keep reading lines from GEOS after the MOLECULE line matched:
Code:
#!/usr/bin/perl -w

my $srcfile = 'G2_ZMATS';

open(GEOS, $srcfile) or die "Can't open $srcfile: $!";

while (<GEOS>) {
	chomp;

	if (/MOLECULE:(.*\.ZMAT)/) {
		my $molecule = $1;

		open(RESULTS, ">$molecule") or die "Can't open $molecule: $!";

		while (<GEOS>) {
			last if /^END$/;
			print RESULTS $_;
		}

		close(RESULTS);
	}
}
close(GEOS);

- Miller
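
A variant of the same loop, as a minimal sketch rather than a definitive version: it uses lexical filehandles and three-argument open, and it also copies the MOLECULE: header line into each output file, since the question asks for everything from MOLECULE up to (but not including) END. The sub name split_molecules and its output-directory argument are hypothetical, introduced only for illustration, and END is assumed to sit alone on its line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: split $srcfile into one file per MOLECULE block,
# written under $outdir. Returns the list of molecule names found.
sub split_molecules {
    my ($srcfile, $outdir) = @_;
    my @names;
    my $out;    # filehandle of the block currently being written, if any

    open(my $in, '<', $srcfile) or die "Can't open $srcfile: $!";
    while (my $line = <$in>) {
        if ($line =~ /MOLECULE:(.*\.ZMAT)/) {
            # Start of a block: open a new file named after the molecule.
            my $name = $1;
            my $path = File::Spec->catfile($outdir, $name);
            close($out) if $out;
            open($out, '>', $path) or die "Can't open $path: $!";
            push @names, $name;
            print {$out} $line;    # keep the header line
        }
        elsif ($line =~ /^END\s*$/) {
            # End of a block: stop writing until the next MOLECULE line.
            close($out) if $out;
            undef $out;
        }
        elsif ($out) {
            print {$out} $line;    # body line inside a block
        }
    }
    close($out) if $out;
    close($in);
    return @names;
}

# Usage: perl split.pl G2_ZMATS [output_dir]
if (@ARGV) {
    my ($src, $dir) = @ARGV;
    my @names = split_molecules($src, defined $dir ? $dir : '.');
    print scalar(@names), " files written\n";
}
```

Lines that fall between an END and the next MOLECULE line are skipped, because no output file is open at that point.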
 
Your code is failing because it is structured like this (pseudocode):
Code:
For each line in GEOS
  If it is the line indicating the start of a molecule
    create the molecule file
    if the same line also contains the end of a molecule
      write the matched text to the molecule file   <-- never reached
    endif
  endif
  close molecule file
endfor
It will never reach the marked line unless a single line matches the patterns for BOTH the start and the end of a molecule. Even then, it would write only the text matched on that one line, not the multi-line block you wanted.
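
The /s idea in the original script can work if the whole file is first read into one string, so that a single match can span from MOLECULE: to END. A minimal sketch under that assumption follows. The sub name extract_blocks is hypothetical, the sample data is made up for illustration, molecule names are assumed to contain no whitespace, and END is assumed to appear on a line by itself.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash of molecule name => block text,
# where each block runs from its MOLECULE: line up to (not including) END.
sub extract_blocks {
    my ($text) = @_;
    my %blocks;
    # /m lets ^ and $ match at line boundaries, /s lets . cross newlines,
    # /g repeats the match, and .*? stops at the first END line.
    while ($text =~ /^(MOLECULE:(\S+\.ZMAT).*?)^END$/msg) {
        $blocks{$2} = $1;
    }
    return %blocks;
}

# Made-up sample in the layout described in the question.
my $sample = "MOLECULE:h2o.ZMAT ...\nO\nH 1 0.96\nEND\n\n"
           . "MOLECULE:ch4.ZMAT\nC\nH 1 1.09\nEND\n";

my %blocks = extract_blocks($sample);
for my $name (sort keys %blocks) {
    print "$name has ", length($blocks{$name}), " characters\n";
}
```

For a real file, the text could be read in one go with `my $text = do { local $/; <$fh> };` before calling the sub. This trades memory for simplicity, which should be acceptable unless the file is very large.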

Code:
open (GEOS, "G2_ZMATS") or die "Can't open GEOS: $!\n";
while (<GEOS>) {
	# $in is true from a MOLECULE line through the matching END line
	my $in = /MOLECULE:(.*\.ZMAT)/ .. /^END$/;
	if ( /MOLECULE:(.*\.ZMAT)/ ) {
		# start of a block: open a file named after the molecule
		open (RESULTS, ">$1") or die "Can't open outfile: $!\n";
	} elsif (/^END$/) {
		close RESULTS;
	} elsif ($in) {
		print RESULTS;	# prints $_ to RESULTS
	}
}
close GEOS;

This uses a generic END as the terminator, which is what you described first.

Change the references from ^END$ to MOLECULE:.*END if each molecule has its own particular end tag (which your later descriptions seem to support).
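
The `..` in the code above is Perl's range operator in scalar context, often called the flip-flop: it becomes true on the line matching the left pattern and stays true through the line matching the right pattern. A small sketch on made-up in-memory data shows the value it returns for each line. The sub name trace_flipflop is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return one "value:line" string per input line, where
# value is what the flip-flop returned for that line (empty when false).
sub trace_flipflop {
    my ($text) = @_;
    my @trace;
    open(my $fh, '<', \$text) or die "Can't read in-memory data: $!";
    while (my $line = <$fh>) {
        chomp $line;
        # True from a MOLECULE: line through the next END line, inclusive.
        my $in = ($line =~ /^MOLECULE:/) .. ($line =~ /^END$/);
        push @trace, join(':', $in, $line);
    }
    close($fh);
    return @trace;
}

print "$_\n" for trace_flipflop("junk\nMOLECULE:a.ZMAT\nx\ny\nEND\njunk\n");
```

The true values count up from 1 within each range, and the last one carries an E0 suffix (here 4E0 on the END line), so a test such as `$in =~ /E0$/` can detect the closing line without matching END a second time.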
 
Thank you so much guys, I see where I went wrong. Life is great again :)
 
If you are going to be working with DNA stuff, you may want to look into BioPerl.


------------------------------------------
- Kevin, perl coder unexceptional!
 