help with Parsing a text file into multiple text files 2

compchem · Aug 2, 2007

Hello,
I'm brand new to Perl and am learning to use it to automate certain tasks.
My question feels simple, but I'm having a difficult time finding answers in books and on Google, so I've joined this forum.

I have a large text file that looks like this:

MOLECULE:name_of_molecule.ZMAT ................................
.....
.........
.............
.................
.....................

END

MOLECULE:name_of_molecule2.ZMAT
.
.
.
END
.
.
.

Basically, there are 148 separate files I'm trying to extract from this.
Each file needs to be named name_of_molecule.ZMAT,
and each new file needs everything from MOLECULE up to (but not including) END copied into it.

All the name_of_molecule values are different, and so is the text matched by /MOLECULE:(.*)END/ in each block.

Here is what I've done so far:
#!/usr/bin/perl -w

open (GEOS, "G2_ZMATS") or die "Can't open GEOS: $!\n";
while (<GEOS>) {
    chop;
    if(/MOLECULE:((.*)\.ZMAT)./){
        $name=$1;
        $outfile="$name\n";
        open (RESULTS, ">$outfile") or die "Can't open outfile: $!\n";
        if(/(MOLECULE:(.*)END)/s){
            $geo=$1;
            print RESULTS "$geo";
        }
    }
    close (RESULTS);
}
close (GEOS);

This code does make 148 files that are named perfectly, but the files are empty.

I appreciate any help on this matter, as I am a new chem grad student just learning how to eventually make life easier.

Thanks,
Tom

MillerH · Aug 2, 2007

You were close. You just needed to keep reading data from GEOS once you found a MOLECULE line:

Code:

#!/usr/bin/perl -w
use strict;

my $srcfile = 'G2_ZMATS';

open(GEOS, $srcfile) or die "Can't open $srcfile: $!";

while (<GEOS>) {
	chomp;

	# Start of a molecule: the captured name becomes the output file name.
	if (/MOLECULE:(.*\.ZMAT)/){
		my $molecule = $1;

		open(RESULTS, ">$molecule") or die "Can't open $molecule: $!";

		# Keep reading from the same handle, copying lines until END.
		# These inner lines are not chomped, so their newlines are kept.
		while (<GEOS>) {
			last if /^END$/;
			print RESULTS $_;
		}

		close(RESULTS);
	}
}
close(GEOS);

- Miller
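The key point in the fix above is that the inner while (<GEOS>) reads from the same filehandle as the outer loop, so any lines it consumes are never seen by the outer loop, which resumes on the line after END. A minimal sketch of that behavior, assuming a small made-up input held in memory (Perl can open a filehandle on a scalar reference) instead of the real G2_ZMATS file; the sub name split_blocks is hypothetical, and it collects the blocks in a hash rather than writing files:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect each MOLECULE block from a filehandle
# using the same nested-loop idea as the code above.
# Returns a list of name => block-text pairs instead of writing files.
sub split_blocks {
    my ($fh) = @_;
    my %blocks;
    while (my $line = <$fh>) {
        next unless $line =~ /MOLECULE:(.*\.ZMAT)/;
        my $name = $1;
        my $body = '';
        # The inner loop reads from the SAME handle, so the lines it
        # consumes (up to and including END) are skipped by the outer loop.
        while (my $inner = <$fh>) {
            last if $inner =~ /^END$/;
            $body .= $inner;
        }
        $blocks{$name} = $body;
    }
    return %blocks;
}

# Made-up sample data in the same layout; the real file has 148 blocks.
my $sample = "MOLECULE:water.ZMAT\nO\nH 1 0.96\nEND\n.\nMOLECULE:ammonia.ZMAT\nN\nEND\n";
open(my $fh, '<', \$sample) or die "Can't open in-memory file: $!";
my %blocks = split_blocks($fh);
close($fh);

for my $name (sort keys %blocks) {
    print "$name:\n$blocks{$name}";
}
```

The stray "." line between END and the next MOLECULE line is read by the outer loop and skipped, because it does not match the MOLECULE pattern.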

brigmar · Aug 2, 2007

Your code fails because it is structured like this (pseudocode):

Code:

For each line in GEOS
  If it is the line indicating the start of a molecule
    create the molecule file
    if it is the line indicating the end of a molecule
      write the name of the molecule to the molecule file   <-- marked line
    endif
  endif
  close molecule file
endfor

The marked line is only reached if a single line matches the patterns for BOTH the start line and the end line of a molecule. Because while (<GEOS>) reads the file one line at a time, and MOLECULE and END sit on different lines, that never happens. Even if it did, it wouldn't write what you wanted, just that one line with the molecule name.
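To see this concretely: the /s modifier lets . match newlines, but that only helps if the string being matched actually contains more than one line. A small sketch, assuming made-up sample text held in a string, that tests the original pattern once per line (as while (<GEOS>) does) and once against the whole text:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample in the same layout as G2_ZMATS.
my $text = "MOLECULE:water.ZMAT\nO\nH 1 0.96\nEND\n";

# Line by line, as while (<GEOS>) does: no single line holds both words.
my $line_matches = 0;
for my $line (split /^/, $text) {    # split /^/ keeps each line's newline
    $line_matches++ if $line =~ /(MOLECULE:(.*)END)/s;
}

# Against the whole text at once, /s lets .* cross the newlines.
my $whole_match = $text =~ /(MOLECULE:(.*)END)/s ? 1 : 0;

print "line-by-line matches: $line_matches\n";   # 0
print "whole-text match:     $whole_match\n";    # 1
```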

Code:

open (GEOS, "G2_ZMATS") or die "Can't open GEOS: $!\n";
while (<GEOS>) {
	# True from a MOLECULE: line through the next END line.
	my $in = /MOLECULE:(.*\.ZMAT)/ .. /^END$/;
	if ( /MOLECULE:(.*\.ZMAT)/ ) {
	  # Header line: open a file named after the molecule.
	  open (RESULTS, ">$1") or die "Can't open outfile: $!\n";
	} elsif (/^END$/) {
		close RESULTS;
	} elsif ($in) {
		# Body line inside a block: copy it to the current file.
		print RESULTS;
	}
}
close GEOS;

This uses a generic END line as the terminator, as in your first description.
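The .. in my $in = /MOLECULE:(.*\.ZMAT)/ .. /^END$/; is Perl's range (flip-flop) operator in scalar context: it is false until its left operand is true, then stays true up to and including the line where its right operand is true, and then goes false again. A small sketch over made-up lines that records which lines fall inside the range:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up input lines in the G2_ZMATS layout.
my @lines = ("MOLECULE:water.ZMAT\n", "O\n", "END\n", ".\n",
             "MOLECULE:ammonia.ZMAT\n", "N\n", "END\n");

my @inside;
for (@lines) {
    # Scalar-context .. : true from a MOLECULE line through the next END line.
    if (/MOLECULE:/ .. /^END$/) {
        push @inside, $_;
    }
}

print "inside: $_" for @inside;
```

Only the "." line between the two blocks falls outside the range. The range also includes the MOLECULE and END lines themselves, which is why the code above tests those two patterns first and only falls through to print for the remaining lines.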

Change references from ^END$ to MOLECULE:.*END if each molecule has its own particular end tag (which your later descriptions seem to support).
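A different approach from both answers above is to read the whole file into one string and let a single regex pull out every block. A minimal sketch, assuming the file fits in memory, that each block ends with a line containing only END (the generic terminator), and that, like the answers above, the MOLECULE: header line itself is left out of each output file; the sub name write_blocks and its output-directory parameter are hypothetical:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split the text of a G2_ZMATS-style file into one
# file per molecule inside $outdir. Each output file gets the lines
# between the MOLECULE: header and the END line. Returns the names written.
sub write_blocks {
    my ($text, $outdir) = @_;
    my @written;
    # /m: ^ and $ match at line boundaries; /s: . also matches newlines.
    # .*? is non-greedy, so each match stops at the first END line.
    while ($text =~ /^MOLECULE:(\S+\.ZMAT)[^\n]*\n(.*?)^END$/msg) {
        my ($name, $body) = ($1, $2);
        my $path = "$outdir/$name";
        open(my $out, '>', $path) or die "Can't open $path: $!";
        print {$out} $body;
        close($out) or die "Can't close $path: $!";
        push @written, $name;
    }
    return @written;
}

my $srcfile = 'G2_ZMATS';
if (-e $srcfile) {
    open(my $in, '<', $srcfile) or die "Can't open $srcfile: $!";
    my $text = do { local $/; <$in> };   # undefined $/ reads the whole file
    close($in);
    my @names = write_blocks($text, '.');
    print scalar(@names), " files written\n";
} else {
    print "No $srcfile in the current directory; nothing to do.\n";
}
```

Slurping trades memory for simplicity: there is no per-line state to track, and the regex states the block format in one place.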

compchem · Aug 2, 2007

Thank you so much, guys. I see where I went wrong. Life is great again.

KevinADC · Aug 2, 2007

If you are going to be working with DNA stuff, you may want to look into BioPerl:

http://www.bioperl.org

------------------------------------------
- Kevin, perl coder unexceptional!

Tek-Tips is the largest IT community on the Internet today!

