
trouble with RegExp script

Status
Not open for further replies.

3inen

Technical User
May 26, 2005
51
US
Hi guys,
Could you help me straighten out this script?
Thanks in advance.

The script I have so far:
#!/usr/bin/perl -w

open(DATA,"<in.txt") || die("Missing File");

while ($inc = <DATA>) {
    @header = ($inc =~ /^#.*/g);
    ($val) = ($inc =~ /([ABC] GO:.*)/);
    foreach $io (@header) {
        print "$io $val\n";
    }
}


My sample input data:
#J1_67_L_TC_O3-Z05-SP6.faa 1 2005
A GO:539284565 ALLIANCE AXIS

B GO:74356886 principal and earned interest
B GO:74356886 principal and earned interest
C GO:89068867 INTRAMURAL INVESTIGATION

#C1_67_L_TC_P3-R01-SP6.fca 1 2001
A GO:80201220 principal and earned interest
B GO:45639012 cost of credit
B GO:45639012 cost of credit
B GO:45639012 cost of credit

#T1_67_L_TC_N3-H02-SP6.fas 1 2004
B GO:12324566 automated clearinghouse (ACH)
C GO:54923950 depositor's savings account


My desired output, as tab-delimited text:
#J1_67_L_TC_O3-Z05-SP6.faa 1 2005 A GO:539284565 ALLIANCE AXIS B GO:74356886 principal and earned interest C GO:89068867 INTRAMURAL INVESTIGATION

#C1_67_L_TC_P3-R01-SP6.fca 1 2001 A GO:80201220 principal and earned interest B GO:45639012 cost of credit

#T1_67_L_TC_N3-H02-SP6.fas 1 2004 B GO:12324566 automated clearinghouse (ACH) C GO:54923950 depositor's savings account


 
No regexps necessary:

Code:
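#!perl
use strict;
use warnings;
#use Data::Dump qw(dump);

# One hash per header block: { header => '#...', GO => { record => 1, ... } }
my @AofH = ();
my $i = -1;
while (<DATA>) {
   chomp;
   next if ($_ eq "");
   # A line starting with '#' begins a new block
   if (index($_, '#') == 0) {
      $i++;
      $AofH[$i]{header} = $_;
   }
   # Using the record itself as a hash key drops the duplicate lines
   elsif ($i >= 0 && index($_, 'GO:') > -1) {
      $AofH[$i]{GO}{$_} = 1;
   }
}

#print dump(@AofH);

my $yourfile = 'path/to/file.txt';
open(my $fh, '>', $yourfile) or die "Cannot open $yourfile: $!";

foreach my $i (0 .. $#AofH) {
   print {$fh} $AofH[$i]{header};
   my $string = "";
   # sort puts the A, B, C records back in order
   foreach my $key (sort keys %{$AofH[$i]{GO}}) {
      $string .= "\t$key";
   }
   print {$fh} "$string\n";
}
close($fh);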

__DATA__
#J1_67_L_TC_O3-Z05-SP6.faa  1 2005
A GO:539284565 ALLIANCE AXIS

B GO:74356886 principal and earned interest
B GO:74356886 principal and earned interest
C GO:89068867 INTRAMURAL INVESTIGATION

#C1_67_L_TC_P3-R01-SP6.fca  1 2001
A GO:80201220 principal and earned interest
B GO:45639012 cost of credit
B GO:45639012 cost of credit
B GO:45639012 cost of credit

#T1_67_L_TC_N3-H02-SP6.fas  1 2004
B GO:12324566 automated clearinghouse (ACH)
C GO:54923950 depositor's savings account

Of course, there are a variety of ways to do this, so wait and see what others suggest.
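One such variation, offered as a hedged sketch rather than a replacement: instead of `sort keys`, a `%seen` hash can drop duplicate records while keeping them in the order they first appear under each header. This only matters if the records are not already in A/B/C order. The sub name `group_records` and the sample lines are illustrative assumptions, and the sub works on a plain array of lines instead of `__DATA__`.

```perl
use strict;
use warnings;

# Hypothetical helper: group GO: records under their '#' header, dropping
# repeated records but keeping first-seen order. Returns one tab-delimited
# string per header.
sub group_records {
    my @lines = @_;
    my @out;
    my %seen;
    for my $line (@lines) {
        next if $line =~ /^\s*$/;                  # skip blank lines
        if (index($line, '#') == 0) {              # a header starts a new block
            push @out, [$line];
            %seen = ();                            # duplicates are checked per block
        }
        elsif (@out && index($line, 'GO:') > -1 && !$seen{$line}++) {
            push @{ $out[-1] }, $line;             # first time this record appears
        }
    }
    return map { join("\t", @$_) } @out;
}

my @sample = (
    '#C1_67_L_TC_P3-R01-SP6.fca 1 2001',
    'B GO:45639012 cost of credit',
    'A GO:80201220 principal and earned interest',
    'B GO:45639012 cost of credit',
);
print "$_\n" for group_records(@sample);
```

Here the `B` record stays first because it appeared first; swap in `sort` over the collected records if alphabetical order is what you want.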
 
When you only need to check for the presence of a substring (such as "#" or "GO:") within a string, index() is the better fit: it does a plain substring search with no pattern to compile, so it is simpler and typically faster. Regexps tend to get overused for simple tasks like this.
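To make the comparison concrete, here is a minimal sketch of the two line tests used above, written once with index() and once with the equivalent regexps. The sub names `is_header`, `has_go` and their `_re` counterparts are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helpers. index() returns the 0-based position of the
# substring, or -1 if it is absent.
sub is_header { return index($_[0], '#') == 0 }      # '#' is the first character
sub has_go    { return index($_[0], 'GO:') > -1 }    # 'GO:' appears anywhere

# Regexp equivalents of the same two tests
sub is_header_re { return $_[0] =~ /^#/ }
sub has_go_re    { return $_[0] =~ /GO:/ }

for my $line ('#J1_67_L_TC_O3-Z05-SP6.faa 1 2005', 'A GO:539284565 ALLIANCE AXIS') {
    printf "%-35s header=%d go=%d\n", $line,
        is_header($line) ? 1 : 0, has_go($line) ? 1 : 0;
}
```

Note that `index($_, '#') == 0` is the counterpart of `/^#/`, while `> -1` corresponds to an unanchored match.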
 
3inen, assuming that the 'records' for each header/label, for example:

A GO:539284565 ALLIANCE AXIS
B GO:74356886 principal and earned interest

are grouped and sorted the way your sample data shows, you don't really need a data structure at all. Something like this could work:

Code:
my $last;
while (<DATA>) {
    next if (/^\s*$/); # Skip blanks
    chomp;
    if (/\s*#/) {
        print ($last ? "\n\n$_" : $_);
        $last = "";
    } else {
        if ($_ ne $last) {
            print "\t$_";
            $last = $_;
        }
    }
}

Also, you want the output tab-delimited, but I don't see where your tabs are in the sample. I assumed they go between the header/label and each record.
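The "grouped/sorted" assumption matters because the loop above only compares each record with the one printed just before it, so it removes consecutive duplicates only. A minimal sketch of that same check as a sub (the name `dedup_adjacent` and the short sample records are hypothetical) shows the difference: a repeated record separated by another record survives.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the $last check: keep a record unless it
# equals the record immediately before it.
sub dedup_adjacent {
    my @kept;
    my $last;
    for my $rec (@_) {
        next if defined $last && $rec eq $last;   # same as the previous line: skip
        push @kept, $rec;
        $last = $rec;
    }
    return @kept;
}

my @grouped   = ('B GO:1 x', 'B GO:1 x', 'C GO:2 y');
my @scattered = ('B GO:1 x', 'C GO:2 y', 'B GO:1 x');

# Tab-delimited, as in the desired output
print join("\t", dedup_adjacent(@grouped)), "\n";    # duplicate removed
print join("\t", dedup_adjacent(@scattered)), "\n";  # duplicate kept: not adjacent
```

If the real file can contain scattered duplicates, a per-header `%seen` hash is the safer choice.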
 
rharsh and KevinADC
Thank you both for the help and the suggestions.

rharsh, you are right about the tab position. The script runs as expected. Sorry, I forgot to mention that in my post.
 
