Foreach Reading within file

sdslrn123 · May 30, 2006

Okay, this is complex, but I would be grateful if you had the patience to nudge me in the right direction. I know I am close...

I have a file. Format of file is:

Line starts with ST: SIGNAL_NAME1, SIGNAL_NAME2
.
.
.
Line starts with YZ: WO RD 1, WO RD 2, WO RD 3, WO RD 4,
Line starts with YZ: WO RD 5, WO RD 6, WO RD 7, WO RD 8.
.
.
.
[[although lines below are beginning of new data of same experiment they are joined with the above one]]
Line starts with ST: SIGNAL_NAME3, SIGNAL_NAME4
.
.
.
Line starts with YZ:new WORD 9,new WORD 10,new WORD 11,new WORD 12,
Line starts with YZ:new WORD 13,new WORD 14,new WORD 15,new WORD 16.
.
.
.
and so on...

I need to change this file to the following format:
SIGNAL_NAME 1 WORD 1
SIGNAL_NAME 1 WORD 2
SIGNAL_NAME 1 WORD 3
SIGNAL_NAME 1 WORD 4
SIGNAL_NAME 1 WORD 5
SIGNAL_NAME 1 WORD 6
SIGNAL_NAME 1 WORD 7
SIGNAL_NAME 1 WORD 8
SIGNAL_NAME 2 WORD 1
SIGNAL_NAME 2 WORD 2
SIGNAL_NAME 2 WORD 3
SIGNAL_NAME 2 WORD 4
SIGNAL_NAME 2 WORD 5
SIGNAL_NAME 2 WORD 6
SIGNAL_NAME 2 WORD 7
SIGNAL_NAME 2 WORD 8
SIGNAL_NAME 3 WORD 9
SIGNAL_NAME 3 WORD 10
SIGNAL_NAME 3 WORD 11
SIGNAL_NAME 3 WORD 12
SIGNAL_NAME 3 WORD 13
SIGNAL_NAME 3 WORD 14
SIGNAL_NAME 3 WORD 15
SIGNAL_NAME 3 WORD 16
SIGNAL_NAME 4 WORD 9
SIGNAL_NAME 4 WORD 10
SIGNAL_NAME 4 WORD 11
SIGNAL_NAME 4 WORD 12
SIGNAL_NAME 4 WORD 13
SIGNAL_NAME 4 WORD 14
SIGNAL_NAME 4 WORD 15
SIGNAL_NAME 4 WORD 16

But, I am getting...

SIGNAL_NAME 1 WORD 1
SIGNAL_NAME 1 WORD 2
SIGNAL_NAME 1 WORD 3
SIGNAL_NAME 1 WORD 4
SIGNAL_NAME 2 WORD 1
SIGNAL_NAME 2 WORD 2
SIGNAL_NAME 2 WORD 3
SIGNAL_NAME 2 WORD 4
SIGNAL_NAME 3 WORD 9
SIGNAL_NAME 3 WORD 10
SIGNAL_NAME 3 WORD 11
SIGNAL_NAME 3 WORD 12
SIGNAL_NAME 4 WORD 9
SIGNAL_NAME 4 WORD 10
SIGNAL_NAME 4 WORD 11
SIGNAL_NAME 4 WORD 12
SIGNAL_NAME 1 WORD 5
SIGNAL_NAME 1 WORD 6
SIGNAL_NAME 1 WORD 7
SIGNAL_NAME 1 WORD 8
SIGNAL_NAME 2 WORD 5
SIGNAL_NAME 2 WORD 6
SIGNAL_NAME 2 WORD 7
SIGNAL_NAME 2 WORD 8
SIGNAL_NAME 3 WORD 13
SIGNAL_NAME 3 WORD 14
SIGNAL_NAME 3 WORD 15
SIGNAL_NAME 3 WORD 16
SIGNAL_NAME 4 WORD 13
SIGNAL_NAME 4 WORD 14
SIGNAL_NAME 4 WORD 15
SIGNAL_NAME 4 WORD 16

The separation seems to come from my handling each line on its own as I read it. Is there any way I can first group together all the lines from one file (from ST to the end) and then split them?

My code is as follows.

Code:

foreach $sign (@data1) {                 #data1 is array containing data
    if ($sign =~ /^ST/) {                #ST at beginning signifies a new file
        $signal = $sign;                 #this means the line is a SIGNAL line
        $signal =~ s/ST//g;              #remove ST abbreviation
        $signal =~ s/^\s+//;             #remove leading whitespace
        $signal =~ s/\s+$//;             #remove trailing whitespace
        $signal =~ s/\s//g;
        @light = split(';', $signal);
    }

    foreach $ray (@light) {
        if ($sign =~ /^YZ/) {
            $krypton = $sign;
            $krypton =~ s/YZ//g;
            $krypton =~ s/^\s+//;
            $krypton =~ s/\s+$//;
            $krypton =~ s/\.//g;
            @meteor = split(';', $krypton);

            foreach (@meteor) {
                $meteor = "$_";
                $meteor =~ s/^\s+//;
                $news = "$ray $meteor\n";
            }
        }
    }
}

KevinADC · May 30, 2006

Post some real lines from the data file. About 30 or 40 of them should do, but make sure it's a good sampling of the various lines in the file.

Off the top of my head I would say you want to use a hash of arrays to get the data ordered the way you want it, but I'd rather see real data before committing to anything.
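To illustrate the hash-of-arrays idea, here is a minimal sketch rather than a tested solution. It assumes semicolon-separated fields and uses made-up sample lines; the names @lines, %words_for, @order and @current are hypothetical. Words from every YZ line are pushed onto the arrays of the signals named by the most recent ST line, so nothing is output until all lines have been read.

```perl
use strict;
use warnings;

# Hypothetical sample lines (assumption: fields are separated by ';').
my @lines = (
    'ST SIG1; SIG2;',
    'YZ WORD 1; WORD 2;',
    'YZ WORD 3; WORD 4.',
    'ST SIG3;',
    'YZ WORD 9; WORD 10.',
);

my %words_for;   # signal name => reference to an array of words
my @order;       # signal names in the order they were first seen
my @current;     # signals named by the most recent ST line

for my $line (@lines) {
    if ($line =~ /^ST\s+(.*)/) {
        @current = grep { length } split /\s*;\s*/, $1;
        for my $sig (@current) {
            push @order, $sig unless exists $words_for{$sig};
            $words_for{$sig} ||= [];
        }
    }
    elsif ($line =~ /^YZ\s+(.*)/) {
        (my $body = $1) =~ s/[.;\s]+$//;   # drop trailing '.', ';' and spaces
        my @words = grep { length } split /\s*;\s*/, $body;
        push @{ $words_for{$_} }, @words for @current;
    }
}

# Build the output only after everything has been collected.
my @output;
for my $sig (@order) {
    push @output, "$sig $_" for @{ $words_for{$sig} };
}
print "$_\n" for @output;
```

Because the words are collected per signal before anything is printed, all the words of SIG1 come out together, then all the words of SIG2, and so on.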

sdslrn123 · May 31, 2006

AB DHGL_DROPS STANDARD; PRT; 625 AA.
ST PROT1ACONE; PROT1ACTWO; PROT1ACTHREE; PROT1ACFOUR;
DT 01-NOV-1990, integrated into UniProtKB/Swiss-Prot.
DT 16-MAY-2006, sequence version 3.
DT 30-MAY-2006, entry version 54.
YZ 1WORD1LINE1; 1WORD2LINE1; 1WORD3LINE1; 1WORD4LINE1; 1WORD5LINE1; 1WORD6LINE1
YZ 1WORD7LINE2; 1WORD8LINE2; 1WORD9LINE2; 1WORD10LINE2; 1WORD11LINE2; 1WORD12LINE2.
BL SEQUENCE 625 AA; 68552 MW; 8243BDB41317F522 CRC64;
MATSPSSCDC LVGVPTGPTL ASTCGGSAFM LFMGLLEVFI RSQCDLEDPC GRASTRFRSE
//

AB DHGL_DROPS STANDARD; PRT; 625 AA.
ST PROT2ACONE; PROT2ACTWO; PROT2ACTHREE; PROT2ACFOUR;
DT 16-MAY-2006, sequence version 3.
DT 30-MAY-2006, entry version 54.
DR PROSITE; PS00623; GMC_OXRED_1; 1.
DR PROSITE; PS00624; GMC_OXRED_2; 1.
YZ 2WORD1 LINE1; 2WORD2 LINE1; 2WORD3 LINE1; 2WORD4 LINE1; 2WORD5 LINE1; 2WORD6LINE1
YZ 2WORD7 LINE2; 2WORD8 LINE2; 2WORD9LINE2; 2WORD10 LINE2; 2WORD11 LINE2; 2WORD12 LINE2.
BL 8243BDB41317F522 CRC64;
MATSPSSCDC LVGVPTGPTL ASTCGGSAFM LFMGLLEVFI RSQCDLEDPC GRASTRFRSE
//

Please note that the words can be phrases, but they are separated from one another by a semicolon.
Thanks.
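Since each entry in the sample ends with a line containing only //, one possible way to read a whole entry at a time is to change Perl's input record separator $/. The following is only a sketch under that assumption: the sample text is shortened and made up, it is read through an in-memory filehandle instead of a real file, and the names $text and @entries are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample in the same layout as the lines above.
my $text = <<'END';
AB DHGL_DROPS STANDARD; PRT; 625 AA.
ST PROT1ACONE; PROT1ACTWO;
YZ 1WORD1LINE1; 1WORD2LINE1
YZ 1WORD3LINE2; 1WORD4LINE2.
//
AB DHGL_DROPS STANDARD; PRT; 625 AA.
ST PROT2ACONE;
YZ 2WORD1 LINE1; 2WORD2 LINE1.
//
END

# Read from the string as if it were a file (assumption: a real
# script would open the data file here instead).
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

my @entries;
{
    local $/ = "//\n";                       # one whole entry per read
    while (my $entry = <$fh>) {
        my @st = $entry =~ /^ST\s+(.*)$/mg;  # text of every ST line
        my @yz = $entry =~ /^YZ\s+(.*)$/mg;  # text of every YZ line
        push @entries, { st => \@st, yz => \@yz };
    }
}
close $fh;

printf "%d entries read\n", scalar @entries;
```

Each element of @entries then holds the ST and YZ text of one entry, so the two YZ lines stay together with the ST line they belong to.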

sdslrn123 · May 31, 2006

I have come up with a way (using grep) of removing all the other lines, leaving just an array whose elements are these lines:
ST...
YZ...
YZ...
ST...
YZ...
YZ...

Is there a way to concatenate each element beginning with ST with the YZ elements that follow it? I.e., can one join elements in an array?
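One way this kind of grouping might be done is to walk the filtered array and append every non-ST line to the most recently started record. This is a sketch only; @kept stands in for the grep result and its contents are made up, and @records is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical result of the grep step: only ST and YZ lines remain.
my @kept = (
    'ST PROT1ACONE; PROT1ACTWO;',
    'YZ 1WORD1LINE1; 1WORD2LINE1',
    'YZ 1WORD3LINE2; 1WORD4LINE2.',
    'ST PROT2ACONE;',
    'YZ 2WORD1 LINE1; 2WORD2 LINE1.',
);

my @records;   # one string per ST block
for my $line (@kept) {
    if ($line =~ /^ST\b/) {
        push @records, $line;          # an ST line starts a new record
    }
    elsif (@records) {
        $records[-1] .= "\n$line";     # anything else joins the current record
    }
}

print scalar(@records), " records\n";
```

The lines are joined with a newline here so that each record can still be split back into its ST and YZ lines later.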

sdslrn123 · May 31, 2006

I am such a Stoopid newbie. I have now used the join function to connect the data into one string and then split it at the full stop after the last word, to give:
ST1 YZFROMLINE1 YZFROMLINE1
ST2 YZFROMLINE1 YZFROMLINE2

Getting there...Hehe.
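For reference, the join-then-split idea could look roughly like the sketch below. It rests on several assumptions that may not hold for the real file: only the last YZ line of a record ends in a full stop, no phrase contains YZ as a separate word, and the fields are separated by semicolons. The sample lines and the names @kept, @records and @output are made up.

```perl
use strict;
use warnings;

# Hypothetical ST/YZ lines left over after filtering.
my @kept = (
    'ST PROT1ACONE; PROT1ACTWO;',
    'YZ 1WORD1LINE1; 1WORD2LINE1',
    'YZ 1WORD3LINE2; 1WORD4LINE2.',
    'ST PROT2ACONE;',
    'YZ 2WORD1 LINE1; 2WORD2 LINE1.',
);

# Join everything into one string, then cut it at each full stop
# that is followed by the next ST tag or by the end of the string.
my $joined  = join ' ', @kept;
my @records = split /\.\s*(?=ST\s|\z)/, $joined;

my @output;
for my $record (@records) {
    # The first piece is the ST part, the rest are the YZ parts.
    my ($st, @yz) = split /\s*\bYZ\s+/, $record;
    $st =~ s/^ST\s+//;

    my @signals = grep { length } split /\s*;\s*/, $st;
    my @words   = grep { length } map { split /\s*;\s*/, $_ } @yz;

    # Pair every signal with every word of the same record.
    for my $sig (@signals) {
        push @output, "$sig $_" for @words;
    }
}
print "$_\n" for @output;
```

With these sample lines, every word of a record is listed under its first signal before the second signal starts, which is the ordering asked for at the top of the thread.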

KevinADC · May 31, 2006

The real data is very different from the dummy data you posted. If you want to explain the criteria for converting the real data into the results you want, I will try to help, but at this point I don't think I understand well enough how you are getting from point A (the data file) to point B (the results you want to get).
