Tek-Tips is the largest IT community on the Internet today!


Foreach Reading within file

Status
Not open for further replies.

sdslrn123 (Technical User, GB)
Jan 17, 2006
Okay, this is complex, but if you have the patience, please just nudge me in the right direction. I know I am close...

I have a file in the following format:

Line starts with ST: SIGNAL_NAME1, SIGNAL_NAME2
.
.
.
Line starts with YZ: WO RD 1, WO RD 2, WO RD 3, WO RD 4,
Line starts with YZ: WO RD 5, WO RD 6, WO RD 7, WO RD 8.
.
.
.
[[Although the lines below are the beginning of new data from the same experiment, they are joined to the data above.]]
Line starts with ST: SIGNAL_NAME3, SIGNAL_NAME4
.
.
.
Line starts with YZ:new WORD 9,new WORD 10,new WORD 11,new WORD 12,
Line starts with YZ:new WORD 13,new WORD 14,new WORD 15,new WORD 16.
.
.
.
and so on...

I need to change this file to the following format:
SIGNAL_NAME 1 WORD 1
SIGNAL_NAME 1 WORD 2
SIGNAL_NAME 1 WORD 3
SIGNAL_NAME 1 WORD 4
SIGNAL_NAME 1 WORD 5
SIGNAL_NAME 1 WORD 6
SIGNAL_NAME 1 WORD 7
SIGNAL_NAME 1 WORD 8
SIGNAL_NAME 2 WORD 1
SIGNAL_NAME 2 WORD 2
SIGNAL_NAME 2 WORD 3
SIGNAL_NAME 2 WORD 4
SIGNAL_NAME 2 WORD 5
SIGNAL_NAME 2 WORD 6
SIGNAL_NAME 2 WORD 7
SIGNAL_NAME 2 WORD 8
SIGNAL_NAME 3 WORD 9
SIGNAL_NAME 3 WORD 10
SIGNAL_NAME 3 WORD 11
SIGNAL_NAME 3 WORD 12
SIGNAL_NAME 3 WORD 13
SIGNAL_NAME 3 WORD 14
SIGNAL_NAME 3 WORD 15
SIGNAL_NAME 3 WORD 16
SIGNAL_NAME 4 WORD 9
SIGNAL_NAME 4 WORD 10
SIGNAL_NAME 4 WORD 11
SIGNAL_NAME 4 WORD 12
SIGNAL_NAME 4 WORD 13
SIGNAL_NAME 4 WORD 14
SIGNAL_NAME 4 WORD 15
SIGNAL_NAME 4 WORD 16


But I am getting this instead:

SIGNAL_NAME 1 WORD 1
SIGNAL_NAME 1 WORD 2
SIGNAL_NAME 1 WORD 3
SIGNAL_NAME 1 WORD 4
SIGNAL_NAME 2 WORD 1
SIGNAL_NAME 2 WORD 2
SIGNAL_NAME 2 WORD 3
SIGNAL_NAME 2 WORD 4
SIGNAL_NAME 3 WORD 9
SIGNAL_NAME 3 WORD 10
SIGNAL_NAME 3 WORD 11
SIGNAL_NAME 3 WORD 12
SIGNAL_NAME 4 WORD 9
SIGNAL_NAME 4 WORD 10
SIGNAL_NAME 4 WORD 11
SIGNAL_NAME 4 WORD 12
SIGNAL_NAME 1 WORD 5
SIGNAL_NAME 1 WORD 6
SIGNAL_NAME 1 WORD 7
SIGNAL_NAME 1 WORD 8
SIGNAL_NAME 2 WORD 5
SIGNAL_NAME 2 WORD 6
SIGNAL_NAME 2 WORD 7
SIGNAL_NAME 2 WORD 8
SIGNAL_NAME 3 WORD 13
SIGNAL_NAME 3 WORD 14
SIGNAL_NAME 3 WORD 15
SIGNAL_NAME 3 WORD 16
SIGNAL_NAME 4 WORD 13
SIGNAL_NAME 4 WORD 14
SIGNAL_NAME 4 WORD 15
SIGNAL_NAME 4 WORD 16


The split seems to come from my processing each line separately. Is there any way I can group together all the lines of one record (from ST to the end) and then split them?

My code is as follows:

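Code:
use strict;
use warnings;

my @data1 = <>;                      # lines of the data file (assumed: file named on the command line)
my @light;                           # signal names from the most recent ST line

foreach my $sign (@data1) {
    if ($sign =~ /^ST/) {            # ST at the start of a line begins a new record
        my $signal = $sign;          # this line is a SIGNAL line
        $signal =~ s/^ST//;          # remove the leading ST tag only (s/ST//g would also eat "ST" inside names)
        $signal =~ s/\s//g;          # remove all whitespace, including the newline
        @light = split /;/, $signal; # one element per signal name
    }

    foreach my $ray (@light) {
        if ($sign =~ /^YZ/) {
            my $krypton = $sign;
            $krypton =~ s/^YZ//;     # remove the leading YZ tag only
            $krypton =~ s/^\s+//;    # trim leading whitespace
            $krypton =~ s/\s+$//;    # trim trailing whitespace and the newline
            $krypton =~ s/\.$//;     # drop only the full stop that ends the last YZ line
            my @meteor = split /;/, $krypton;

            # Each YZ line is paired with the signals as soon as it is read,
            # so words from different YZ lines of one record land in separate blocks.
            foreach my $meteor (@meteor) {
                $meteor =~ s/^\s+//;
                print "$ray $meteor\n";
            }
        }
    }
}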
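The output breaks into blocks because each YZ line is paired with the signals the moment it is read. A minimal sketch of the grouping asked about above: collect the signal names and all the words of one record first, and only emit the pairs when the next ST line (or the end of the data) is reached. It assumes, as the code above does, that entries are separated by semicolons and that the last YZ line of a record ends with a full stop. `convert_records` and `split_fields` are hypothetical helper names, and the `/r` substitution modifier needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: split a semicolon-separated list, trim each field,
# and drop empty fields (e.g. the one after a trailing semicolon).
sub split_fields {
    my ($text) = @_;
    return grep { length } map { s/^\s+|\s+$//gr } split /;/, $text;
}

# Hypothetical helper: turn ST/YZ lines into "SIGNAL WORD" lines, collecting
# every YZ line of a record before pairing the words with the signals.
sub convert_records {
    my @lines = @_;
    my (@out, @signals, @words);

    # Emit all signal/word pairs of the record collected so far, then reset.
    my $flush = sub {
        for my $signal (@signals) {
            push @out, "$signal $_" for @words;
        }
        @signals = ();
        @words   = ();
    };

    for my $line (@lines) {
        if ($line =~ /^ST\s*(.*)/) {
            my $rest = $1;
            $flush->();                      # a new ST line closes the previous record
            @signals = split_fields($rest);
        }
        elsif ($line =~ /^YZ\s*(.*)/) {
            (my $rest = $1) =~ s/\.\s*$//;   # drop the full stop ending the last YZ line
            push @words, split_fields($rest);
        }
    }
    $flush->();                              # emit the final record
    return @out;
}

my @data = (
    "ST SIG1; SIG2;",
    "YZ W1; W2;",
    "YZ W3; W4.",
    "ST SIG3;",
    "YZ W5; W6.",
);
print "$_\n" for convert_records(@data);
```

Flushing only when the next ST line arrives is what keeps all the words of a record together under each signal.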
 
Post some real lines from the data file. About 30 or 40 of them should do, but make sure they are a good sampling of the various kinds of lines in the file.

Off the top of my head, I would say you want to use a hash of arrays to get the data ordered the way you want, but I'd rather see real data before committing to anything.
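A minimal sketch of the hash-of-arrays idea, assuming the (signal, word) pairs have already been extracted: each signal name is a key whose value is a reference to an array of words, and a separate array remembers the order in which signals were first seen, because Perl hashes do not keep insertion order. `add_word` is a hypothetical helper and the pairs are illustrative, not real data.

```perl
use strict;
use warnings;

my %words_for;      # hash of arrays: signal name => [ words ]
my @signal_order;   # order in which signal names were first seen

# Hypothetical helper: record one word for one signal.
sub add_word {
    my ($signal, $word) = @_;
    push @signal_order, $signal unless exists $words_for{$signal};
    push @{ $words_for{$signal} }, $word;    # autovivifies the array ref
}

# Pairs may arrive interleaved, e.g. one YZ line at a time...
add_word($_->[0], $_->[1]) for (
    [SIG1 => 'W1'], [SIG2 => 'W1'],
    [SIG1 => 'W2'], [SIG2 => 'W2'],
);

# ...but printing key by key regroups them signal by signal.
for my $signal (@signal_order) {
    print "$signal $_\n" for @{ $words_for{$signal} };
}
```

Note that if the same signal name appears in two records, this sketch merges its words under one key.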
 
AB DHGL_DROPS STANDARD; PRT; 625 AA.
ST PROT1ACONE; PROT1ACTWO; PROT1ACTHREE; PROT1ACFOUR;
DT 01-NOV-1990, integrated into UniProtKB/Swiss-Prot.
DT 16-MAY-2006, sequence version 3.
DT 30-MAY-2006, entry version 54.
YZ 1WORD1LINE1; 1WORD2LINE1; 1WORD3LINE1; 1WORD4LINE1; 1WORD5LINE1; 1WORD6LINE1
YZ 1WORD7LINE2; 1WORD8LINE2; 1WORD9LINE2; 1WORD10LINE2; 1WORD11LINE2; 1WORD12LINE2.
BL SEQUENCE 625 AA; 68552 MW; 8243BDB41317F522 CRC64;
MATSPSSCDC LVGVPTGPTL ASTCGGSAFM LFMGLLEVFI RSQCDLEDPC GRASTRFRSE
//

AB DHGL_DROPS STANDARD; PRT; 625 AA.
ST PROT2ACONE; PROT2ACTWO; PROT2ACTHREE; PROT2ACFOUR;
DT 16-MAY-2006, sequence version 3.
DT 30-MAY-2006, entry version 54.
DR PROSITE; PS00623; GMC_OXRED_1; 1.
DR PROSITE; PS00624; GMC_OXRED_2; 1.
YZ 2WORD1 LINE1; 2WORD2 LINE1; 2WORD3 LINE1; 2WORD4 LINE1; 2WORD5 LINE1; 2WORD6LINE1
YZ 2WORD7 LINE2; 2WORD8 LINE2; 2WORD9LINE2; 2WORD10 LINE2; 2WORD11 LINE2; 2WORD12 LINE2.
BL 8243BDB41317F522 CRC64;
MATSPSSCDC LVGVPTGPTL ASTCGGSAFM LFMGLLEVFI RSQCDLEDPC GRASTRFRSE
//

Please note that the words can be phrases, but they are separated from one another by semicolons.
Thanks.
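The real data ends each entry with a `//` line, so Perl's input record separator `$/` can hand over one whole entry per read, which is one way to group the lines "from ST to the end" asked about earlier. A minimal sketch, assuming every entry ends with `//` on its own line and that a phrase never wraps from one YZ line onto the next; it reads from an in-memory filehandle (`open` on a scalar reference) only so that it runs without a file. `split_fields` is a hypothetical helper, and the `/r` modifier needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: split on semicolons, trim fields, drop empty ones.
sub split_fields {
    my ($text) = @_;
    return grep { length } map { s/^\s+|\s+$//gr } split /;/, $text;
}

# Two abbreviated entries laid out like the posted data (illustrative only).
my $sample = <<'END';
AB DHGL_DROPS STANDARD; PRT; 625 AA.
ST PROT1ACONE; PROT1ACTWO;
DT 01-NOV-1990, integrated into UniProtKB/Swiss-Prot.
YZ 1WORD1LINE1; 1WORD2LINE1
YZ 1WORD3LINE2; 1WORD4 LINE2.
//
AB DHGL_DROPS STANDARD; PRT; 625 AA.
ST PROT2ACONE;
YZ 2WORD1 LINE1; 2WORD2 LINE1.
//
END

# An in-memory filehandle stands in for the real data file.
open my $fh, '<', \$sample or die "cannot open: $!";

my @pairs;
{
    local $/ = "//\n";                           # one whole entry per read
    while (my $entry = <$fh>) {
        my ($st) = $entry =~ /^ST\s+(.*)$/m or next;   # skip entries without ST
        # Join YZ lines with ';' so a line break also separates words.
        my $yz = join ';', $entry =~ /^YZ\s+(.*)$/mg;
        $yz =~ s/\.\s*$//;                       # drop the closing full stop
        my @words = split_fields($yz);
        for my $signal (split_fields($st)) {
            push @pairs, "$signal $_" for @words;
        }
    }
}
close $fh;
print "$_\n" for @pairs;
```

Because `$/` is localized inside the bare block, normal line-by-line reading is restored afterwards.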
 
I have come up with a way of removing all the other lines, leaving just an array whose elements are the ST and YZ lines. Using grep, I get:
ST...
YZ...
YZ...
ST...
YZ...
YZ...
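The grep call itself is not shown in the post. A minimal sketch of the same filtering with Perl's built-in `grep`, assuming the file has already been read into an array of lines, keeps only the lines whose first field is exactly ST or YZ. The sample lines are illustrative.

```perl
use strict;
use warnings;

# A few lines as they might come from the data file (illustrative only).
my @lines = (
    "AB DHGL_DROPS STANDARD; PRT; 625 AA.\n",
    "ST PROT1ACONE; PROT1ACTWO;\n",
    "DT 30-MAY-2006, entry version 54.\n",
    "YZ 1WORD1LINE1; 1WORD2LINE1\n",
    "YZ 1WORD3LINE2.\n",
    "//\n",
);

# Keep only lines that start with the tag ST or YZ followed by whitespace.
my @kept = grep { /^(?:ST|YZ)\s/ } @lines;
chomp @kept;                  # remove trailing newlines
print "$_\n" for @kept;
```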

Is there a way to concatenate each element beginning with ST with the YZ elements that follow it? I.e., can one join elements in an array?
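Elements of a Perl array can be combined this way. A minimal sketch, assuming the array already holds only the ST and YZ lines in file order: start a new string at each ST element and append every following YZ element to the most recent one with `.=`, giving one string per record. The sample elements are illustrative.

```perl
use strict;
use warnings;

my @kept = (
    "ST PROT1ACONE; PROT1ACTWO;",
    "YZ 1WORD1LINE1; 1WORD2LINE1",
    "YZ 1WORD3LINE2.",
    "ST PROT2ACONE;",
    "YZ 2WORD1 LINE1.",
);

my @records;
for my $line (@kept) {
    if ($line =~ /^ST/) {
        push @records, $line;          # an ST element starts a new record
    }
    elsif (@records) {                 # ignore YZ lines before the first ST
        $records[-1] .= " $line";      # append YZ text to the current record
    }
}
print "$_\n" for @records;
```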
 
I am such a Stoopid newbie. I have now used the join function to combine the data into one string and then split it at the full stop after the last word of each record, to give:
ST1 YZFROMLINE1 YZFROMLINE1
ST2 YZFROMLINE1 YZFROMLINE2

Getting there...Hehe.
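The remaining step is to turn each joined record back into one line per signal/word pair. A minimal sketch, assuming each record string has the form `ST ... YZ ... YZ ...` shown above, with semicolon-separated fields, and that "YZ" followed by whitespace never occurs inside a name or phrase. `expand_record` and `split_fields` are hypothetical helper names, and the `/r` modifier needs Perl 5.14 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: split on semicolons, trim fields, drop empty ones.
sub split_fields {
    my ($text) = @_;
    return grep { length } map { s/^\s+|\s+$//gr } split /;/, $text;
}

# Hypothetical helper: expand one joined record into "SIGNAL WORD" lines.
sub expand_record {
    my ($record) = @_;
    $record =~ s/^\s*ST\s+//;       # drop the leading ST tag
    $record =~ s/\.\s*$//;          # drop the closing full stop, if still present
    my ($st, @yz) = split /\s*\bYZ\s+/, $record;
    my @signals = split_fields($st);
    my @words   = map { split_fields($_) } @yz;
    my @out;
    for my $signal (@signals) {
        push @out, "$signal $_" for @words;
    }
    return @out;
}

print "$_\n" for expand_record(
    "ST PROT1ACONE; PROT1ACTWO; YZ 1WORD1LINE1; 1WORD2LINE1 YZ 1WORD3LINE2."
);
```

Splitting on the YZ tag rather than on spaces keeps multi-word phrases intact.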
 
The real data is very different from the dummy data you posted. If you want to explain the criteria for converting the real data into the results you want, I will try to help, but at this point I don't think I understand well enough how you are going from point A (the data file) to point B (the results you want to get).
 