Data extraction

Status
Not open for further replies.

biobrain (MIS), joined Jun 21, 2007, 90 messages, location: GB

Dear All,

I have lines like these in my file:

REMARK 2 RESOLUTION. 3.10 ANGSTROMS.
REMARK 2 METHOD X-RAYS
REMARK 3 SOMETHING ELSE

I want a match statement that matches the line containing REMARK 2 RESOLUTION. and extracts 3.10 from it.

3.10 is not a fixed value: in REMARK 2 it could be 2.80, 2.10, 2.25, 2.40 and so on, so I want to extract whatever value is there.

I tried
Code:
if($_=~ /RESOLUTION[.] [\d]*[\s\S]*ANGSTROMS/g){
print $_;
}

but it prints the whole line. I am only interested in the value 3.10.

Regards.
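The whole line is printed because a successful match leaves $_ unchanged; the match only reports success or failure. To get just the number, wrap the part you want in parentheses and read the capture variable $1. A minimal sketch, assuming the value is always digits, a point, and digits (the helper name resolution_from_line is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: returns the resolution from one line, or undef.
# The parentheses form capture group 1; on a successful match its text
# is stored in $1, while $line itself still holds the whole line.
sub resolution_from_line {
    my ($line) = @_;
    if ($line =~ /RESOLUTION\.\s+(\d+\.\d+)\s+ANGSTROMS/) {
        return $1;    # only the captured number, not the whole line
    }
    return undef;     # the line is not a RESOLUTION record
}

my $value = resolution_from_line("REMARK 2 RESOLUTION. 3.10 ANGSTROMS.");
print "$value\n";    # prints 3.10
```

In an if that tests one line at a time, the /g modifier from the original attempt is not needed.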
 
There will be a million and one ways to do this. Here's my offering:

Code:
if ($_ =~ m/(\d\.\d{2})\s\S+\.$/) {
    print $1;    # capture group 1: the resolution value
}
 
try:

if (my ($biobrain) = $_ =~ m/(\d\.\d{2})\s\S+\.$/) {    # list assignment receives capture group 1
    print $biobrain;
}
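Whether the variable is wrapped in parentheses decides what it receives. my $x = $_ =~ m/(...)/ is a scalar assignment and stores only the match's true/false result (1 on success), while my ($x) = $_ =~ m/(...)/ is a list assignment and stores capture group 1. A small sketch contrasting the two (the sub names are hypothetical):

```perl
use strict;
use warnings;

# Scalar assignment: $got receives the match result (1 or ""), not the capture.
sub scalar_assign {
    my ($line) = @_;
    my $got = $line =~ m/(\d\.\d{2})\s\S+\.$/;
    return $got;
}

# List assignment: $got receives capture group 1, or undef if no match.
sub list_assign {
    my ($line) = @_;
    my ($got) = $line =~ m/(\d\.\d{2})\s\S+\.$/;
    return $got;
}

my $line  = "REMARK 2 RESOLUTION. 3.10 ANGSTROMS.";
my $flag  = scalar_assign($line);
my $value = list_assign($line);
print "$flag $value\n";    # prints "1 3.10"
```

Used as the condition of an if, the list assignment is still true when the match succeeds, because it evaluates to the number of values on its right-hand side.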
 
keguazi - are you my twin, separated at birth?

biobrain - I should have added that you will want to print $1 rather than $_ with my code.
 
I have a feeling that he has a lot more data and will want something that specifically matches REMARK and RESOLUTION (just a gut feeling :) )

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Travis - Those Who Say It Cannot Be Done Are Usually Interrupted by Someone Else Doing It; Give the wrong symptoms, get the wrong solutions;
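Anchoring the pattern on the record name and remark number keeps it from picking up a number from some other line that happens to end in a similar way. A sketch, assuming fields are separated by one or more spaces; the REMARK 3 CUTOFF line is a made-up example, not taken from the poster's file:

```perl
use strict;
use warnings;

# Loose pattern from earlier in the thread: any "d.dd WORD." ending a line.
my $loose = qr/(\d\.\d{2})\s\S+\.$/;

# Anchored pattern: only a line that starts with the REMARK 2 RESOLUTION record.
my $anchored = qr/^REMARK\s+2\s+RESOLUTION\.\s+(\d+\.\d+)\s+ANGSTROMS\./;

# Return capture group 1 if $re matches $line, otherwise undef.
sub first_capture {
    my ($re, $line) = @_;
    my ($got) = $line =~ $re;
    return $got;
}

for my $line ("REMARK 2 RESOLUTION. 3.10 ANGSTROMS.",
              "REMARK 3 CUTOFF 2.50 SIGMA.") {    # second line is hypothetical
    printf "%s -> loose=%s anchored=%s\n", $line,
        first_capture($loose, $line)    // '-',
        first_capture($anchored, $line) // '-';
}
```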
 
Yes, travs69, you are right. Let me try the examples above and I will report back.
 
The solution given above is not working for me:

Code:
if ($_ =~ m/(\d\.\d{2})\s\S+\.$/) {
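One possible reason, and it is only a guess since the file's exact bytes aren't visible here: that pattern ends in \.$, so it fails if anything follows the final period on the line, such as trailing spaces (PDB-format records are fixed-width, and some files pad them with spaces) or a carriage return left over from a Windows line ending. Allowing optional whitespace before the end of the line avoids that. A sketch with hypothetical padded and CRLF variants of the record:

```perl
use strict;
use warnings;

my $strict_end  = qr/(\d\.\d{2})\s\S+\.$/;      # period must end the line
my $lenient_end = qr/(\d\.\d{2})\s\S+\.\s*$/;   # also allows trailing spaces or \r

# Hypothetical variants of the same record.
my %samples = (
    plain    => "REMARK 2 RESOLUTION. 3.10 ANGSTROMS.",
    padded   => "REMARK 2 RESOLUTION. 3.10 ANGSTROMS.          ",
    dos_crlf => "REMARK 2 RESOLUTION. 3.10 ANGSTROMS.\r\n",
);

for my $name (sort keys %samples) {
    my $line = $samples{$name};
    my $s = $line =~ $strict_end  ? 'match' : 'no match';
    my $l = $line =~ $lenient_end ? 'match' : 'no match';
    print "$name: strict=$s lenient=$l\n";
}
```

Note that chomp only removes the trailing \n, so a CRLF file read on Unix still leaves \r at the end of each line; s/\s+$// strips both.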


Here is part of my file:

REMARK 1
REMARK 1 REFERENCE 1
REMARK 1 AUTH U.SCHULZE-GAHMEN,S.H.KIM
REMARK 1 TITL CRYSTALLIZATION OF A COMPLEX BETWEEN HUMAN CDK6
REMARK 1 TITL 2 AND A VIRUS-ENCODED CYCLIN IS CRITICALLY DEPENDENT
REMARK 1 TITL 3 ON THE ADDITION OF SMALL CHARGED ORGANIC MOLECULES
REMARK 1 REF TO BE PUBLISHED
REMARK 1 REFN
REMARK 2
REMARK 2 RESOLUTION. 3.10 ANGSTROMS.
REMARK 3
REMARK 3 REFINEMENT.
REMARK 3 PROGRAM : CNS 1.0
REMARK 3 AUTHORS : BRUNGER,ADAMS,CLORE,DELANO,GROS,GROSSE-
REMARK 3 : KUNSTLEVE,JIANG,KUSZEWSKI,NILGES, PANNU,
REMARK 3 : READ,RICE,SIMONSON,WARREN
REMARK 3
REMARK 3 REFINEMENT TARGET : ENGH & HUBER

I want to extract the 3.10 value from the REMARK 2 RESOLUTION line.

My code is:

Code:
if($_=~ /RESOLUTION[.] [\d]*[\s\S]*ANGSTROMS/g){
print $_;
}

and it prints the whole line:

REMARK 2 RESOLUTION. 3.10 ANGSTROMS.

I want it to print only the value 3.10.

This value is not fixed and differs between my files, but the line format REMARK 2 RESOLUTION. X.XX ANGSTROMS. is always the same.

X.XX can be any digits.
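Since X.XX can be any digits, a pattern like \d+\.\d+ accepts one or more digits on each side of the point, whereas \d\.\d{2} requires exactly one digit before it and two after. A sketch checking the values mentioned in this thread; the 12.5 entry is a hypothetical value added only for contrast:

```perl
use strict;
use warnings;

my $fixed    = qr/RESOLUTION\.\s+(\d\.\d{2})\s+ANGSTROMS/;   # exactly d.dd
my $flexible = qr/RESOLUTION\.\s+(\d+\.\d+)\s+ANGSTROMS/;    # any digits.digits

for my $v (qw(3.10 2.80 2.10 2.25 2.40 12.5)) {
    my $line = "REMARK 2 RESOLUTION. $v ANGSTROMS.";
    my ($f) = $line =~ $fixed;
    my ($x) = $line =~ $flexible;
    printf "%-5s fixed=%s flexible=%s\n", $v, $f // '-', $x // '-';
}
```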
 
Code:
if (/^REMARK[\s\S]+RESOLUTION\.\s+(\S+)\s+ANGSTROMS/) {
   print $1;    # $1 holds the text matched by (\S+), e.g. 3.10
}

------------------------------------------
- Kevin, perl coder unexceptional!
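To run that match over a whole file, read it line by line and stop at the first hit. A minimal sketch; the sub name find_resolution, the tighter ^REMARK\s+2\s+ anchor in place of [\s\S]+, and taking the file name from the command line are assumptions, not part of the thread:

```perl
use strict;
use warnings;

# Hypothetical helper: scan a filehandle and return the first REMARK 2
# resolution value found, or undef if there is none.
sub find_resolution {
    my ($fh) = @_;
    while (my $line = <$fh>) {
        if ($line =~ /^REMARK\s+2\s+RESOLUTION\.\s+(\S+)\s+ANGSTROMS/) {
            return $1;    # stop reading at the first matching record
        }
    }
    return undef;
}

# Usage: perl resolution.pl file.pdb
if (@ARGV) {
    my $file = $ARGV[0];
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $res = find_resolution($fh);
    close $fh;
    print defined $res ? "$res\n" : "no resolution found\n";
}
```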
 
Thanks,

Perfect

I have also tried it with the parentheses in a few different positions, and they all work.
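The parentheses only decide which part of the match is copied into $1; the rest of the pattern still has to match. A small sketch comparing three placements on the sample line (the second and third patterns are illustrative variants, not necessarily the ones tried above):

```perl
use strict;
use warnings;

my $line = "REMARK 2 RESOLUTION. 3.10 ANGSTROMS.";

my @patterns = (
    qr/^REMARK[\s\S]+RESOLUTION\.\s+(\S+)\s+ANGSTROMS/,   # Kevin's pattern
    qr/RESOLUTION\.\s+(\d+\.\d+)/,                         # number only
    qr/(RESOLUTION\.\s+\S+)/,                              # label and value
);

# Return what capture group 1 holds after matching $re, or undef.
sub capture_with {
    my ($re, $text) = @_;
    return $text =~ $re ? $1 : undef;
}

for my $re (@patterns) {
    my $got = capture_with($re, $line);
    print defined $got ? "$got\n" : "no match\n";
}
```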
 