regex and logical operator odd behaviour in extracting text

Status
Not open for further replies.

zumbabumba

Technical User
Mar 4, 2008
8
IT
Hi all. I want to extract words from a text file using a regular expression, working line by line, and tag each word with its line number. Here is my code:

Code:
#!/usr/local/bin/perl
$line = "line1. to be or not to be, 	that is the question"; 
@wordsinline = split(/\s+/, $line);
foreach $word (@wordsinline){
  $linenum = $word if ($word =~ m/^\D+\d+\./);
  $linenum = "NOLINE!" if ($word !~ m/^\D+\d+\./);
};
foreach $word (@wordsinline){
  print "LEMMA: $word, LINENUM: $linenum \n";
};

What the script is supposed to do is:
- split a line into words
- if a word matches something similar to "line1.", then that becomes the line number
- otherwise there is an error, namely "NOLINE!"
- print out the words with the corresponding line number

What I get is:
LEMMA: line1., LINENUM: NOLINE!
LEMMA: to, LINENUM: NOLINE!
LEMMA: be, LINENUM: NOLINE!
LEMMA: or, LINENUM: NOLINE!
LEMMA: not, LINENUM: NOLINE!
LEMMA: to, LINENUM: NOLINE!
LEMMA: be,, LINENUM: NOLINE!
LEMMA: that, LINENUM: NOLINE!
LEMMA: is, LINENUM: NOLINE!
LEMMA: the, LINENUM: NOLINE!
LEMMA: question, LINENUM: NOLINE!

What I want is:
LEMMA: line1., LINENUM: line1.
LEMMA: to, LINENUM: line1.
LEMMA: be, LINENUM: line1.
LEMMA: or, LINENUM: line1.
LEMMA: not, LINENUM: line1.
LEMMA: to, LINENUM: line1.
LEMMA: be,, LINENUM: line1.
LEMMA: that, LINENUM: line1.
LEMMA: is, LINENUM: line1.
LEMMA: the, LINENUM: line1.
LEMMA: question, LINENUM: line1.

If I remove $linenum = "NOLINE!" if ($word !~ m/^\D+\d+\./) then I get the right output, but I do need to handle errors.
Any help?
 
zumbabumba,

What you say the script should do is different from the output you want.

i.e.
otherwise there is an error, namely "NOLINE!"

In that case, shouldn't the output be something like this?

LEMMA: line1., LINENUM: line1.
LEMMA: to, NOLINE!.
LEMMA: be, NOLINE!
LEMMA: or, NOLINE!
...
...


 
Thanks for your answer, Spookie. Well, I just want the script to recognize the line notation when it is present, and to output an error when it is not. I don't need something like:
LEMMA: line1., LINENUM: line1.
LEMMA: to, NOLINE!.
LEMMA: be, NOLINE!
LEMMA: or, NOLINE!

I want
LEMMA: line1., LINENUM: line1.
LEMMA: to, LINENUM: line1.
LEMMA: be, LINENUM: line1.
LEMMA: or, LINENUM: line1.

Perhaps I should use some other command, but for now my knowledge of Perl is unfortunately rather limited. Any suggestion is very welcome!
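
For reference, the posted loop assigns $linenum on every word, so whatever the last word ("question") decides is what survives, and that is "NOLINE!". One possible way around it, as a minimal sketch only: set the error value once before the loop and stop at the first word that matches. The helper name find_linenum is hypothetical, and the sketch assumes the label is the first matching word on the line; it is not necessarily the script mentioned in the reply below.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): return the first word that
# looks like a line label such as "line1.", or "NOLINE!" if none matches.
sub find_linenum {
    my @words = @_;
    my $linenum = "NOLINE!";    # error value, assigned once before the loop
    foreach my $word (@words) {
        if ($word =~ m/^\D+\d+\./) {
            $linenum = $word;
            last;               # stop here so later words cannot overwrite it
        }
    }
    return $linenum;
}

my $line = "line1. to be or not to be, \tthat is the question";
my @wordsinline = split(/\s+/, $line);
my $linenum = find_linenum(@wordsinline);

# Every word is printed with the label found for the whole line.
foreach my $word (@wordsinline) {
    print "LEMMA: $word, LINENUM: $linenum \n";
}
```

With the sample line this pairs every word with "line1."; a line without any label gets "NOLINE!" on every word instead.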
 
Thanks Franco! Beautiful and elegant script, man :). I don't understand it completely, but I will try to find some explanation on the internet.
 