zumbabumba
Technical User
Hi all. I want to extract words from a text file using a regular expression, working line by line, and tag each word with a line number. My code is below.
What the script is supposed to do is:
- split a line into words
- if a word matches something similar to "line1.", then that becomes the line number
- otherwise there is an error, namely "NOLINE!"
- print out the words with the corresponding line number
What I get is:
LEMMA: line1., LINENUM: NOLINE!
LEMMA: to, LINENUM: NOLINE!
LEMMA: be, LINENUM: NOLINE!
LEMMA: or, LINENUM: NOLINE!
LEMMA: not, LINENUM: NOLINE!
LEMMA: to, LINENUM: NOLINE!
LEMMA: be,, LINENUM: NOLINE!
LEMMA: that, LINENUM: NOLINE!
LEMMA: is, LINENUM: NOLINE!
LEMMA: the, LINENUM: NOLINE!
LEMMA: question, LINENUM: NOLINE!
What I want is:
LEMMA: line1., LINENUM: line1.
LEMMA: to, LINENUM: line1.
LEMMA: be, LINENUM: line1.
LEMMA: or, LINENUM: line1.
LEMMA: not, LINENUM: line1.
LEMMA: to, LINENUM: line1.
LEMMA: be,, LINENUM: line1.
LEMMA: that, LINENUM: line1.
LEMMA: is, LINENUM: line1.
LEMMA: the, LINENUM: line1.
LEMMA: question, LINENUM: line1.
If I remove the line $linenum = "NOLINE!" if ($word !~ m/^\D+\d+\./); then I get the right output, but I do need to handle errors.
Any help?
Code:
#!/usr/local/bin/perl
$line = "line1. to be or not to be, that is the question";
@wordsinline = split(/\s+/, $line);
foreach $word (@wordsinline) {
    $linenum = $word if ($word =~ m/^\D+\d+\./);
    $linenum = "NOLINE!" if ($word !~ m/^\D+\d+\./);
}
foreach $word (@wordsinline) {
    print "LEMMA: $word, LINENUM: $linenum \n";
}
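For reference, the first loop reassigns $linenum on every word, so whatever the last word ("question") produces is what survives, and that is "NOLINE!". A minimal sketch of one possible fix is to start with "NOLINE!" as the default and overwrite it only when a word matches. The helper name find_linenum is hypothetical (it is not in the script above), and stopping at the first matching word is an assumption about the intended behaviour:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): returns the first
# word that looks like a line label such as "line1.", or "NOLINE!" if no
# word in the list matches.
sub find_linenum {
    my @words = @_;
    my $linenum = "NOLINE!";            # default, kept only if nothing matches
    foreach my $word (@words) {
        if ($word =~ m/^\D+\d+\./) {
            $linenum = $word;
            last;                       # assumption: the first label wins
        }
    }
    return $linenum;
}

my $line = "line1. to be or not to be, that is the question";
my @wordsinline = split(/\s+/, $line);
my $linenum = find_linenum(@wordsinline);

foreach my $word (@wordsinline) {
    print "LEMMA: $word, LINENUM: $linenum\n";
}
```

With this arrangement the error value is still reported for lines that carry no label, but a non-matching word later in the line can no longer overwrite a label that was already found.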