Regular expression with Perl

Status
Not open for further replies.

Twela

Programmer
Jun 9, 2010
2
FR
Hi,

I have a text which looks something like this:

babillard (adj):taterend;babbelend;kletsend" (noun):babbelkous[m];babbelaar[m];tateraar[m];kletskous[m]"
babillarde

Question 1

I have this regular expression in my script
/^[a-z][a-z]*\s\(.*\)\:[a-z][a-z]*/ (see script below)

with which I want to extract a line such as:

babillard (adj):taterend

The problem is that when I run the script on the text, the output stays empty (no output whatsoever).

Question 2
Any idea how I could extract from my text a pattern such as this using Perl?

babillard (adj):taterend
billard (adj):abbelend
billard (adj):kletsend
billard (noun):babbelkous
billard (noun):babbelaar

My script

#!/usr/bin/perl

use strict;
#use LWP::Simple; check this on man
my $ligne;
my $clean;
my $pos;
my $cleaner;
my $txt;

open(FILEINPUT,"$ARGV[0]") || die "erreur de lecture de fichier :$!";
while ($ligne = <FILEINPUT>) {
    if ($ligne =~ /^[a-z][a-z]*\s\(.*\)\:[a-z][a-z]*/) {
        print "$ligne\n";
    }
}

close(FILEINPUT);
 
1) Your regex allows exactly one whitespace character (\s) between the first word and the opening parenthesis, while your string seems to contain several spaces there, so the string simply doesn't match.
2) To extract data from a string, you should first fully describe its structure. Your string seems to have the following one:
Code:
word in lower case
one or more of the following{
  one or more spaces
  type between parentheses in lower case
  colon
  one or more translations in lower case separated by semicolons
  closing quote
}
EOL
Please check whether this is correct and clearly specify what you intend to do on those data.
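
If the structure above is right, a rough sketch of the extraction could look like the following. This is only an illustration under assumptions, not a tested solution for your real file: the sub name expand_line is made up, I am guessing that gender markers such as [m] should be dropped, and I am guessing that you want one "word (type):translation" line per translation. Note \s+ (one or more spaces) where your regex has a single \s.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one dictionary line into
# "word (type):translation" entries.
# Assumption: gender markers such as [m] are dropped from the output.
sub expand_line {
    my ($line) = @_;
    my @entries;

    # Headword: lower-case letters at the start of the line.
    my ($word) = $line =~ /^([a-z]+)/ or return @entries;

    # Each group: one or more spaces, (type), colon,
    # semicolon-separated translations, closing quote.
    while ($line =~ /\s+\(([a-z]+)\):([^"]*)"/g) {
        my ($type, $list) = ($1, $2);
        for my $translation (split /;/, $list) {
            $translation =~ s/\[[a-z]+\]//g;    # strip markers like [m]
            next unless length $translation;
            push @entries, "$word ($type):$translation";
        }
    }
    return @entries;
}

# Sample line taken from the question, with several spaces after the headword.
my $sample = 'babillard   (adj):taterend;babbelend;kletsend" '
           . '(noun):babbelkous[m];babbelaar[m];tateraar[m];kletskous[m]"';
print "$_\n" for expand_line($sample);
```

Anything after the last closing quote (such as the trailing "babillarde") is ignored by this sketch; if it matters, the structure description needs one more element.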

And please place all of your code and data between [code] ... [/code] tags!

Franco
 
Hi Prex, thanks for replying.

Your description of my string is correct. I have added * to indicate that there are several spaces between the first word and the opening parenthesis.

The following is what the text I'm working on looks like. It has several lines like the one below:

[text]

babillard (adj):taterend;babbelend;kletsend" (noun):babbelkous[m];babbelaar[m];tateraar[m];kletskous[m]" babillarde

[/text]

My aim is to first extract the string described by the regular expression, then proceed to format it like the output below:

babillard (adj):taterendbillard
babillard (adj):abbelendbillard
babillard (adj):kletsendbillard
babillard (noun):babbelkousbillard
babillard (noun):babbelaar

Would you have an idea of how I could do the formatting?
 
Your inputs and outputs changed from your first to your second post. For the input in your first post, is it just a weird line wrap?

How about posting a few sample input lines and an explanation of how you're coming up with the output? Specifically, the text on the right side of the colon: where did the letters 'billard' come from? Do you just drop the first two letters from the first word to make it work? Does that work every time?
 