REGEX driving me Insane!!

Status
Not open for further replies.

FinnMan

Technical User
Feb 20, 2001
75
US
Can any of you intellectuals help me with the following pattern match:

I have a text list. On each line of the text list is a string enclosed in parentheses, i.e.,

blahblahblah(blahblahblah)
blahblahblah(blahblahblah)

and so on down the text file. I'm trying to extract the text in parentheses and dump it to another file. I typically don't have a lot of problems grepping files for data, but I'm not sure how to write the pattern between the slashes (/.../), since ( and ) are special characters in Perl regexes. Any and all input would be greatly appreciated.

Regards,
FinnMan
 
try:

while(<>){
    print $1 if /\((.+)\)/;
}

Although I'm sure someone else can come up with something more elegant..... <grin>

Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.
 
That's similar to what I had (yours was much more concise than mine, though). Unfortunately, two problems still exist:

1.) The output text is wrapping rather than starting a new line each time. I think I can fix that one...

2.) Not all parentheses are being removed...perplexing!!

Thx for the quick input!!

JLK
 
JLK,

Are you using the same "read one line of the file at a time" approach, or are you reading it all into an array and then processing it?

Not all parawotsits being removed...... Would you post an example line that does this?

Mike
 
Here's what I've got...

assuming I have a text file containing:

blahblahblah(blahblahblah)
blahbl ahblah(blahblahblah)
blahblahblah(blahblahblah)df
blahbl ahblah(blahblahblah)df
blahblahblah(blahblahblah)skfas
blahblahblah (blahblahblah)
bl ahblahblah(blahblahblah)
(You get the idea :) )

my code thus looks as follows:

$a = <>;
chomp $a;
$b = <>;
chomp $b;
open (IN, $a) || die "can't open $a for reading";
open (OUT, ">$b") || die "can't create $b";
while (<IN>) {
print $1 if /\((.+)\)\/;

}
close(IN) || die "can't close $a";
close(OUT) || die "can't close $b";



Right now it's just dumping to the screen and not to the output file (I can fix that..), but the screen output just wraps and doesn't seem to remove all the parentheses....

JLK
 
Hi there,

This should do the trick. Take *everything* between the two lines of dashes, put it into a file and run it. Explanation below.
-----------
while (<DATA>) {
print "$1\n" if /\((.+)\)/;
}
__END__
blahblahblah(blehblehbleh)
blahbl ahblah(blihblihblih)
blahblahblah(blohblohbloh)blah
blahbl ahblah(bluhbluhbluh)blah
blahblahblah(blehblehbleh)blah
blahblahblah (blihblihblih)
bl ahblahblah(blohblohbloh)
------------

First -- what my bit of code does.

The <DATA> construct is just a trick really. It means "go look for a line like '__END__' and read everything below that as if it's another file". So it's a way of having code and data in the same file, I guess.

To make it work on another file and write the output to a separate file, follow these steps.

1. Put this into a file called paren.pl
while (<>) {
print "$1\n" if /\((.+)\)/;
}

2. Put this into a file called paren.dat
blahblahblah(blehblehbleh)
blahbl ahblah(blihblihblih)
blahblahblah(blohblohbloh)blah
blahbl ahblah(bluhbluhbluh)blah
blahblahblah(blehblehbleh)blah
blahblahblah (blihblihblih)
bl ahblahblah(blohblohbloh)

3. Type the following command on your command line (DOS or Unix, doesn't matter):
perl paren.pl < paren.dat > paren.out

I think you were off the track a bit with the

$a = <>;
chomp $a;
$b = <>;
chomp $b;

stuff.

That code does the following:
$a = <>; # read one line from stdin into $a (scalar context reads a single line, not the whole file)
chomp $a; # remove the trailing newline from $a, if there is one

and then the same for $b.
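
For reference, what <> returns depends on the context it is called in: assigned to a scalar it returns one line, assigned to an array it returns every remaining line. Here is a minimal sketch of the difference. The in-memory filehandle and the sample text are made up for illustration, so the example runs without a real input file.

```perl
use strict;
use warnings;

# In-memory stand-in for a real input file (the sample text is made up).
my $text = "first line\nsecond line\nthird line\n";
open(my $fh, '<', \$text) or die "can't open in-memory file: $!";

my $one = <$fh>;     # scalar context: reads only the next line
chomp $one;          # strips the trailing newline, if present
my @rest = <$fh>;    # list context: reads all remaining lines
close($fh);

print "scalar read: '$one'\n";
print "list read got ", scalar(@rest), " more lines\n";
```

So two scalar reads followed by chomp pick up two lines of input, which is one way to prompt for an input and an output file name.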

Try my code out and then get back to me if you need some help specifying the input and output files in the script itself. I'll post a reply tomorrow (going to bed now, I'm in the UK and it's a bit late)
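
In the meantime, here is a rough sketch of one way the script could handle the input and output files itself rather than relying on shell redirection. The helper name extract_parens and the convention of passing both file names on the command line are assumptions for illustration, not something settled in this thread. It uses the three-argument form of open with lexical filehandles.

```perl
use strict;
use warnings;

# Hypothetical helper: copy the parenthesized text from each line of
# $in_path to its own line in $out_path.
sub extract_parens {
    my ($in_path, $out_path) = @_;
    open(my $in,  '<', $in_path)  or die "can't open $in_path for reading: $!";
    open(my $out, '>', $out_path) or die "can't create $out_path: $!";
    while (my $line = <$in>) {
        # Print to the output handle, one match per line.
        print {$out} "$1\n" if $line =~ /\((.+)\)/;
    }
    close($in)  or die "can't close $in_path: $!";
    close($out) or die "can't close $out_path: $!";
    return;
}

# Assumed usage: perl paren.pl paren.dat paren.out
if (@ARGV == 2) {
    extract_parens(@ARGV);
}
```

Using "or" rather than "||" after open avoids a precedence surprise if the parentheses around the arguments are ever dropped.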

Regards, Mike
 
your regex has an extra backslash at the end: the final \/ escapes the closing slash, so the pattern doesn't end where you meant it to. this should have caused more of a problem than it did, but anyway, here's the code you'll need if you want one piece of output per line, sent to the filehandle OUT (using ~ as the regex delimiter, so a stray backslash in front of a closing slash can't happen again):
while (<IN>)
{
    if ( $_ =~ m~\((.+)\)~ )
    {
        print OUT "$1\n";
    }
}

if it still doesn't cut out the parathings, post some of the exact text not being copied right. or, to flub a fix, copy $1 into a variable inside the if block and run tr/()//d on the copy before printing it (the d flag is what makes tr delete the characters, and $1 itself is read-only). but flubbing is bad.
hth

"If you think you're too small to make a difference, try spending a night in a closed tent with a mosquito."
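
If stray parentheses do still turn up in the output, one possible cause (an assumption, since the offending line was never posted) is a line with more than one pair of parentheses: .+ is greedy and runs to the last closing parenthesis on the line. A rough sketch of the difference, using a made-up sample line:

```perl
use strict;
use warnings;

# Made-up sample line with two parenthesized groups.
my $line = "blah(first)blah(second)";

my ($greedy) = $line =~ /\((.+)\)/;      # .+ runs to the LAST ')'
my ($tight)  = $line =~ /\(([^()]+)\)/;  # stops at the first ')'

print "greedy: $greedy\n";
print "tight:  $tight\n";

# tr needs the d flag to delete; without it, tr only counts matches.
(my $stripped = $greedy) =~ tr/()//d;
print "stripped: $stripped\n";
```

The negated character class [^()]+ fixes the match itself, whereas the tr/()//d route only cleans up after a match that grabbed too much.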
 
Thx Guys!!

All works well now. I appreciate the assistance. It's been a great help as well as an encouragement.

JLK
 