How do I make my regular expressions faster

by MikeLacey
The first thing people do is add the /o modifier to the regular expression, like this:

while (<>) {
    print if /^mike$/o;
}

The /o modifier tells Perl to compile the regex only once. Fair enough, but it only matters if the regex contains a variable, like this:

$var = '^mike$';

while (<>) {
    print if /$var/o;
}

So here you're telling Perl that, although the regex contains a variable, it only needs to be compiled once, because you're not going to change the variable inside the loop.

This next example would probably not do what you meant, because you want the changing value of $i to match different things as the program runs:

while (<>) {
    $i++;
    print if /$i/o;
}

The variable changes each time through the loop, but the /o modifier stops that having any effect: the regex is compiled once, with the first value of $i, and never again.
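
To see the effect for yourself, here is a small sketch that runs the same loop with and without /o. The @lines array is made-up sample data standing in for <>, and the variable names are mine, not part of the original example.

```perl
use strict;
use warnings;

# Made-up sample input standing in for lines read from <>.
my @lines = ("line 1\n", "line 2\n", "line 3\n");

my (@with_o, @without_o);

# With /o: the pattern is compiled on the first pass (when $i is 1)
# and later changes to $i are ignored.
my $i = 0;
for my $line (@lines) {
    $i++;
    push @with_o, $line if $line =~ /$i/o;
}

# Without /o: the pattern follows the current value of $i.
$i = 0;
for my $line (@lines) {
    $i++;
    push @without_o, $line if $line =~ /$i/;
}

print "with /o:    ", scalar(@with_o),    " line(s) matched\n";
print "without /o: ", scalar(@without_o), " line(s) matched\n";
```

With /o only the first line matches, because every later test is still looking for "1".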

So putting /o at the end of your regexes is a good idea, but only sometimes.
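
If all you want is "compile this variable pattern once", qr// is another way to say it, and it makes the intent visible in the code. This is a minimal sketch rather than a recommendation from the benchmark; the @lines array is made-up sample data standing in for <>. It's also worth knowing that Perl 5.6 and later don't recompile an interpolated pattern when the variable hasn't changed, so /o matters less than it used to.

```perl
use strict;
use warnings;

my $var = '^mike$';

# Compile the pattern once, up front, and keep the compiled regex in $re.
my $re = qr/$var/;

# Made-up sample input standing in for lines read from <>.
my @lines = ("mike\n", "mikey\n", "mike\n", "spike\n");

# Reuse the precompiled regex for every line.
my @matched = grep { /$re/ } @lines;

print scalar(@matched), " matching line(s)\n";
```

Unlike /o, this doesn't lock you in: if the pattern really does need to change, you build a new qr// object and carry on.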


The next question came up in one of the threads here. Is it faster to write:

/OPEN|CLOSE/;

or, separate the match tests like this?

/OPEN/;
/CLOSE/;

It seems obvious, doesn't it? The first example will run quicker because it's just the one statement, *and* we're giving Perl's regex optimisation something to work with.

Well, I did a little benchmarking, with surprising results.

Two scripts:

The first one uses multiple regexes:
# reg_test_multi.pl
while (<>) {
    next if /OPEN/;
    next if /CLOSE/;
}

And the second uses 'or' in the regex:
# reg_test_or.pl
while (<>) {
    next if /OPEN|CLOSE/;
}

I ran these scripts over the same 45,000-line file, several times each, so that caching of the large file between runs wouldn't skew the timings. The scripts were run on an otherwise quiet machine.

So -- not what you'd call a real benchmark, but close enough for jazz as they say.

# timex ./reg_test_or.pl tmp.txt

real 2.51
user 2.08
sys 0.03

# timex ./reg_test_multi.pl tmp.txt

real 0.47
user 0.38
sys 0.02

Using the | character in the regex made it run much slower: a little over five times slower.

Surprised me, as I say.

The moral is -- don't use | in regexes if you want it to go quickly.

Or perhaps that should be -- don't assume that more compact code is more efficient.

Or perhaps that should be -- benchmark it, that's the only way you'll know for sure.
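
If you'd rather not rely on timex, Perl's standard Benchmark module can do the comparison inside one script. What follows is only a sketch under my own assumptions: the data is generated in memory rather than read from the original tmp.txt, and the sub names count_multi and count_or are made up for the example. Timings will differ between machines, inputs and Perl versions, so treat any numbers you get as specific to your own setup.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up in-memory data standing in for the 45,000-line file.
my @lines = map {
      $_ % 10 == 0 ? "OPEN item $_\n"
    : $_ % 15 == 0 ? "CLOSE item $_\n"
    :                "other item $_\n"
} 1 .. 45_000;

# Two separate regexes, as in reg_test_multi.pl.
# Counts the lines that would have been skipped.
sub count_multi {
    my $n = 0;
    for (@lines) {
        if (/OPEN/)  { $n++; next }
        if (/CLOSE/) { $n++; next }
    }
    return $n;
}

# One regex with alternation, as in reg_test_or.pl.
sub count_or {
    my $n = 0;
    for (@lines) {
        $n++ if /OPEN|CLOSE/;
    }
    return $n;
}

# Run each version for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    separate    => \&count_multi,
    alternation => \&count_or,
});
```

Both subs return the same count, which is a quick check that you are timing two versions of the same job and not two different jobs.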