How do I make my regular expressions faster

Optimization

by MikeLacey Posted Jan 21, 2003 (Edited Jul 14, 2005)

The first thing people usually do is add the /o modifier to the regular expression, like this:

while (<>) {
    print if /^mike$/o;
}

The /o modifier tells Perl to compile the regex only once. Fair enough, but it only matters if your regex interpolates a variable, like this:

$var = '^mike$';

while (<>) {
    print if /$var/o;
}

So here you're telling Perl that, although the regex contains a variable, it only needs to be compiled once, because you're not going to change that variable in the loop.

This next example would probably do something you didn't intend, because you want the changing value of $i to match different things as the program runs:

while (<>) {
    $i++;
    print if /$i/o;
}

The variable changes each time through the loop, but the /o modifier stops that from having any effect: the regex is compiled with the first value of $i and is never updated after that.
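To see the effect for yourself, here is a small sketch that runs the same loop with and without /o. The sub names and the sample lines are made up for illustration; they aren't from the scripts above.

```perl
use strict;
use warnings;

# Hypothetical helper: match each line against /$i/o while $i changes.
# With /o the pattern is compiled on first use (when $i is 1) and then frozen.
sub matches_with_o {
    my @lines = @_;
    my $i = 0;
    my @hits;
    for my $line (@lines) {
        $i++;
        push @hits, $line if $line =~ /$i/o;
    }
    return @hits;
}

# Same loop without /o: the pattern follows the current value of $i.
sub matches_without_o {
    my @lines = @_;
    my $i = 0;
    my @hits;
    for my $line (@lines) {
        $i++;
        push @hits, $line if $line =~ /$i/;
    }
    return @hits;
}

my @sample = ("line 1", "line 2", "line 3", "item 11");

print "with /o:    ", join(", ", matches_with_o(@sample)),    "\n";
print "without /o: ", join(", ", matches_without_o(@sample)), "\n";
```

With /o every line is tested against the pattern 1, so "item 11" matches even though $i is 4 by then, and "line 2" and "line 3" are missed.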

So putting /o at the end of your regexes is a good idea, but only sometimes.
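An alternative that this FAQ doesn't cover is Perl's qr// operator, which compiles a pattern into a regex object at a point you choose. The sketch below assumes that's acceptable for your code; grep_compiled is a made-up name, not something from the examples above.

```perl
use strict;
use warnings;

# Hypothetical helper: compile the pattern once per call with qr//,
# then reuse the compiled regex for every line.
sub grep_compiled {
    my ($pattern, @lines) = @_;
    my $re = qr/$pattern/;    # compiled here, not on every match
    return grep { $_ =~ $re } @lines;
}

my @hits = grep_compiled('^mike$', "mike\n", "mikey\n", "mike");
print scalar(@hits), " matching lines\n";
```

Unlike /o, a new call with a different pattern string gets a freshly compiled regex, so there is no frozen pattern to trip over.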

The next question came up in one of the threads here. Is it faster to write:

/OPEN|CLOSE/;

or to separate the match tests like this?

/OPEN/;
/CLOSE/;

It seems obvious, doesn't it? The first example will run quicker because it's just the one statement, *and* we're giving Perl's regex optimisation something to work with.

Well, I did a little benchmarking, with surprising results.

Two scripts:

The first one uses multiple regexes:
# reg_test_multi.pl
while (<>) {
    next if /OPEN/;
    next if /CLOSE/;
}

And the second uses 'or' (the | character) in a single regex:
# reg_test_or.pl
while (<>) {
    next if /OPEN|CLOSE/;
}

I ran these scripts over the same 45,000 line file. I did it several times to get rid of any timing problems resulting from the large file being read several times. The scripts were run on an otherwise quiet machine.

So -- not what you'd call a real benchmark, but close enough for jazz as they say.

# timex ./reg_test_or.pl tmp.txt

real 2.51
user 2.08
sys 0.03

# timex ./reg_test_multi.pl tmp.txt

real 0.47
user 0.38
sys 0.02

Using the | character in the regex made it run much slower: a little over five times slower (2.51 seconds against 0.47 seconds of real time).

Surprised me, as I say.

The Moral is -- don't use | in regexes if you want it to go quickly.

Or perhaps that should be -- don't assume that more compact code is more efficient.

Or perhaps that should be -- benchmark it, that's the only way you'll know for sure.
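If you want to repeat the comparison without timex, a rough sketch using the core Benchmark module might look like this. The input lines are synthetic (an assumption, since the original 45,000 line file isn't available), and the sub names are made up. Results depend on your Perl version and your data, so don't expect the numbers above to reappear.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Synthetic input: 45,000 lines, one in ten containing OPEN and
# one in ten containing CLOSE. This is an assumption, not the original file.
my @lines = map {
      $_ % 10 == 0 ? "record $_ OPEN\n"
    : $_ % 10 == 5 ? "record $_ CLOSE\n"
    :                "record $_ nothing to see\n"
} 1 .. 45_000;

# Two separate matches, as in reg_test_multi.pl; returns how many lines were skipped.
sub count_multi {
    my $skipped = 0;
    for (@lines) {
        if (/OPEN/)  { $skipped++; next }
        if (/CLOSE/) { $skipped++; next }
    }
    return $skipped;
}

# One regex with alternation, as in reg_test_or.pl.
sub count_or {
    my $skipped = 0;
    for (@lines) {
        $skipped++ if /OPEN|CLOSE/;
    }
    return $skipped;
}

# A negative count tells Benchmark to run each sub for at least that many CPU seconds.
cmpthese(-1, {
    two_regexes => \&count_multi,
    alternation => \&count_or,
});
```

cmpthese prints a small table of rates and percentage differences, which is easier to read than comparing raw timings by eye.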
