Optimization

How do I make my regular expressions faster by MikeLacey
Posted: 21 Jan 03 (Edited 14 Jul 05)

The first thing most people try is adding the /o modifier to the regular expression, like this:

while(<>){
  print if /^mike$/o;
}

The /o modifier tells Perl that you only want it to compile the regex once. Fair enough, but a constant pattern like the one above is compiled only once anyway, so /o only matters if your regex contains a variable, like this:

$var = '^mike$';

while(<>){
    print if /$var/o;
}

So here you're telling Perl that, although the regex comes from a variable, it only needs to be compiled once, because you're not going to change it in the loop.

This next example would probably do something you didn't intend, because you want the changing value of $i to match different things as the program runs:

while(<>){
    $i++;
    print if /$i/o;
}

The variable changes each time through the loop, but the /o modifier stops that from having any effect: the regex is compiled with the first value of $i and never updated.
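To see the effect without needing an input file, here is a minimal sketch that runs the same loop twice over a small in-memory list, once with /o and once without. The list @lines and the arrays @with_o and @without_o are made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the lines read by while(<>)
my @lines = ("1", "2", "3");

my (@with_o, @without_o);

my $i = 0;
for my $line (@lines) {
    $i++;
    # With /o the pattern is compiled on the first pass (when $i is 1)
    # and later changes to $i are ignored
    push @with_o, $line if $line =~ /$i/o;
}

$i = 0;
for my $line (@lines) {
    $i++;
    # Without /o the pattern follows the current value of $i
    push @without_o, $line if $line =~ /$i/;
}

print "with /o:    @with_o\n";      # prints "with /o:    1"
print "without /o: @without_o\n";   # prints "without /o: 1 2 3"
```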

So putting /o at the end of your regexes is a good idea, but only sometimes.
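If all you want is the compile-once behaviour, another option (not the one used in this FAQ) is to compile the pattern yourself with the qr// operator and match against the result. This is only a sketch: @input and $count are invented for the example, and the input would normally come from a file.

```perl
use strict;
use warnings;

my $var = '^mike$';

# qr// compiles the pattern at this point; the compiled regex is then reused
my $re = qr/$var/;

# Hypothetical input standing in for lines read from a file
my @input = ("mike\n", "mikey\n", "mike\n");

my $count = 0;
for my $line (@input) {
    $count++ if $line =~ $re;
}

print "$count matching lines\n";    # prints "2 matching lines"
```

Because the compiled regex lives in an ordinary variable, you can build a new one with qr// whenever the pattern really does need to change, which /o does not allow.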


The next question came up in one of the threads here: is it faster to write:

/OPEN|CLOSE/;

or, separate the match tests like this?

/OPEN/;
/CLOSE/;

It seems obvious, doesn't it? The first example will run quicker because it's just the one statement, *and* we're giving Perl's regex optimisation something to work with.

Well, I did a little benchmarking, with surprising results.

Two scripts:

The first one uses multiple regexes:
# reg_test_multi.pl
while(<>){
  next if /OPEN/;
  next if /CLOSE/;
}

The second one uses 'or' (the | character) in a single regex:
# reg_test_or.pl
while(<>){
  next if /OPEN|CLOSE/;
}

I ran these scripts over the same 45,000 line file. I ran each one several times, so that the timings would not be skewed by whether the large file had already been read. The scripts were run on an otherwise quiet machine.

So -- not what you'd call a real benchmark, but close enough for jazz as they say.

# timex ./reg_test_or.pl tmp.txt

real        2.51
user        2.08
sys         0.03

# timex ./reg_test_multi.pl tmp.txt

real        0.47
user        0.38
sys         0.02

Using the | character in the regex made it run much slower: a bit more than 5 times slower (2.51 seconds real time against 0.47).

Surprised me, as I say.

The moral is -- don't use | in regexes if you want it to go quickly.

Or perhaps that should be -- don't assume that more compact code is more efficient.

Or perhaps that should be -- benchmark it, that's the only way you'll know for sure.
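One way to do that benchmarking inside a single script is Perl's core Benchmark module. The sketch below is not how the timings above were taken (those used timex on two separate scripts); it assumes made-up in-memory data in @lines instead of the real tmp.txt, and the sub names count_multi and count_or are invented for the example. The numbers you get will depend on your data and your version of Perl, so they may not look like the ones above.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical data standing in for the 45,000 line file
my @lines = map {
      $_ % 10 == 0 ? "door OPEN $_"
    : $_ % 7  == 0 ? "door CLOSE $_"
    :                "nothing here $_"
} 1 .. 45_000;

# Two separate match tests, as in reg_test_multi.pl
sub count_multi {
    my $n = 0;
    for (@lines) {
        if (/OPEN/)  { $n++; next }
        if (/CLOSE/) { $n++; next }
    }
    return $n;
}

# One regex with |, as in reg_test_or.pl
sub count_or {
    my $n = 0;
    for (@lines) {
        $n++ if /OPEN|CLOSE/;
    }
    return $n;
}

# Check that both versions do the same work before timing them
die "the two versions disagree" unless count_multi() == count_or();

# Run each version for at least 1 CPU second and print a comparison table
cmpthese(-1, {
    'two_tests'   => \&count_multi,
    'alternation' => \&count_or,
});
```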
