The first thing people do is add the /o modifier to the regular expression, like this:
while(<>){ print if /^mike$/o; }
The /o modifier tells Perl that you only want it to compile the regex once. Fair enough, but it only matters if your regex contains an interpolated variable, like this:
$var = '^mike$';
while(<>){ print if /$var/o; }
So here you're telling Perl that, although the regex contains a variable, it only needs to be compiled once - because you're not going to change it in the loop.
This next example would probably do something you didn't mean, because you want the changing value of $i to match different things as the program runs:
while(<>){ $i++; print if /$i/o; }
The variable changes each time through the loop, but the /o modifier stops that having any effect: the regex is compiled with the first value of $i (which is 1) and never looked at again.
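To see the pitfall in action, here is a small sketch that runs the same counting loop with and without /o. The sub names and the sample input are made up for illustration; only the /$i/o pattern comes from the example above.

```perl
use strict;
use warnings;

# Return the items that match /$i/o while $i counts up from 1.
sub matches_with_o {
    my @items = @_;
    my $i = 0;
    my @hits;
    for my $item (@items) {
        $i++;
        # /o compiles the pattern on first use, when $i is 1,
        # and ignores every later change to $i.
        push @hits, $item if $item =~ /$i/o;
    }
    return @hits;
}

# Same loop without /o: the pattern follows the current value of $i.
sub matches_without_o {
    my @items = @_;
    my $i = 0;
    my @hits;
    for my $item (@items) {
        $i++;
        push @hits, $item if $item =~ /$i/;
    }
    return @hits;
}

my @input = ('1', '2', '3', '1');
print "with /o:    @{[ matches_with_o(@input) ]}\n";     # with /o:    1 1
print "without /o: @{[ matches_without_o(@input) ]}\n";  # without /o: 1 2 3
```

With /o the loop keeps matching against 1, so only the two '1' items get through.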
So putting /o at the end of your regexes is a good idea, but only sometimes.
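If the aim is just "compile this variable pattern once", Perl's qr// operator is another way to say so, and it makes the compile point explicit. This is only a sketch of that alternative, not something from the original thread; the filter_lines sub and the sample lines are hypothetical.

```perl
use strict;
use warnings;

my $var = '^mike$';

# qr// compiles the pattern here, once, and returns a regex object.
my $re = qr/$var/;

# Return the lines that match an already-compiled regex.
sub filter_lines {
    my ($regex, @lines) = @_;
    return grep { $_ =~ $regex } @lines;
}

my @sample = ("mike\n", "mikey\n", "mike");
my @hits = filter_lines($re, @sample);
print scalar(@hits), " matching lines\n";
```

If the pattern really does need to change later, you build a new qr// object at that point, instead of having a frozen /o pattern to trip over.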
The next question came up in one of the threads here - is it faster to write:
/OPEN|CLOSE/;
or, separate the match tests like this?
/OPEN/; /CLOSE/;
It seems obvious, doesn't it? The first example will run quicker because it's just the one statement, *and* we're giving Perl's regex optimisation something to work with.
Well - I did a little benchmarking, with surprising results.
Two scripts:
First one, using multiple regexes:
# reg_test_multi.pl
while(<>){ next if /OPEN/; next if /CLOSE/; }
And then one using 'or' in the regex:
# reg_test_or.pl
while(<>){ next if /OPEN|CLOSE/; }
I ran these scripts over the same 45,000-line file. I ran each one several times, so that the timings weren't skewed by whether the large file was already cached from an earlier read. The scripts were run on an otherwise quiet machine.
So -- not what you'd call a real benchmark, but close enough for jazz as they say.
# timex ./reg_test_or.pl tmp.txt
real 2.51 user 2.08 sys 0.03
# timex ./reg_test_multi.pl tmp.txt
real 0.47 user 0.38 sys 0.02
Using the | character in the regex made it run much slower: 5 and a bit times slower (2.51 seconds real time against 0.47).
Surprised me, as I say.
The moral is -- don't use | in regexes if you want them to go quickly.
Or perhaps that should be -- don't assume that more compact code is more efficient.
Or perhaps that should be -- benchmark it, that's the only way you'll know for sure.
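One way to follow that last piece of advice without timing whole scripts is Perl's core Benchmark module. The sketch below compares the two styles on made-up in-memory data (the @lines contents and the sub names are assumptions, not the 45,000-line file used above), so expect your numbers to differ; the result can also depend on the Perl version you run.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up sample data: every tenth line contains OPEN.
my @lines;
for my $n (1 .. 10_000) {
    my $status = $n % 10 == 0 ? 'OPEN' : 'PENDING';
    push @lines, "record $n status $status\n";
}

# Count lines that would be skipped, using one alternation.
sub count_with_alternation {
    my $skipped = 0;
    for (@lines) {
        $skipped++ if /OPEN|CLOSE/;
    }
    return $skipped;
}

# Count lines that would be skipped, using two separate matches.
sub count_with_separate {
    my $skipped = 0;
    for (@lines) {
        if (/OPEN/)  { $skipped++; next; }
        if (/CLOSE/) { $skipped++; next; }
    }
    return $skipped;
}

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    'alternation' => \&count_with_alternation,
    'separate'    => \&count_with_separate,
});
```

Both subs count the same lines, so any difference in the table is down to how the match is written.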