Weird regex behavior....

(OP)
This isn't a problem, more a question about an accidental solution.

Here's the meat of it:

m/^*$/ will match any string. Why is this? I played around, and, in fact, m/^*/ will also match any string, but m/^$/ will only match a null string, and m/*$/ and m/*/ both give an error (as they should).

Any help is appreciated.
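A minimal Perl sketch that reproduces these observations, assuming a reasonably recent perl 5. The patterns are compiled at run time inside `eval` so the two invalid ones report their error instead of stopping the script. `try_match` is just a hypothetical helper name, and `no warnings 'regexp'` only hides perl's complaint about putting a quantifier on `^`.

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence perl's warning about quantifying the zero-width ^

# Returns 1 (match) or 0 (no match), or undef if the pattern won't compile.
# Compiling at run time inside eval lets the invalid patterns ('*$' and '*')
# report their error instead of aborting the whole script.
sub try_match {
    my ($pat, $subject) = @_;
    my $re = eval { qr/$pat/ };
    return undef unless defined $re;
    return $subject =~ $re ? 1 : 0;
}

for my $pat ('^*$', '^*', '^$', '*$', '*') {
    for my $subject ('', 'abc', 'hello world') {
        my $r = try_match($pat, $subject);
        my $verdict = !defined $r ? 'compile error' : $r ? 'match' : 'no match';
        printf "%-6s %-15s %s\n", "/$pat/", "'$subject'", $verdict;
    }
}
```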

"If you think you're too small to make a difference, try spending a night in a closed tent with a mosquito."

RE: Weird regex behavior....

Well, the "^" is the beginning anchor and "$" is the ending anchor, so m/^$/ will match only a null string: nothing between the beginning and the ending markers. And a "*" will match 0 (zero) or more occurrences of any character. So m/^*$/ is indeed any string, even the null string.
HTH

RE: Weird regex behavior....

Why does "*" match 0 or more occurrences of any character? Shouldn't ".*" do that?

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard.
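One way to see the difference being pointed at here is to print what each pattern actually consumed. This sketch assumes a recent perl 5 and uses a made-up helper, `describe`, that reports the matched text and its start/end offsets from `@-` and `@+`.

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence the warning about quantifying the zero-width ^

# Describe what a pattern consumed: the matched text plus its start/end offsets.
sub describe {
    my ($re, $s) = @_;
    return 'no match' unless $s =~ $re;
    return sprintf "matched '%s' at %d..%d", $&, $-[0], $+[0];
}

my $s = 'hello';
print '/^.*$/ ', describe(qr/^.*$/, $s), "\n";   # .* consumes every character
print '/^*$/  ', describe(qr/^*$/,  $s), "\n";   # ^* consumes nothing
```

On `'hello'`, `^.*$` consumes all five characters, while `^*$` only finds an empty match at offset 5, the end of the string.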

RE: Weird regex behavior....

(OP)
Yes, my understanding is the same as tsdragon's. '*' only affects how things match; it doesn't match anything on its own. So basically, the '*' is applied to '^', so it's matching 0 or more '^' characters. But '^' is a zero-width assertion, not a character, so there's no way that matching even a million '^'s would advance the index to the second character, much less to the end of the string. Not to mention that the regex in question has nothing to match the parts in between the beginning and the end, which are the only two things that would be matched.
And as I think about it, since '^' is zero-width, it could match over and over at the same spot, as many times as you wanted it to, and it would still match. I'm beginning to wonder if zero-width assertions were even meant to be followed by '*'s and '+'s.
I'm tempted to crack open the perl code dealing with regular expressions to see if this is a bug in the code, or maybe a default behavior, like maybe if something matches a certain number of times (a bazillion), the engine just assumes that it matches the pattern. However, I doubt I have a sufficient understanding of the inner workings of perl at this point to do this...
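The zero-width point can be checked directly. In this sketch (assuming a recent perl 5, where `no warnings 'regexp'` only silences the complaint about quantifying a zero-width assertion), `^+` behaves like a single `^`, and `^*` behaves as if the `^` were not there at all.

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence the warning about quantifying a zero-width assertion

# ^ consumes no characters, so repeating it can't move the match position.
# The quantifier only decides whether ^ must hold at least once (+) or may be
# skipped entirely (*).
my @cases = (
    [ qr/^+a/, 'abc' ],   # ^ must hold, so this acts like /^a/: match
    [ qr/^+a/, 'xab' ],   # 'a' is not at the start: no match
    [ qr/^*a/, 'xab' ],   # ^ may be skipped, so this acts like /a/: match
);
for my $case (@cases) {
    my ($re, $s) = @$case;
    printf "%-12s on '%s': %s\n", $re, $s, ($s =~ $re ? 'match' : 'no match');
}
```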

RE: Weird regex behavior....

I have some vague memory of reading about the up caret changing its behavior in certain situations.  I think I remember that it can be used to negate a match pattern....

Found it.

In 'Programming Perl' by Wall, Christiansen, and Schwartz, 2nd edition, page 64,
"A caret at the front of the list causes it to match only characters that are not in the list."

Maybe the regex engine is interpreting "^*$" as a caret in front of an empty list. Thus, it negates nothing and matches everything?

keep the rudder amid ship and beware the odd typo
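For reference, a quick sketch of the two roles the caret plays in Perl patterns: inside square brackets, a leading `^` negates the class; outside brackets it anchors to the start of the string. Since `^*$` contains no brackets, only the anchor reading applies to it. The word lists below are arbitrary examples.

```perl
use strict;
use warnings;

# Inside square brackets, a leading ^ negates the class:
# [^aeiou] means "any single character that is not a lowercase vowel".
my @non_vowels = grep { /^[^aeiou]$/ } qw(a b e x);

# Outside brackets, ^ is an anchor: "at the start of the string".
my @a_words = grep { /^a/ } qw(apple banana avocado);

print "non-vowels: @non_vowels\n";   # b x
print "a-words: @a_words\n";         # apple avocado
```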

RE: Weird regex behavior....

(OP)
I like the idea, but I tried to verify this, and it doesn't seem to be the case. Here's the logic I used: the '^' is being interpreted as the first entry in a character class (square brackets), and would therefore be equivalent to one of the following(?):
/[^]*$/
/[^*]$/
/[^*$]/
The first and the last both produce syntax errors, and the middle will match almost any string, unless the last character is a '*'. If you escape the '$' in the last case, it matches anything except '', '*', '$', and other strings made up only of those two characters.
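The two cases that do compile can be checked with a small sketch like this one (assuming a recent perl 5; `ends_with_non_star` and `has_other_char` are hypothetical names). The `$` inside the second class is escaped; left unescaped, perl most likely tries to interpolate `$]` (the perl version variable), which would explain the syntax error in the last case.

```perl
use strict;
use warnings;

# /[^*]$/  : the character just before the end must be anything but '*'.
# /[^*\$]/ : at least one character must be neither '*' nor '$'. The '$' is
#            escaped so it is taken literally inside the class.
sub ends_with_non_star { return $_[0] =~ /[^*]$/  ? 1 : 0 }
sub has_other_char     { return $_[0] =~ /[^*\$]/ ? 1 : 0 }

for my $s ('abc', 'ab*', '*', '$', '*$', '') {
    printf "%-6s ends_with_non_star=%d has_other_char=%d\n",
        "'$s'", ends_with_non_star($s), has_other_char($s);
}
```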
Well, those are just test cases. The real turning point for me was when I decided to take a look at $`, $&, and $'. $& and $' are both completely empty, while $` contains the entire string. Thus, it seems that the regex is matching at the end of the string, but it is matching a zero-width assertion (meaning a '$'), and not anything in the string itself. So we now know what it matches (anything with an end, I guess), but not why.
Actually, though, this regex will also match undefined values. I'm not sure if this means anything.
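A sketch of the same $`/$&/$' experiment, assuming a recent perl 5. The hypothetical `split_match` helper rebuilds prematch, match and postmatch from `@-`, `@+` and `substr`. An undefined value is treated as an empty string in a pattern match, which is why it matches too (perl warns about the uninitialized value unless that warning is switched off, as it is here).

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence the warning about quantifying the zero-width ^

# Rebuild prematch, match and postmatch from @- and @+ (the same text
# that $`, $& and $' would hold). Returns an empty list on failure.
sub split_match {
    my ($re, $s) = @_;
    return unless $s =~ $re;
    return (substr($s, 0, $-[0]),
            substr($s, $-[0], $+[0] - $-[0]),
            substr($s, $+[0]));
}

for my $re (qr/^*$/, qr/^*/) {
    my ($pre, $match, $post) = split_match($re, 'a1b2c3');
    printf "%-10s pre='%s' match='%s' post='%s'\n", $re, $pre, $match, $post;
}

# undef is treated as an empty string here, so it matches as well.
my $nothing;
{
    no warnings 'uninitialized';   # otherwise perl warns about matching undef
    print "undef matches /^*\$/\n" if $nothing =~ /^*$/;
}
```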

Thanks, goboating, but its operation is still unknown. I'm going to ask all the perl gurus/mongers/monks/mages I know, and maybe post it on some mailing list or another. I'll report back if I learn anything.

And, as a matter of note, this strange behavior allows for the matching of anything, so a regex like:
/^$word$/
will always match when $word = '*', thus treating '*' as a wildcard.
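A sketch of that wildcard effect, and of the usual way to avoid it when `$word` should be taken literally: `\Q...\E` (the same escaping `quotemeta` does) turns `*` into `\*`. `raw_match` and `quoted_match` are hypothetical helper names; a recent perl 5 is assumed.

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence the warning about quantifying the zero-width ^

# Interpolating raw text into a pattern lets regex metacharacters through:
# with $word = '*', /^$word$/ becomes /^*$/, which matches any string.
# \Q...\E escapes the metacharacters so '*' is matched literally.
sub raw_match    { my ($word, $s) = @_; return $s =~ /^$word$/     ? 1 : 0 }
sub quoted_match { my ($word, $s) = @_; return $s =~ /^\Q$word\E$/ ? 1 : 0 }

for my $s ('*', 'anything at all', '') {
    printf "%-17s raw=%d quoted=%d\n",
        "'$s'", raw_match('*', $s), quoted_match('*', $s);
}
```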

stillflame

RE: Weird regex behavior....

(OP)
(2 minutes later...)
Ooo, more stuff. When the regex is shortened to /^*/, the $` variable is empty this time, while the $' variable holds the entire string, meaning the regex matches at the beginning of the string instead of the end. Now I'm thinking maybe '^*' matches any zero-width assertion...
Yeah. This code:
use strict;
use warnings;

my $word = "a1b2c3";
if ($word =~ /1^*b/)
{
    # show the postmatch, match, and prematch variables
    print "Match!\n\$'=$'\n\$&=$&\n\$`=$`\n";
}

will show that it matches the "1b"...
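Comparing `/1^*b/` with `/1b/` and `/1^b/` points to the zero-repetitions reading: a bare `^` cannot hold between the `1` and the `b`, so the `^*` version only succeeds because it is allowed to match zero times. A sketch, assuming a recent perl 5:

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence the warning about quantifying the zero-width ^

my $word = 'a1b2c3';

# If ^* can repeat zero times, /1^*b/ should behave exactly like /1b/.
# A bare ^ can never hold between '1' and 'b', so /1^b/ should fail.
for my $re (qr/1^*b/, qr/1b/, qr/1^b/) {
    if ($word =~ $re) { printf "%-12s matched '%s'\n", $re, $& }
    else              { printf "%-12s no match\n", $re }
}
```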

I think this case is solved, although its application to the real world is almost completely and absolutely absent.

RE: Weird regex behavior....

"although its application to the real world is almost completely and absolutely absent"

and that's what I like about it <grin>

Mike
michael.j.lacey@ntlworld.com
Email welcome if you're in a hurry or something -- but post in tek-tips as well please, and I will post my reply here as well.

RE: Weird regex behavior....

(OP)
You know how in Scooby-Doo, after they catch the villain, they think they've solved the case, but then Velma would unmask them and tell the real story...

Well, I just unmasked this, and it's simpler than I had expected. '*' matches 0 or more, right? Well, it's just matching zero '^'s, followed by a '$'. I should have expected it to be simple. Well, now this case is really solved. :)
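That explanation can be checked mechanically: if `^*` really contributes nothing, `/^*$/` should find exactly the same match as `/$/`, and `/^*/` the same as an empty group `/(?:)/`. This sketch compares the start and end offsets of the first match (`span` is a hypothetical helper; `(?:)` is used rather than a literal empty pattern because an empty `//` means "reuse the last successful pattern" in Perl).

```perl
use strict;
use warnings;
no warnings 'regexp';   # silence the warning about quantifying the zero-width ^

# Start and end offsets of the first match, or 'none'.
sub span {
    my ($re, $s) = @_;
    return $s =~ $re ? join('..', $-[0], $+[0]) : 'none';
}

for my $s ('', 'abc', 'a1b2c3') {
    printf "%-9s ^*\$ -> %-5s \$ -> %-5s ^* -> %-5s (?:) -> %s\n",
        "'$s'",
        span(qr/^*$/, $s), span(qr/$/, $s),     # these two should agree
        span(qr/^*/,  $s), span(qr/(?:)/, $s);  # and so should these
}
```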

RE: Weird regex behavior....

Well I was sorta right and sorta wrong.  Sorry 'bout that.

A little more food for the fire: "." can match anything except (usually) a newline or null (Mastering Regular Expressions).

So I guess I'm real crunchy and tasty with mustard.
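For Perl specifically, the dot's exceptions are easy to check: `.` skips a newline unless the `/s` modifier is given, but a NUL byte (`"\0"`) is matched like any other character. A small sketch, using `\z` so that `$` can't match before a trailing newline:

```perl
use strict;
use warnings;

# In Perl, . matches any character except newline; /s lets it match newline too.
# A NUL byte is an ordinary character as far as . is concerned.
my %result = (
    'newline'         => ("\n" =~ /^.\z/  ? 1 : 0),   # 0
    'newline with /s' => ("\n" =~ /^.\z/s ? 1 : 0),   # 1
    'NUL'             => ("\0" =~ /^.\z/  ? 1 : 0),   # 1
);
print "$_: $result{$_}\n" for sort keys %result;
```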
