regex question ...

rtb1 · Oct 2, 2000

Hi guys,

I’m looking for a regex solution for this:

I would like to check if the contents of a $ matches part of the contents of another $. If so proceed with the next action.

Example:

$one = "regular expressions";
$two = "I would like to know more about regular expressions";

match is true

Is the next sort of matching possible (contents of $one matches part of $two with characters between the matching parts in $two)?

$one = "reg ex";
$two = "I would like to know more about regular expressions";

match is true

Any help would be great!

Raoul

MikeLacey · Oct 2, 2000

Hi Raoul,

First one:

Code:

$one = 'regular expressions';
$two = 'I would like to know more about regular expressions';
if ($two =~ /$one/) {
    print "match found\n";
}

That will print 'match found', it searches for $one in $two.
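One caveat with interpolating $one straight into the pattern: any regex metacharacters it contains (a period, *, ?, parentheses and so on) are treated as pattern syntax rather than plain text. Below is a minimal sketch, assuming a literal substring test is what is wanted; the sub name contains_literal is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $needle occurs literally inside $haystack.
sub contains_literal {
    my ($haystack, $needle) = @_;
    # \Q...\E quotes regex metacharacters so $needle is matched as plain text.
    return $haystack =~ /\Q$needle\E/ ? 1 : 0;
}

my $one = 'regular expressions';
my $two = 'I would like to know more about regular expressions';

print "match found\n" if contains_literal($two, $one);

# index() performs the same literal test without the regex engine;
# it returns -1 when the substring is absent.
print "match found (index)\n" if index($two, $one) >= 0;
```

If $one is meant to be a pattern rather than plain text, the unquoted /$one/ form is the one to use.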

Second one:

Hmmm... goBoating?

Mike
michael.j.lacey@ntlworld.com
Cargill's Corporate Web Site: http://www.cargill.com/
Making mistakes, so you don't have to. <grin>

goBoating · Oct 2, 2000

Why sure.......<grin> by the way Mike, what is Queen's English? <chuckle>

$one = "reg ex";
$two = "I would like to know more about regular expressions";

A period is a regex wild card. So, we can ask for the first part, 'reg' and then some wild card stuff, and then the second part, 'ex'.

So for the specific example....

$two =~ /reg.*?ex/;

The * means zero or more of the preceding item (here the wild card), and the ? after it makes the match minimal, so it stops at the first 'ex' it can reach.
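The difference between the greedy and the minimal form only shows up when the second part occurs more than once. A small sketch with a made-up test string (not from the thread):

```perl
use strict;
use warnings;

# Made-up sample text in which 'ex' appears twice after the first 'reg'.
my $text = 'reg ex one, then reg ex two';

# Greedy: .* grabs as much as it can, so the match runs to the LAST 'ex'.
my ($greedy) = $text =~ /(reg.*ex)/;

# Minimal: .*? grabs as little as it can, so the match stops at the FIRST 'ex'.
my ($minimal) = $text =~ /(reg.*?ex)/;

print "greedy:  $greedy\n";
print "minimal: $minimal\n";
```

For a plain yes/no test both forms succeed or fail together; the choice matters once the matched text itself is used. Note also that the period does not match a newline unless the /s modifier is added.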

A little trickier... if your pattern does not come in exactly two pieces and you want to split it into its parts automatically:

$two = "I would like to know more about regular expressions";
$one = 'like know about reg ex';
@pieces = split(/ /,$one);
$pattern_to_match = join '.*?',@pieces;
# If I thought that through correctly, it should build a string that looks like -
# 'like.*?know.*?about.*?reg.*?ex' - which would look for the words with
# wild cards in between.
# So,
$two =~ /$pattern_to_match/;

# I would [red]like[/red] to [red]know[/red] more [red]about[/red] [red]reg[/red]ular [red]ex[/red]pressions
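The split-and-join idea can be wrapped up as follows. This is only a sketch: the sub name build_gap_pattern is hypothetical, and quoting each piece with quotemeta is an added assumption so that punctuation in the search words is not read as regex syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a pattern that finds the words of $query,
# in order, with anything (or nothing) in between.
sub build_gap_pattern {
    my ($query) = @_;
    # split ' ' splits on runs of whitespace and ignores leading whitespace.
    my @pieces = map { quotemeta($_) } split(' ', $query);
    return join '.*?', @pieces;
}

my $two     = 'I would like to know more about regular expressions';
my $pattern = build_gap_pattern('like know about reg ex');

print "pattern: $pattern\n";
print "match is true\n" if $two =~ /$pattern/;
```

Because the pieces must appear in the given order, a query such as 'ex reg' would not match the same sentence.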

hope this helps....

keep the rudder amid ship and beware the odd typo

rtb1 · Oct 2, 2000

Great, thanks!

So short and so powerful.

I have been experimenting with this and it works great. There may be no solution for the following, but perhaps there is. In my country a lot of words have special characters, like í, ó, ñ, etc. However, people are not at all consistent in writing those words correctly. As I expected, "i" doesn't match "í".

Is there a solution for this one so a search for 'mañana' will match 'manana'?

Anyway, thanks for the help both,

Raoul

MikeLacey · Oct 3, 2000

/reg.*?ex/

Hmmm --- sort of obvious really, I wasn't paying attention was I...

And the Queens English, btw, is wot I speek... Just for your information...

tanderso · Oct 3, 2000

You'd have to create a dictionary of like symbols if you wanted to do that. Not a difficult proposition with Spanish.

Sincerely,
Tom Anderson
CEO, Order amid Chaos, Inc.
http://www.oac-design.com
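One way to read "a dictionary of like symbols" is a hash that maps each accented character to its plain counterpart. The sketch below rests on several assumptions: the hash contents are a deliberately incomplete, lowercase-only sample, the names %plain_for and fold_accents are made up, and the strings are taken to be character strings (the script is saved as UTF-8 and uses the utf8 pragma).

```perl
use strict;
use warnings;
use utf8;   # the source below contains literal accented characters

# Assumed, deliberately incomplete dictionary of accented -> plain letters.
my %plain_for = (
    'á' => 'a', 'é' => 'e', 'í' => 'i', 'ó' => 'o', 'ú' => 'u',
    'ü' => 'u', 'ñ' => 'n',
);

# One character class built from the dictionary keys.
my $accented = join '', keys %plain_for;

# Hypothetical helper: replaces every listed character in a single pass.
sub fold_accents {
    my ($text) = @_;
    $text =~ s/([$accented])/$plain_for{$1}/g;
    return $text;
}

print "match is true\n" if fold_accents('mañana') eq fold_accents('manana');
```

Folding both the search term and the text being searched means it no longer matters which of the two was written with the accents.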

rtb1 · Oct 4, 2000

Hi Tom,

I haven't tried this yet, but I was thinking of doing something along the lines of replacing special characters, like:

$one=~ s/í/\i/g;

(probably this won't work like this but to show the idea)

The above command would be needed for each special character. I was wondering whether it would slow down the script too much when there is a lot to check.

Is this what you mean by creating a dictionary, or (if not) would that be faster?

Regards,

Raoul

tanderso · Oct 4, 2000

That would be a reasonable way to do it. Regexps are a bit slow, so it may affect performance. Try getting all of your replacements into one RE.

rtb1 · Oct 5, 2000

I started to experiment with this and apparently it does work in some cases.

$one=~ s/í/\i/g;

and

$one=~ s/ó/\o/g;

both work, but:

$one=~ s/ñ/\n/g;
$one=~ s/é/\e/g;
$one=~ s/á/\a/g;

replace the character with either a " " or a square character. The character itself is found, but it is not replaced by the intended character.

How is this possible or, more importantly, how do I solve this situation?

Raoul

tanderso · Oct 5, 2000

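\n is a newline (likewise, \e is the escape character and \a is the bell character, which is where the blanks and squares come from). Why are you escaping the replacement character?

$one=~ s/ñ/n/g;
$one=~ s/é/e/g;
$one=~ s/á/a/g;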

I'm not exactly sure how to do it, but you should be able to group these all into one statement so that your string only needs to be parsed once for all of the characters instead of once for each one. I don't have a lot of time to think about it right now, but I'm sure one of the other members could help you with that.
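For single-character replacements, one candidate for "one statement" is Perl's tr/// operator, which translates a whole list of characters in a single pass. A minimal sketch, assuming the script is saved as UTF-8 (hence the utf8 pragma) and a Perl of 5.14 or later for the /r flag; on older versions, copy the string first and apply tr/// to the copy. The sample sentence is made up.

```perl
use strict;
use warnings;
use utf8;   # the tr/// lists below contain literal accented characters

my $one = 'mañana';
my $two = 'Hasta manana, señor José';   # made-up sample text

# tr/// maps each character in the first list to the character at the same
# position in the second list, in one pass over the string.
# The /r flag returns the result and leaves the original string alone.
my $plain_one = $one =~ tr/áéíóúñ/aeioun/r;
my $plain_two = $two =~ tr/áéíóúñ/aeioun/r;

print "match is true\n" if $plain_two =~ /\Q$plain_one\E/;
```

tr/// is not a regex, so it only handles one-character-to-one-character mappings; anything longer still needs s///.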

rtb1 · Oct 8, 2000

I got this piece of code from this forum and I guess it was necessary for that example to escape the replacement character.

Thank you for correcting this,

Raoul

Guest_imported · Oct 8, 2000

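#!perl.exe
$one = "reg ex";
$two = "I would like to know more about regular expressions";
#print "$one\n$two\n";

# Split the search phrase into words on non-word characters.
@seperate_words = split /\W/, $one;

# After each word, allow the rest of that word (\w*) and any
# separators (\W*) before the next search word has to start.
$regex = '';
foreach $word (@seperate_words)
{
    $regex = $regex . $word . "(\\w)*(\\W)*";
    # print "$word\n";
}
#print "\n\n$regex\n";

if ( $two =~ m/$regex/ )
{
    print "Yep\n";
}
else
{
    print "Nope\n";
}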
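This variant is stricter than the '.*?' version earlier in the thread: between two search words it only allows the remainder of the current word plus separators, not whole extra words. The sketch below is a condensed rewrite of the loop, not the poster's exact code; the sub name word_tail_pattern is hypothetical and the capture groups are dropped.

```perl
use strict;
use warnings;

my $two = 'I would like to know more about regular expressions';

# Condensed form of the loop above: each search word is followed by
# optional word characters, then optional non-word characters.
sub word_tail_pattern {
    my ($query) = @_;
    return join '', map { quotemeta($_) . '\w*\W*' } split(/\W+/, $query);
}

for my $query ('reg ex', 'like know') {
    my $pattern = word_tail_pattern($query);
    my $result  = $two =~ /$pattern/ ? 'Yep' : 'Nope';
    print "$query: $result\n";
}
```

With 'reg ex' the pattern matches, since only the tail of "regular" and a space sit between the two pieces. With 'like know' it does not, because the extra word "to" sits between them; the '.*?' join would accept both.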
