Pattern matching

Status
Not open for further replies.

hbutt

Programmer
Apr 15, 2003
35
GB
Hi,
I have a problem with the code I am using for email validation. It's not exactly standard email validation, as I need to match incoming email addresses against any of the following formats, or a mixture of them:

"a.kurt@soton.ac.uk" or "a.kurt@soton.com"

"a dot kurt at soton dot ac dot uk" or

"a dot kurt at soton dot com"

This is what I've got so far:

$email=~ /(((\w{1,}?)(\.| dot )){1,}[\@| at ]((\w{1,}?)(\.| dot )){1,}[a-zA-Z]{2,4})/

I'm having problems after the @/at part of the email address: the code only accepts the "ac.uk" format and not the ".com" format.


 
You were right, that was weird, but I think I've got it. Here is what I have:
$email=~ /(((\w+)[\.| dot ])+[\@| at ]((\w+)[\.| dot ])+[a-zA-Z]{2,3})/;

Essentially, I think the problem was that you had (\.| dot ) where it should be in '[' (square) brackets; I think that fixed it. I also changed your {1,} to +, since I believe those are equivalent, but use whichever you prefer ;)

--Marty

--Computable or not Computable that is not the question
char *p="char *p=%c%s%c;main(){printf(p,34,p,34);}";main(){printf(p,34,p,34);}
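
For anyone comparing the two forms, here is a minimal sketch (not taken from the posts; the sub name classify and the sample strings are made up for illustration) of how Perl reads [\.| dot ] versus (\.| dot ). Inside square brackets the pattern is a character class, so it matches exactly one character out of ".", "|", space, "d", "o" and "t". The parenthesised form is an alternation between a literal dot and the five-character string " dot ".

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Character class: exactly one character from the set . | space d o t
my $class = qr/^[\.| dot ]$/;
# Alternation: either a literal dot or the whole string " dot "
my $group = qr/^(?:\.| dot )$/;

# Hypothetical helper: report which of the two patterns accept a string.
sub classify {
    my ($s) = @_;
    my @hits;
    push @hits, 'class' if $s =~ $class;
    push @hits, 'group' if $s =~ $group;
    return @hits ? join(',', @hits) : 'neither';
}

for my $s ('.', ' dot ', 'd', '|', 't') {
    printf "%-7s => %s\n", "'$s'", classify($s);
}
```

This may also explain why a bracketed version can appear to work on some inputs: in a.kurt@soton.com, for instance, the final "t" of "kurt" is itself a member of the [\.| dot ] class, so the match can succeed without a real separator at that point.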
 
The way I approach complex regular expressions is to build them up in steps, then you can easily see what is going on. It also makes fixing parts of the regexp much easier.

Here is my code, the test file and test results. The first refinement would be to restrict the domain to either "com|uk" say :)
Code:
#!/bin/perl -w
use strict;

my $t1 = "(\\.| dot )";           # a literal dot or the word surrounded with space
my $t2 = "(@| at )";              # a literal "at" or the word surrounded with space
my $name = "(\\w+$t1)+\\w+";      # (word dot)+word
my $addr = "(\\w+$t1)+\\w+";      # ditto
my $email= "$name$t2$addr";       # name at address
print "regex=$email\n";           # just checking :)

while ( <> ) {
    print if /$email/o;
}

Test File:
a.kurt@soton.ac.uk
a.kurt@soton.com
a dot kurt at soton dot ac dot uk
a dot kurt at soton dot com
fred
this is a test
dot foo
at bar

Results:
regex=(\w+(\.| dot ))+\w+(@| at )(\w+(\.| dot ))+\w+
a.kurt@soton.ac.uk
a.kurt@soton.com
a dot kurt at soton dot ac dot uk
a dot kurt at soton dot com
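
As a sketch of the refinement suggested above, the same pieces can be built with qr// (which avoids doubling the backslashes inside strings), anchored so the whole string must match, and ended with a fixed list of top-level domains. The names $dot, $at, $tld and is_email are illustrative only, and the com|uk list is just the example from the post, not a complete set.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dot   = qr/(?:\.| dot )/;        # a literal dot, or the word with a space either side
my $at    = qr/(?:\@| at )/;         # a literal @, or the word with a space either side
my $tld   = qr/(?:com|uk)/;          # assumed list of allowed endings
my $name  = qr/(?:\w+$dot)+\w+/;     # (word dot)+word, as in the original
my $addr  = qr/(?:\w+$dot)+$tld/;    # (word dot)+ followed by an allowed ending
my $email = qr/\A$name$at$addr\z/;   # the whole string: name at address

# Hypothetical helper: 1 if the whole string looks like an address, else 0.
sub is_email { return $_[0] =~ $email ? 1 : 0 }

for my $line ('a.kurt@soton.ac.uk', 'a dot kurt at soton dot com',
              'a.kurt@soton.org', 'fred') {
    print "$line\n" if is_email($line);
}
```

With the anchors in place, a line that merely contains an address somewhere in the middle no longer counts as a match, which may or may not be what is wanted when scanning free text.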
 
Another way to create a regex is to use the /x modifier:

my @list = (
'a.kurt@soton.ac.uk',
'a.kurt@soton.com',
'a dot kurt at soton dot ac dot uk',
'a dot kurt at soton dot com',
'fred',
'this is a test',
'dot foo',
'at bar',
);

foreach my $item (@list) {
print "$item\n" if($item =~ m/(
((\w+)[\.| dot ])+ # match 1 or more alphanumerics followed by a dot
[\@| at ] # match the at character
((\w+)[\.| dot ])+ # match 1 or more alphanumerics followed by a dot
[a-zA-Z]{2,3}) # match the domain type or country code
/x);
}

Returns:
a.kurt@soton.ac.uk
a.kurt@soton.com
a dot kurt at soton dot ac dot uk
a dot kurt at soton dot com


The /x allows you to add whitespace and comments without affecting the regex. Useful for debugging and remembering a complex regex a few months or more later :)
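
One detail worth checking when combining /x with alternation groups: under /x, unescaped spaces in the pattern are ignored, so ( dot ) would match the bare word without its surrounding spaces. Spaces inside a bracketed class stay literal under /x. A minimal sketch, assuming grouped alternatives are wanted rather than classes (the variable name $email_x is made up here), writes each required space as [ ]:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $email_x = qr/
    \A
    (?: \w+ (?: \. | [ ]dot[ ] ) )+   # one or more "word dot" pieces
    \w+                               # last word of the name
    (?: \@ | [ ]at[ ] )               # the at sign, or the word with spaces
    (?: \w+ (?: \. | [ ]dot[ ] ) )+   # one or more "word dot" pieces
    [a-zA-Z]{2,3}                     # domain type or country code
    \z
/x;

for my $item ('a.kurt@soton.com',
              'a dot kurt at soton dot ac dot uk',
              'a dotkurt at soton dot com',
              'fred') {
    print "$item\n" if $item =~ $email_x;
}
```

Writing the spaces as [ ] (or as a backslash followed by a space) keeps them significant, so "a dotkurt at soton dot com" is rejected because the space after "dot" is missing.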



Barbie
Leader of Birmingham Perl Mongers
 