Extracting emails using a regex 2

Status
Not open for further replies.

vcherubini

Programmer
May 29, 2000
527
US
Hello:

I have found out how to extract emails with some code from the Perl Cookbook, but I can not get it to work.

What I would like to know is how to find all of the email addresses in a file and replace each one with a link like this:

[tt]
<a href="mailto:email@emailaddress.com">email@emailaddress.com</a>
[/tt]

I know that this would be a pretty complicated regex, but any help is appreciated.

Thanks,

-Vic vic cherubini
malice365@hotmail.com
====
Knows: Perl, HTML, JavaScript, C/C++, PHP, Flash, Director
====
 
s/(\b.+?\@.+?\..+?\b)/<a href="$1">$1<\/a>/g;

will get the general format of email addresses. adam@aauser.com
 
Yeah I saw that, and added it.

Thanks for all the help though.

-Vic vic cherubini
malice365@hotmail.com
====
Knows: Perl, HTML, JavaScript, C/C++, PHP, Flash, Director
====
 
One last question if you don't mind.

I modified the expression to say:

[tt]
s/(\@.*?\..+?\b)/<a href="mailto:$1">$1<\/a>/g;
[/tt]

I don't think that I made myself clear in my first post.

Say I have a string that says:

[tt]
$data = "If you want to email me, please do so at me\@myaddress.com";
[/tt]

What I want to do is extract me@myaddress.com and wrap it in a link: [tt]<a href="mailto:me@myaddress.com">me@myaddress.com</a>[/tt].

With the modified expression, I am getting only the @myaddress.com part, and not the me part. My logic would be to find the @ sign, trace backwards until I hit a space, and then take all of the characters found between that space and the @. Is this at all possible, or am I making it too hard?

Thanks for all the help, though.


-Vic vic cherubini
malice365@hotmail.com
====
Knows: Perl, HTML, JavaScript, C/C++, PHP, Flash, Director
====
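
A minimal sketch of the "trace backwards until a space" idea described above, assuming the address is delimited by whitespace. The helper name `linkify` is hypothetical, and the pattern is a loose match, not a full address validator:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap anything shaped like word@word.word in a mailto link.
sub linkify {
    my ($text) = @_;
    # \S+ cannot cross whitespace, so the match stops at the space in front
    # of the local part instead of running back to the start of the string.
    # The final \w+ keeps trailing punctuation out of the link.
    $text =~ s{(\S+\@\S+\.\w+)}{<a href="mailto:$1">$1</a>}g;
    return $text;
}

# Single quotes, so Perl does not try to interpolate @myaddress.
my $data = 'If you want to email me, please do so at me@myaddress.com';
print linkify($data), "\n";
```

With the sample string from the post, only `me@myaddress.com` is wrapped in the link; the rest of the sentence is left as it was.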
 
Look at the difference between their code and yours (theirs first):
[tt]
s/(\b.+?\@.+?\..+?\b)/<a href="mailto:$1">$1<\/a>/g;
s/(\@.*?\..+?\b)/<a href="mailto:$1">$1<\/a>/g;
[/tt]
You made two changes. One is that you changed the first "+" after the @ into a "*". That doesn't really matter. The second is that you left out the initial "\b.+?", which is the part that will match the "me" of "me@myaddress.com".

"If you think you're too small to make a difference, try spending a night in a closed tent with a mosquito."
 
Stillflame:

I tried that at first, and if I have a string that says:

[tt]
$string = "This is a string with an email in it me\@myaddress.com";
$string =~ s/(\b.+?\@.+?\..+?\b)/<a href="mailto:$1">$1<\/a>/g;
print "$string\n";
[/tt]

Perl will print:

[tt]
<a href="mailto:This is a string with an email in it me@myaddress.com">This is a string with an email in it me@myaddress.com</a>
[/tt]

I see the logic of the regex: the first \b matches the first word boundary (the me part of the address), the \@ matches the @ in the email, and the last \b matches the last word boundary (the com part of the email address).

Could there possibly be something wrong with my version of Perl? I am running ActiveState's version on Windows ME. Has there been any reported case of a mishap in the regex engine for ActiveState?

I was sitting at my computer till 12 last night with Programming Perl in my lap and The Perl Cookbook on my desk and I couldn't, for the life of me, get the regex to work. It looks so logical, and seems that it would work, but doesn't.

Thank you for your time and dedication.


-Vic

P.S. And I have spent the night in a closed tent with a mosquito. I am in Boy Scouts. =) vic cherubini
malice365@hotmail.com
====
Knows: Perl, HTML, JavaScript, C/C++, PHP, Flash, Director
====
 
the regex should work... i haven't had any problems with activestate, but, i haven't run it on ME. adam@aauser.com
 
I've played with a lot of regex stuff with ActiveState's ports and on several UNIX platforms. The regex engine is rock solid. It is one of Perl's long-demonstrated strengths. Humans trying to figure out how to use regexes... now that is another thing (including me) ;^)

Almost invariably, when I can't get them to work, it is some simple assumption that I have made without realizing that I've made an assumption.

I'm sure it is doing what you are asking it to do.




keep the rudder amid ship and beware the odd typo
 
No, he's right, it does match that whole string. The first "\b" hits right before the first character of the string, then the ".+?" matches the rest of the string up to the "@" symbol. He's going to need to get the actual regex that matches valid email addresses. I've only seen it on someone else's computer, and I don't know where he found it, but it was in a module. The actual regex was approximately 30 lines of code, with really nasty zero-width lookaheads in it, but as I started to work out how to write the correct regex, I realised that they were needed. It may be in CGI; I'll start looking, but if anybody knows exactly where it's at, it would be greatly appreciated.

"If you think you're too small to make a difference, try spending a night in a closed tent with a mosquito."
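
To make the overmatch described above concrete, here is a small sketch comparing the pattern from the thread with a tighter one. The tighter pattern is an assumption for illustration only (it restricts each part to characters commonly seen in addresses) and is not the long module regex mentioned in the post:

```perl
use strict;
use warnings;

my $string = 'This is a string with an email in it me@myaddress.com';

# Pattern from the thread: "." also matches spaces, so the lazy .+? grows
# from the first word boundary (the start of the string) until it reaches "@".
my ($loose) = $string =~ /(\b.+?\@.+?\..+?\b)/;

# Assumed tighter pattern: character classes that exclude whitespace, so the
# match cannot reach back past the space in front of "me".
my ($tight) = $string =~ /\b([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)/;

print "loose: $loose\n";
print "tight: $tight\n";
```

The first capture is the entire sentence, while the second is just `me@myaddress.com`. Laziness only controls how far a quantifier extends to the right; it does not move the starting point of the match.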
 
Thanks for all the help, guys!

I checked CPAN today for something that does this, and found something, but the regex was 100 lines of code (well 94 to be exact). How does one know how to write that?

Anyways, I have temporarily fixed it with the following code. It may not be the best way, in fact it's probably pretty cryptic, but it works:

[tt]
# Single quotes keep Perl from interpolating @epicsoftware as an array.
$data = 'my name is vic and my email address is vikter@epicsoftware.com and here is some more text';

# Note: \B is zero-width, so this substitution removes nothing.
$data =~ s/\B//g;
@data = split(/ /, $data);

open(FILE, ">>emails.txt") || die("failed to open file: $!");
foreach $var (@data) {
    # Anything, then @, then a domain (optionally bracketed) ending in a
    # 2-3 letter TLD or a 1-3 digit number.
    if ($var =~ /^.+\@(\[?)[a-zA-Z0-9\-\.]+\.([a-zA-Z]{2,3}|[0-9]{1,3})(\]?)$/) {
        print FILE "<a href=\"mailto:$var\">$var</a>\n";
    } else {
        print "$var\n";
    }
}
close(FILE);
[/tt]

The code splits the words in the $data variable into separate elements of the @data array. The foreach loop then goes through the array and checks each word against a regex: if it looks like a valid email address, it is written to the file as a link; if not, it is printed to the screen instead of the file.


Thanks for all the help though, and if you find anything on how to do this better, I would love to know.

Thanks again,

-Vic vic cherubini
malice365@hotmail.com
====
Knows: Perl, HTML, JavaScript, C/C++, PHP, Flash, Director
====
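
One possible alternative to the split-and-loop approach above is to substitute in place, so the surrounding text is kept and only the addresses are wrapped. This is a rough sketch under stated assumptions: `link_emails` is a hypothetical helper, and the pattern is a loose check loosely modelled on the one in the post, not a complete address validator:

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each address-like token in a mailto link while
# leaving the rest of the text untouched.
sub link_emails {
    my ($text) = @_;
    # Local part: no whitespace, "@" or angle brackets.
    # Domain: labels of letters, digits and hyphens separated by dots,
    # ending in a label of two or more letters.
    $text =~ s{([^\s\@<>]+\@[a-zA-Z0-9-]+(?:\.[a-zA-Z0-9-]+)*\.[a-zA-Z]{2,})}
              {<a href="mailto:$1">$1</a>}g;
    return $text;
}

my $data = 'my name is vic and my email address is vikter@epicsoftware.com and here is some more text';
print link_emails($data), "\n";
```

Because the substitution runs over the whole string, there is no need to split on spaces and reassemble the words afterwards.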
 