I'm getting back a lot of returned

Status
Not open for further replies.

gm199

Programmer
Aug 9, 2001
37
US
I'm getting back a lot of returned emails (user unknown) from my mailing list subscribers, so cleaning the database up by hand is not practical.

The returned emails are in a database (just one big file), and I want to extract only the email part from lines like the following, no matter what the name is:

To: "His name" <xxxxx@foo.bar>

How can I find these lines? The file contains many < and > characters and many To: lines, but this is the only kind of line of the form To: "His/her name".


I got nowhere trying this:

open(FILE,"EmailFile");
while (<FILE>) {
    if (/To:\"/ .. />/) { #match line markers
        push(@lines,$_);
    }
}

foreach $i (@lines) {
    #what regex to put here??????
}

Thanks for any help
 
GM199,

Follow the link to find some example code from 'Mastering Regular Expressions' book by Jeffrey Friedl. Somewhere in the code should be the regex to extract the real email address. Happy hunting ;-)


Cheers, NEIL
 
Thanks, toolkit, for your reply.

The samples at the site don't fulfill my needs.

The problem is that I have many email addresses, but only lines of the form To: "junk" <email> have what I need.

My question is how to identify (with a regex) that a line has the form

To: + "name part is junk" + < + email + >

not To: <email>
not From: "name part" <email>

To: is always the same
"name part" changes on every line; matching and discarding it is my problem
< and > are always the same
 
Is this correct for matching To: "anything" <email>?

To:\s\"(.|\n)*\"\s<(.*)>

To: To: part
\s space
\" " escaped
(.|\n) the part to erase - junk
* anything after
\" " escaped
\s space
< beginning
(.*) my match $1
> end
 
I'd make a couple of corrections to that regex, and two comments: * doesn't mean "anything after", it means "the preceding match (char or group) 0 or more times", and you shouldn't need to escape the quotes unless you use quotes as your regex delimiter. For the corrections, I'd allow 1 or more spaces rather than just one, I'd use non-greedy repetitions (*? vs *), and I'd at least make sure that there is an @ between the < and >. It's a more complicated regex, but it should be more flexible and reliable. Also, I don't think you need to check for \n in the name part, since I don't believe mail programs will allow a \n in an email To: line.
Code:
/^To:\s+".+?"\s+<.+?@.+?>/
Description:
^ start of line
To: literal
\s+ one or more spaces
" literal
.+? one or more chars (non-greedy)
" literal
\s+ one or more spaces
< literal
.+? one or more chars (non-greedy)
@ literal
.+? one or more chars (non-greedy)
> literal
Tracy Dryden
tracy@bydisn.com

Meddle not in the affairs of dragons,
For you are crunchy, and good with mustard.
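
As a rough illustration of the reply above, the sketch below wraps that pattern in capturing parentheses and runs it against the three kinds of lines discussed in the thread. It then shows the difference between greedy and non-greedy repetition. The helper name extract_to_address and the sample lines are made up for this example, and the @ is escaped only to rule out any array interpolation.

```perl
use strict;
use warnings;

# Hypothetical helper: return the address from a To: "name" <addr> line,
# or undef if the line has any other shape.
sub extract_to_address {
    my ($line) = @_;
    # Same pattern as in the reply, plus parentheses to capture the address.
    return $line =~ /^To:\s+".+?"\s+<(.+?\@.+?)>/ ? $1 : undef;
}

# Made-up sample lines modelled on the cases discussed in the thread.
my @samples = (
    'To: "His name" <xxxxx@foo.bar>',       # wanted
    'To: <someone@foo.bar>',                # no quoted name: skipped
    'From: "Some name" <sender@foo.bar>',   # wrong header: skipped
);

for my $line (@samples) {
    my $addr = extract_to_address($line);
    printf "%-40s => %s\n", $line, defined $addr ? $addr : '(no match)';
}

# Greedy vs non-greedy on a string that contains two <...> pairs.
my $two = '"a" <one@foo.bar> "b" <two@foo.bar>';
my ($greedy) = $two =~ /<(.*)>/;     # runs on to the last >
my ($lazy)   = $two =~ /<(.*?)>/;    # stops at the first >
print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Only the first sample line yields an address. The greedy capture swallows everything up to the final >, while the non-greedy one stops at the first >.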
 
Here is the solution I ended up with.
Maybe someone else will need it too.

$address_file = "/home/httpd/test/mails.mbx";

# Get the addresses.
open(LIST, $address_file) or die "Cannot open $address_file: $!";
@addresses = <LIST>;
close(LIST);

foreach $email (@addresses)
{
    chomp ($email);
    # Keep only lines of the form To: "name" <address>
    if ($email =~ /^To:\s+".+?"\s+<(.*)>/i) {
        print "$1\n";
    }
}
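
The script above slurps the whole mailbox into memory and captures with a greedy .* between the angle brackets. A possible variant, offered only as a sketch, reads line by line, requires an @ in the address, and skips duplicates. The helper name collect_to_addresses and the sample data are hypothetical. Lower-casing addresses before comparing them is an assumption that suits most mailing lists but is not strictly correct for the local part.

```perl
use strict;
use warnings;

# Hypothetical helper: collect unique addresses from To: "name" <addr>
# lines read from an already opened filehandle.
sub collect_to_addresses {
    my ($fh) = @_;
    my (%seen, @found);
    while (my $line = <$fh>) {
        chomp $line;
        # [^>]* cannot run past the first >, unlike a greedy .*
        next unless $line =~ /^To:\s+".+?"\s+<([^>]*\@[^>]*)>/i;
        my $addr = lc $1;    # assumption: treat addresses case-insensitively
        push @found, $addr unless $seen{$addr}++;
    }
    return @found;
}

# Demo on an in-memory sample; the lines are made up for illustration.
my $sample = join "\n",
    'From: "Mail Delivery" <MAILER-DAEMON@foo.bar>',
    'To: "His name" <xxxxx@foo.bar>',
    'To: <other@foo.bar>',
    'to: "Same person again" <XXXXX@foo.bar>',
    '';
open(my $fh, '<', \$sample) or die "Cannot open sample: $!";
print "$_\n" for collect_to_addresses($fh);
close($fh);
```

To run it on a real mailbox, open the file itself instead of the in-memory string, for example open(my $fh, '<', $address_file) or die "Cannot open $address_file: $!";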
 