Removing extraneous info from emails

Status
Not open for further replies.

yim11

MIS
Jun 26, 2000
35
US
Hello. I have been given a Perl script that [supposedly] strips extraneous info from emails before they are inserted into a database as comments. The script works GREAT with mail sent from Lotus Notes, not at all with messages from Outlook 98, and barely with mail sent from Outlook Express.
What I am looking for is _any_ information that will explain how the extra info is being removed, so that I can customize the script to format all mail the same way. Included below is the section of script that removes the info from the message, as well as a copy of a 'processed' message sent via Outlook Express.
TIA for any help!!
Jim
-----------Begin Code---------------
# We need to remove the extraneous information from the
#  addressing fields
    @tmp=split('"',$message{'From'});
    $message{'From'}=$tmp[1];

    if ($message{'To'}=~m"<") {
        @tmp=split(',',$message{'To'});

        foreach(@tmp) {
            @tmp2=split('<',$_,2);
            @tmp3=split('>',$tmp2[1],2);
            $_=@tmp3[0];
        }
        $message{'To'}=join(',',@tmp);
    }
    if ($message{'cc'}) {
        @tmp=split(',',$message{'cc'});
        foreach(@tmp) {
            @tmp2=split('<',$_,2);
            @tmp3=split('>',$tmp2[1],2);
            $_=@tmp3[0];
        }
        $message{'cc'}=join(',',@tmp);
    } else {
        $message{'cc'}=" ";
    }
# Date needs to be corrected to the dd-Mon-yyyy format
    $result=($message{'Date'}=~s"([0-9]) (...) ([0-9])"$1-$2-$3");
    @tmp=split(' ',$message{'Date'});
    $message{'Date'}=$tmp[1];
# Contents and subject must be escaped
    $result=($message{'Subject'}=~s"'"''"g);
    $result=($message{'Subject'}=~s"`"``"g);
    $result=($message{'contents'}=~s"'"''"g);
    $result=($message{'contents'}=~s"`"``"g);
    $result=($message{'contents'}=~s"\\"\\\\"g);
    @tmp=split(' ',$message{'Subject'});
    $catno="";
    foreach(@tmp) {
        if ($_=~m"[0-9]-") {
            $catno=$_;
        }
    }
----End Code---------------------
Results from a processed Outlook Express message:
------Begin results--------------
multipart/alternative; boundary="----=_NextPart_000_0005_01BFECC9.1AF66440" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4133.2400 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400

This is a multi-part message in MIME format.

------=_NextPart_000_0005_01BFECC9.1AF66440 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable

THis test message=20 sent via Outlook Express

------=_NextPart_000_0005_01BFECC9.1AF66440 Content-Type: text/html; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable


THis test message sent via Outlook = Express
------=_NextPart_000_0005_01BFECC9.1AF66440--
-----End results--------------
 
-----------Begin Code---------------
# We need to remove the extraneous information from the
#  addressing fields
# looks like you have an associative array called %message already set up
# next two lines keep the text between the first pair of double quotes
# in the From header (usually the display name rather than the address)
    @tmp=split('"',$message{'From'});
    $message{'From'}=$tmp[1];

# this section splits up the "To" addresses
# if there's a < character in there
    if ($message{'To'}=~m"<") {
# make an array (@tmp) containing each address
        @tmp=split(',',$message{'To'});
# then - for each address
        foreach(@tmp) {
# chop off the < and > characters
            @tmp2=split('<',$_,2);
            @tmp3=split('>',$tmp2[1],2);
# and save the bare email address (without the < and > characters)
            $_=@tmp3[0];
        }
# save @tmp in the "to" element of %message
        $message{'To'}=join(',',@tmp);
    }
# same - but for the "cc" element (hmmm, should that be "Cc"?)
    if ($message{'cc'}) {
        @tmp=split(',',$message{'cc'});
        foreach(@tmp) {
            @tmp2=split('<',$_,2);
            @tmp3=split('>',$tmp2[1],2);
            $_=@tmp3[0];
        }
        $message{'cc'}=join(',',@tmp);
    } else {
        $message{'cc'}=" ";
    }
# Date needs to be corrected to the dd-Mon-yyyy format
# matches date fields and reformats
# not sure if that first match [0-9] will work as it should match two digits, not one
    $result=($message{'Date'}=~s"([0-9]) (...) ([0-9])"$1-$2-$3");
    @tmp=split(' ',$message{'Date'});
    $message{'Date'}=$tmp[1];
# Contents and subject must be escaped
    $result=($message{'Subject'}=~s"'"''"g);
    $result=($message{'Subject'}=~s"`"``"g);
    $result=($message{'contents'}=~s"'"''"g);
    $result=($message{'contents'}=~s"`"``"g);
    $result=($message{'contents'}=~s"\\"\\\\"g);
    @tmp=split(' ',$message{'Subject'});
    $catno="";
    foreach(@tmp) {
        if ($_=~m"[0-9]-") {
            $catno=$_;
        }
    }
----End Code---------------------

run out of time - sorry - baby needs to be fed!

Mike
michael.j.lacey@ntlworld.com
Cargill's Corporate Web Site
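
A note on the address and date handling walked through above, with a minimal sketch rather than the script's actual code. The sub names `bare_addresses` and `dd_mon_yyyy` and the sample headers are hypothetical. Two things are worth knowing about the original lines: splitting From on `"` leaves `$tmp[1]` undefined when the header has no quoted display name, and the single-digit date pattern does still match (the match is unanchored, so it catches the last digit of the day), but `$tmp[1]` only holds the date when the header starts with a day name. Either could differ between mail clients, though that is a guess and not something the thread confirms.

```perl
use strict;
use warnings;

# Hypothetical helper: reduce an address header to bare addresses.
# Keeps the text inside <...> when present, otherwise the trimmed entry,
# so entries without angle brackets are not emptied.
sub bare_addresses {
    my ($header) = @_;
    return '' unless defined $header;
    my @out;
    # Naive comma split: a display name containing a comma would break it.
    for my $entry (split /,/, $header) {
        if ($entry =~ /<([^>]*)>/) {
            push @out, $1;
        } else {
            $entry =~ s/^\s+|\s+$//g;
            push @out, $entry if length $entry;
        }
    }
    return join ',', @out;
}

# Hypothetical helper: pull dd-Mon-yyyy out of a Date header, with or
# without a leading day name. Returns undef when no date is found.
sub dd_mon_yyyy {
    my ($date) = @_;
    return unless defined $date;
    return "$1-$2-$3" if $date =~ /(\d{1,2})\s+([A-Za-z]{3})\s+(\d{4})/;
    return;
}

# prints: jim@example.com,mike@example.com
print bare_addresses('"Jim" <jim@example.com>, mike@example.com'), "\n";
# prints: 26-Jun-2000
print dd_mon_yyyy('Mon, 26 Jun 2000 10:15:00 -0500'), "\n";
```

The same helper could be pointed at the From header if the address, rather than the display name, is what the database should hold.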
 
Thank you very much!!!
You are a great help! Hope the baby is full :)
Can you shed a little light on the following section as I'm almost sure this is where my problem is.
--------Begin Code--------------
# Contents and subject must be escaped
    $result=($message{'Subject'}=~s"'"''"g);
    $result=($message{'Subject'}=~s"`"``"g);
    $result=($message{'contents'}=~s"'"''"g);
    $result=($message{'contents'}=~s"`"``"g);
    $result=($message{'contents'}=~s"\\"\\\\"g);
    @tmp=split(' ',$message{'Subject'});
    $catno="";
-------End Code-----------------

TIA!!!
Jim
 
baby's fine <grin>

let us know how you get on with the script, this kind of thing is a devil to test

--------Begin Code--------------
# Contents and subject must be escaped
# this section "escapes" certain characters so that they will insert
# correctly into the database - you can't just insert a string
# like 'Michael's', it has to be 'Michael''s'
# use / rather than " as the match character - much easier to read
# replace all ' with '' in the 'Subject' element of %message
$result=($message{'Subject'}=~s/'/''/g);
# replace all ` with `` in the 'Subject' element of %message
$result=($message{'Subject'}=~s/`/``/g);
# replace all ' with '' in the 'contents' element of %message
$result=($message{'contents'}=~s/'/''/g);
# replace all ` with `` in the 'contents' element of %message
$result=($message{'contents'}=~s/`/``/g);
# replace all \ with \\ in the 'contents' element of %message
$result=($message{'contents'}=~s/\\/\\\\/g);
# put each word in 'Subject' into the elements of the @tmp array
@tmp=split(' ',$message{'Subject'});
-------End Code-----------------

Mike
michael.j.lacey@ntlworld.com
Cargill's Corporate Web Site
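
The escaping steps explained above can be gathered into one small sub. This is only a sketch: the name `escape_for_insert` is hypothetical, and it applies the backslash rule to every string, whereas the posted script doubles backslashes in the contents only.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the thread's three substitutions to a copy
# of the string and return it, leaving the caller's value untouched.
sub escape_for_insert {
    my ($text) = @_;
    $text = '' unless defined $text;
    $text =~ s/\\/\\\\/g;   # \ becomes \\
    $text =~ s/'/''/g;      # ' becomes ''
    $text =~ s/`/``/g;      # ` becomes ``
    return $text;
}

# prints: Michael''s C:\\temp
print escape_for_insert(q{Michael's C:\temp}), "\n";
```

Usage would look like `$message{'Subject'} = escape_for_insert($message{'Subject'});`. Note that none of these substitutions removes anything from the message, so this section is unlikely to be what strips (or fails to strip) the extra text.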
 
> let us know how you get on with the script, this kind of
> thing is a devil to test
You're not kidding there! The main problem is the script works great on messages sent via Lotus Notes (v4.6a) but doesn't strip out enough on messages sent via Outlook Express, and strips so much on messages sent via Outlook 98 that they never show up! This is a very difficult project. Your help has been invaluable. Examples or resources for this type of script are rare at best. Thanks again, and if you have any other thoughts please let me know, otherwise I'll let ya know how it turns out.
Thanks!
Jim
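
The leftover text in the processed Outlook Express message from the first post is MIME structure: a multipart/alternative body with a quoted-printable text/plain part and an HTML copy of the same text. None of the posted code touches that. Below is a rough sketch of pulling out just the plain-text part, assuming the boundary can be read from the Content-Type header. The sub name `plain_text_part` and the sample message are hypothetical, while `decode_qp` comes from the MIME::QuotedPrint module. For anything beyond a simple case, a full parser such as MIME::Parser from CPAN is probably the safer route.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# Hypothetical helper: return the decoded text/plain part of a
# multipart body, or the body unchanged if no such part is found.
sub plain_text_part {
    my ($body, $boundary) = @_;
    # Parts are separated by lines made of "--" plus the boundary string.
    my @parts = split /^--\Q$boundary\E(?:--)?[ \t]*\r?\n?/m, $body;
    for my $part (@parts) {
        # Part headers end at the first blank line.
        my ($head, $text) = split /\r?\n\r?\n/, $part, 2;
        next unless defined $text;
        next unless $head =~ m{^Content-Type:\s*text/plain}mi;
        $text = decode_qp($text)
            if $head =~ /^Content-Transfer-Encoding:\s*quoted-printable/mi;
        $text =~ s/\s+\z//;   # drop trailing blank lines
        return $text;
    }
    return $body;
}

# Sample data modelled on the Outlook Express message in the thread.
my $content_type =
    'multipart/alternative; boundary="----=_NextPart_000_0005_01BFECC9.1AF66440"';
my ($boundary) = $content_type =~ /boundary="?([^";]+)"?/i;

my $body = join "\n",
    'This is a multi-part message in MIME format.',
    '',
    "--$boundary",
    'Content-Type: text/plain; charset="iso-8859-1"',
    'Content-Transfer-Encoding: quoted-printable',
    '',
    'THis test message=20',
    'sent via Outlook =',
    'Express',
    '',
    "--$boundary",
    'Content-Type: text/html; charset="iso-8859-1"',
    '',
    '<p>THis test message sent via Outlook Express</p>',
    "--$boundary--",
    '';

print plain_text_part($body, $boundary), "\n";
```

The `=20` and the trailing `=` seen in the posted results are quoted-printable encoding for a space and a soft line break, which is why the part is decoded before being returned.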
 
<smile> not the gentlest of introductions to Perl

If you're developing this script on a PC you should have a look at:

[link missing] little development environment.

Mike
michael.j.lacey@ntlworld.com
Cargill's Corporate Web Site
 