Extracting text from a server user eMail file

Status
Not open for further replies.

slolerner

Programmer
Sep 15, 2004
9
US
I am intrigued by how difficult it really is to do "Practical Extraction and Report Language" tasks.

This little script extracts the plain text from the body of a user's eMail msg on my server, and it works fine when there is only one message.

What is needed now: since each new message is appended to the file in the server's /var/mail/ directory, I need to get to the LAST message to do the extraction.

How do I instruct the script to find the LAST instance of the text for splitting? Or, can the file be read from the end up? Could all messages but the last one be deleted by the script? (After the first call there would be only the one message in the file; I don't want that deleted until the user sends another.)
thx,
mike
###
code:
#!/usr/bin/perl

$filename2 = "/usr/local/etc/httpd/htdocs/dancewithdebbie/email.txt";
$filename = "/var/mail/debbienl";
$redirect = "
open(FILE,"$filename");
@lines = <FILE>;
close(FILE);

$start=0;$finish=0;
open(WRITE,">$filename2") || die "Can't open $filename2!\n";
foreach (@lines) {
chomp; #now you can ignore the \n
if ($_ eq "Dance with Debbie") {
$start=1;
}
if ($_ eq "Content-Type: text/html;") {
last;
}
if ($start ==1) {
print WRITE "$_\n"; #add it back in
}
}
close (WRITE);
print "Location: $redirect\n\n";

Before you criticize anyone, walk a mile in their shoes...
That way, when you criticize them, you're a mile away and, you have their shoes
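
For readers following along, here is a minimal sketch of one way to get the LAST message while keeping the line-by-line style of the script above: start a fresh buffer every time the start line is seen, so only the newest message survives. The sub name `last_message_body` and the sample lines are made up for illustration; the two marker lines are the ones from the script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the plain-text body of the LAST message
# found in a list of mailbox lines.
sub last_message_body {
    my ($lines, $start_line, $stop_line) = @_;
    my @body;
    my $in_body = 0;
    for my $line (@$lines) {
        (my $text = $line) =~ s/\r?\n\z//;   # compare without the line ending
        if ($text eq $start_line) {
            @body    = ();   # a newer message begins: forget the older one
            $in_body = 1;
        }
        elsif ($text eq $stop_line) {
            $in_body = 0;    # stop copying, but keep scanning for later messages
        }
        push @body, $text if $in_body;
    }
    return join '', map { "$_\n" } @body;
}

# Made-up sample data standing in for the real mailbox file
my @sample = (
    "From: someone\n", "Dance with Debbie\n", "old news\n",
    "Content-Type: text/html;\n", "<html>old</html>\n",
    "From: someone\n", "Dance with Debbie\n", "new news\n",
    "Content-Type: text/html;\n", "<html>new</html>\n",
);
print last_message_body(\@sample, "Dance with Debbie", "Content-Type: text/html;");
```

With the sample above, only the second message's start line and "new news" are printed. The whole file is still read once, so this does not address speed, only which message is kept.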
 
I may be underestimating exactly what you're doing, but when you create the list of messages, make sure you have a unique split in there somewhere (I'm sure you probably already do). When you come to reading your list, can't you use the reverse() function? Simply use reverse(@lines) before carrying out your foreach loop, then the file contents will get loaded in backwards! I can't remember whether the actual contents of the array elements also get reversed, but I'm sure you can figure that out.

Rob Waite
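
On the open question in the post above, a small sketch (with made-up sample lines) suggests the answer: in list context, `reverse` changes the order of the array elements but leaves the text inside each element untouched.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample lines standing in for the lines of the mailbox file
my @lines = ("first line\n", "second line\n", "third line\n");

# List context: the ORDER of the elements is reversed,
# but each line still reads left to right.
my @backwards = reverse(@lines);

print @backwards;
```

This prints "third line", "second line", "first line" in that order. It also means a loop over the reversed array meets the end marker of a message before its start marker, so any start/stop logic has to be swapped as well.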
 
Hello Rob!
Thanks so much for responding!
Yes, the reverse(@lines) scheme has been tried.
First let me say that I'm dealing with the receiving eMail folder on the server. This means that each incoming message is appended to the existing file, including header data lines and the html rendition of the body text.
All I need/want is the plain text lines of the message body.
That is, I need to not only output the LAST message body text received, I need to strip the header data lines AND strip the html text following the plain text. Identifiers are available for the start of the message body text and either for the end of the message body text or the start of the superfluous html text.

So the reverse(@lines) code was added, and it worked, sort of.
What may have been the correct text was output, but it was immediately garbled, and then the second split, to skip the rest of the message, did not happen; the output ran on to the end of the file.

So even if the reverse lines operation is made to work, it seems to me that code could be written to find the LAST instance of an identifier, split the file, and print everything up to the next instance of a second identifier.

Since PERL was written as a P ractical E xtraction and R eport L anguage, why is this such a difficult task?

Thanks for any input you may have on this thorny question.
mike, the ever so slo, lerner.
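
The idea described above (find the LAST instance of one identifier, then print up to the next instance of a second one) can be sketched with Perl's built-in `rindex` and `index`, without any reversing. The sub name `extract_last` and the sample text are hypothetical; the markers are the ones used in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: text from the LAST $start_marker up to the next
# $stop_marker after it (or to the end of the text if there is none).
sub extract_last {
    my ($text, $start_marker, $stop_marker) = @_;
    my $start = rindex($text, $start_marker);      # search from the end
    return '' if $start < 0;                       # start marker not found
    my $stop = index($text, $stop_marker, $start); # search forward from there
    $stop = length $text if $stop < 0;
    return substr($text, $start, $stop - $start);
}

# Made-up sample standing in for the mailbox contents
my $sample = "headers\nDance with Debbie\nold\nContent-Type: text/html;\n<html>\n"
           . "headers\nDance with Debbie\nnew\nContent-Type: text/html;\n<html>\n";

print extract_last($sample, "Dance with Debbie", "Content-Type: text/html;");
```

Note that `rindex` and `index` match the marker text exactly, including case, which differs from the case-insensitive splits used elsewhere in this thread.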

 
It sounds as though you are overcomplicating it a little. Your data may be easier to handle as one single string rather than handling each of the lines of the file through an array. Don't forget you can reverse the text as many times as you like, i.e. if you do two reverses you end up with the text facing the right way again. I would try:
Code:
#!/usr/bin/perl

$filename2 = "/usr/local/etc/httpd/htdocs/dancewithdebbie/email.txt";
$filename = "/var/mail/debbienl";
$redirect = "http://www.dcdancenet.com/dancewithdebbie/news2.sh...";

open(FILE,"$filename");
@lines = <FILE>;
foreach $line (@lines)  {
	$string = $string.$line;
}
close(FILE);

$string = reverse($string);

($etc,$string) = split/\;lmth\\txet :epyT-tnetnoC/i,$string;
($string,$etc) = split/eibbeD htiw ecnaD/i,$string;

open(WRITE,">$filename2") || die "Can't open $filename2!\n";
print WRITE reverse($string);
close (WRITE);

print "Location: $redirect\n\n";

This gives you the opportunity to split the emails up if there is more than one, and simply put them in an array if there's a unique split between each message:
Code:
@emails = split/\;lmth\\txet :epyT-tnetnoC/i,$string;


Rob Waite
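
A hedged sketch of the same reverse-and-split idea, for anyone trying it: typing the markers backwards by hand is error-prone (the reverse of `Content-Type: text/html;` contains a forward slash, `/`, which needs escaping inside `split/.../`), so this version lets Perl build the reversed patterns with `scalar reverse` and `quotemeta`. The sub name `last_body_via_reverse` and the sample text are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: body of the last message, found by reversing the
# whole text so that the last message comes first.
sub last_body_via_reverse {
    my ($text, $start_marker, $stop_marker) = @_;
    my $backwards = scalar reverse $text;

    # Build the backwards markers programmatically and escape any
    # regex metacharacters in them.
    my $rev_start = quotemeta(scalar reverse $start_marker);
    my $rev_stop  = quotemeta(scalar reverse $stop_marker);

    # Drop everything "before" the first reversed stop marker
    # (that is, the trailing HTML of the last message).
    my (undef, $rest) = split /$rev_stop/, $backwards, 2;
    return '' unless defined $rest;

    # Keep everything up to the first reversed start marker.
    my ($body) = split /$rev_start/, $rest, 2;

    # Turn the text the right way round again (scalar context matters).
    return scalar reverse $body;
}

# Made-up sample standing in for the mailbox contents
my $sample = "hdr\nDance with Debbie\nold\nContent-Type: text/html;\n<html>old\n"
           . "hdr\nDance with Debbie\nnew\nContent-Type: text/html;\n<html>new\n";

print last_body_via_reverse($sample, "Dance with Debbie", "Content-Type: text/html;");
```

As in the code above, the start marker itself is not part of the result, because `split` discards the text it splits on.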
 
Hi Rob!
Yes, thanks so much for your interest!
The thing is that I have no control over the incoming messages.
If I could get the sender to work with me, I wouldn't even need this approach.
The messages do not contain convenient "unique" identifiers such as you mention.
It does seem that handling as one long string makes sense, as each message has the same "identifiers" --
that is, there is that same bit of text preceding and following the text body of each message.
If there is only one message, no problem.
If there is more than one, I somehow, simply or complicatedly, need to skip over all but the LAST instance of the identifier text to begin the split.

Surely there is a way to code a script such that it searches for all the instances of some bit of text and then prints the text between the last instance and another specified point?
Is this what your code is doing?
thx
mike

 
This is basically what my code is doing, so long as there is some kind of unique identifier you can split the string wherever you want. If you use an array you can find the last instance by doing the following, without using a reverse even:
Code:
@array = split/$unique_identifier/i,$string;
$array_count = scalar(@array);
$last_instance = $array[$array_count - 1];   # last element; $array[-1] also works
even if there's only one entry in there, so long as you pick the right unique identifier then the last instance will always come up...fingers crossed!


Rob Waite
 
Rob:
THANKS! so much for your gracious help!
In addition to my being a really slolerner, my host server has gone out (hurricane effect, I guess), so I have not been able to grope my way thru your reverse code example, which I see now is clearly simpler than my hurried impression as I flailed away at getting the connection back to my server.

Yes, the client puts the same initial text on each msg and the same line of text starts the mindless MSFT html code after the plain text, so your array approach should work to extract from the last iteration of the identifier text?
Thx for working so late in the day for you.
mike

 
Rob!
Thx!
Getting there!
First try nothing was printed.
I changed/shortened the first id, appears some problem writing the escape characters backward, I don't really know.
Any way, the script now prints out the text that is wanted, only, it is still written backwards!
L@@ks like the reverse in the write section is not being implemented??
mike, slooowly lerning.......

Code:
#!/usr/bin/perl

$filename2 = "/usr/local/etc/httpd/htdocs/dancewithdebbie/email.txt";
$filename = "/var/mail/debbienl";
$redirect = "http://www.dcdancenet.com/dancewithdebbie/news2.shtml";

open(FILE,"$filename");
@lines = <FILE>;
foreach $line (@lines)  {
    $string = $string.$line;
}
close(FILE);

$string = reverse($string);


($etc,$string) = split/:epyT-tnetnoC/i,$string;
($string,$etc) = split/eibbeD htiw ecnaD/i,$string;

open(WRITE,">$filename2") || die "Can't open $filename2!\n";
print WRITE reverse($string);
close (WRITE);

print "Location: $redirect\n\n";
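
A likely explanation for the output still being backwards, offered as a sketch rather than a diagnosis of the server setup: `print` supplies list context, and in list context `reverse` reverses the order of its arguments (here a single string) instead of the characters within them. Forcing scalar context with `scalar reverse(...)` reverses the characters. The variable names below are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A string that is already backwards, as $string is in the script above
my $backwards = scalar reverse "Dance with Debbie";

# List context (what print supplies): a one-element list is "reversed",
# so the string itself comes back unchanged, still backwards.
my @list_result = reverse($backwards);

# Scalar context: the characters are reversed, giving readable text.
my $scalar_result = scalar reverse($backwards);

print "list context:   $list_result[0]\n";
print "scalar context: $scalar_result\n";
```

Applied to the script above, that would mean writing `print WRITE scalar reverse($string);` in the output section.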


 
Rob!
UR a Prince!
Just so you'll know, even being a real slo lerner, I already did just that and was so pleased that it worked!

Now that it works, I see that it is criminally SLOW
when there is an accumulation of messages in that folder.
Keeping in mind that my host has installed only some of perl 5* and will not set permissions for me to tamper with my perl set-up, what would be your idea for speeding up the process?
Looking at the array code you provided above (if that's faster), it appears that the second split is missing, so any superfluous text on that message and/or even entire additional unwanted messages would print??
Trying your patience to the max,
I remain mike,
the grateful but still real slo lerner
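
One thing that may be worth trying on the speed question, with the caveat that nobody in this thread has measured where the time actually goes: read the whole file into a string in one go instead of reading it into an array and joining the lines one at a time. Setting `$/` to undef makes `<$fh>` return the entire file. This uses only core Perl; the sub name `slurp_file` is hypothetical, and the temporary file stands in for the real mailbox.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);   # core module, used only for the demo file

# Hypothetical helper: return the whole content of a file as one string.
sub slurp_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    local $/;                  # no record separator: read everything at once
    my $content = <$fh>;
    close($fh);
    return defined $content ? $content : '';
}

# Demo with a temporary file instead of the real mailbox
my ($tmp_fh, $tmp_path) = tempfile(UNLINK => 1);
print {$tmp_fh} "line one\nline two\n";
close($tmp_fh);

my $string = slurp_file($tmp_path);
print length($string), " bytes read\n";
```

The resulting `$string` can be fed to the same splits as before. If the host itself is slow, this will not help much.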


 
It could simply be slow because your host is running slow, and unfortunately there is nothing that can really be done about that. I've actually managed to come up with another approach. This will automatically give you the final message in the list and doesn't involve reversing!! Will explain after:
Code:
#!/usr/bin/perl

$filename2 = "/usr/local/etc/httpd/htdocs/dancewithdebbie/email.txt";
$filename = "/var/mail/debbienl";
$redirect = "http://www.dcdancenet.com/dancewithdebbie/news2.shtml";

open(FILE,"$filename");
@lines = <FILE>;
foreach $line (@lines)  {
    $string = $string.$line;
}
close(FILE);
This first section reads the file in as normal.
Code:
@messages = split/Dance with Debbie/i,$string;
$message_count = scalar(@messages);
$last_message = $messages[$message_count - 1];   # last element; $messages[-1] also works
This creates an array of all the messages, counts them and then creates a string of the last element in the array.
Code:
($text_u_want,$etc) = split/content-type:/i,$last_message;
This finds the end of the plain text in the last message.
Code:
open(WRITE,">$filename2") || die "Can't open $filename2!\n";
print WRITE $text_u_want;
close (WRITE);
This then prints out the text you want from the last message of the array you created and doesn't involve any confusing reverses. It also means that if you want to operate on other messages in the array, you can, using a 'foreach' loop for example. I hope that's simplified it a little!!


Rob Waite
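
The pieces above can be pulled together into one runnable sketch. This is an illustration under assumptions, not the exact script that ended up on the server: the sub name `last_message_text` and the sample mailbox text are made up, and `\Q...\E` is added so that punctuation in the markers is matched literally.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: text between the LAST start marker and the first
# stop marker that follows it.
sub last_message_text {
    my ($mailbox, $start_marker, $stop_marker) = @_;

    # One array element per message; -1 keeps trailing empty pieces.
    my @messages = split /\Q$start_marker\E/i, $mailbox, -1;
    return '' if @messages < 2;        # start marker never appeared

    my $last_message = $messages[-1];  # last element of the array

    # Second split: cut off everything from the stop marker onwards.
    my ($text) = split /\Q$stop_marker\E/i, $last_message, 2;
    return defined $text ? $text : '';
}

# Made-up sample standing in for the mailbox contents
my $sample = "hdr\nDance with Debbie\nold\nContent-Type: text/html;\n<html>\n"
           . "hdr\nDance with Debbie\nnew\nContent-Type: text/html;\n<html>\n";

print last_message_text($sample, "Dance with Debbie", "Content-Type:");
```

Because `split` discards the marker it splits on, the "Dance with Debbie" line itself is not in the result; it can be printed separately if it is wanted in the output file.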
 
THANK YOU ROB!
Addressing what appears to be your last question/request to "add in a sample from the original email file again: /var/mail/debbienl"...
I am glad to do so, but wonder whether I understand what you want me to do, keeping in mind that each of these messages, with all the html markup and the plain text, runs 80,000 bytes
(which is part of the reason for extracting the plain text only from the one message).
Wondering whether to paste in a complete set of msg's or some sort of edited version? Or send it via private eMail or post it on my site or???
thx
mike



 