Perl script -> Applescript

Status
Not open for further replies.

duncdude

Programmer
Jul 28, 2003
1,979
GB
I have written a spam filter that I would like to get working with Outlook Express, and I would really like to get it implemented... can anyone help?

come on clever dudes - I really need your help!

To elaborate - I get about 100 spam emails a day and I have come up with this:-

Code:
@spam_words = (
               "viagra",
               "paris hilton",
               "vicodin",
               "teen movies",
               "human growth hormone",
               "xanax",
               "valium",
              );
               
foreach $spam_word (@spam_words) {
  $filter = join ("[^a-z]*", split(//, $spam_word));
  push (@spam_filters, $filter);
}

@emails = (
           'you can get v/i/a/g/r/a for only $10 per tablet!',
           'Low Everyday Prices on Vicódin',
           'you can get a date with P*a*r*i*s H*i*l*t*o*n!!!',
           'WITH H_U_M_A_N G_R_O_W_T_H H_O_R_M_O_N_E DIETARY THERAPY !!!',
           'Cc: 1/2 off valíum, xãnax. - Delivered Overnight',
           'Welcome to our F.r.e.e T.e.e.n M.o.vie.s Newsletter Issue # 9',
          );
          
foreach $email (@emails) {

  $email =~ s|ç|c|g;   # /g so every occurrence is replaced, not just the first
  $email =~ s|ó|o|g;
  $email =~ s|ã|a|g;
  $email =~ s|í|i|g;
  
  print "$email\n";
  print "—" x length ($email) . "\n";
  foreach $spam_filter (@spam_filters) {
    if ($email =~ m|$spam_filter|i) {
      print "SPAM : $spam_filter\n";
    } else {
      print "o.k. : $spam_filter\n";
    }
  }
  print "\n";
}

It might not be rocket science but it sure would catch almost all of my rubbish emails - and it is so simple to expand - so I am desperate to get it working!


Kind Regards
Duncan
 
Why don't you use a file with the spam words? It would be easier to maintain, and it would also make it easier to build add-ons (like a small tool to add or delete an entry).
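
As a rough illustration of the file-based idea, the sketch below keeps one spam word or phrase per line and adds a tiny helper for appending entries. The sub names `load_spam_words` and `add_spam_word`, the `#` comment convention, and any file name you pass in (for example `spamwords.txt`) are assumptions made for this example, not something from the original script.

```perl
use strict;
use warnings;

# Read one spam word or phrase per line.
# Blank lines and lines starting with '#' are skipped (an assumed convention).
sub load_spam_words {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "cannot open $path: $!\n";
    my @words;
    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
        next if $line eq '' or $line =~ /^#/;
        push @words, lc $line;
    }
    close $fh;
    return @words;
}

# Append a new entry: the kind of small maintenance tool suggested above.
sub add_spam_word {
    my ($path, $word) = @_;
    open(my $fh, '>>', $path) or die "cannot append to $path: $!\n";
    print {$fh} lc($word), "\n";
    close $fh;
}
```

The filter script could then start with something like `my @spam_words = load_spam_words('spamwords.txt');` instead of the hard-coded array.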
 
thanks uida1154 - I certainly will do that - this is only an outline of what I want to do - hence the sample e-mail messages are in the script as well and they certainly wouldn't be in the final script


Kind Regards
Duncan
 
Looking at it, you could simply strip all the special characters (replace them with nothing), for example the ones from your own samples: _, /, \, |, etc.

I tried running your code, and the output is quite messy:
Code:
you can get v/i/a/g/r/a for only $10 per tablet!
ùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùùù
SPAM : v[^a-z]*i[^a-z]*a[^a-z]*g[^a-z]*r[^a-z]*a
o.k. : p[^a-z]*a[^a-z]*r[^a-z]*i[^a-z]*s[^a-z]* [^a-z]*h[^a-z]*i[^a-z]*l[^a-z]*t[^a-z]*o[^a-z]*n
o.k. : v[^a-z]*i[^a-z]*c[^a-z]*o[^a-z]*d[^a-z]*i[^a-z]*n
o.k. : t[^a-z]*e[^a-z]*e[^a-z]*n[^a-z]* [^a-z]*m[^a-z]*o[^a-z]*v[^a-z]*i[^a-z]*e[^a-z]*s
o.k. : h[^a-z]*u[^a-z]*m[^a-z]*a[^a-z]*n[^a-z]* [^a-z]*g[^a-z]*r[^a-z]*o[^a-z]*w[^a-z]*t[^a-z]*h[^a-z]* [^a-z]*h[^a-z]*o[^a-z]*r[^a-z]*m[^a-z]*o[^a-z]*n[^a-z]*e
o.k. : x[^a-z]*a[^a-z]*n[^a-z]*a[^a-z]*x
o.k. : v[^a-z]*a[^a-z]*l[^a-z]*i[^a-z]*u[^a-z]*m
 
There is already a POPFile-based Perl plugin for Outlook. Or are you doing this as a learning exercise for yourself?
 
i'm sorry uida1154 - I don't quite understand what you are saying?

the script takes keywords from the @spam_words array and 'builds' regular expressions based on the entries

for example it will turn a simple word like viagra into v[^a-z]*i[^a-z]*a[^a-z]*g[^a-z]*r[^a-z]*a - that is, each letter may be followed by any run of non-letter characters

it does this because the !@£$%^# people who send spam messages never write Viagra nicely for us to easily find and remove - instead they write it as V*i*a*g*r*a or Viágra or Viaaaaagra - making it much more difficult to catch

This script will catch the keywords however many non-letter characters are stuffed between the letters, and the accented letters are dealt with by the substitutions at the top of the loop


Kind Regards
Duncan
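
To make the pattern construction described above concrete, here is a minimal sketch of the same idea as two small subs. The names `build_filter` and `is_spam` are made up for this example, and the `quotemeta` call is an addition that the original script does not have.

```perl
use strict;
use warnings;

# Build the same kind of pattern as the script above: the letters of the
# word, with any run of non-letters allowed between them.
sub build_filter {
    my ($word) = @_;
    # quotemeta is an addition: it stops characters such as '.' or '+'
    # in a spam word from being read as regex syntax.
    return join '[^a-z]*', map { quotemeta } split //, $word;
}

# Case-insensitive test of one piece of text against one filter.
sub is_spam {
    my ($text, $filter) = @_;
    return $text =~ /$filter/i ? 1 : 0;
}

my $filter = build_filter('viagra');
print "$filter\n";
for my $sample ('V*i*a*g*r*a', 'v/i/a/g/r/a', 'Viaaaaagra') {
    printf "%-12s %s\n", $sample, is_spam($sample, $filter) ? 'SPAM' : 'o.k.';
}
```

Note that `[^a-z]*` only absorbs non-letters, so a variant with repeated letters such as Viaaaaagra is not matched by this pattern on its own.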
 
OK, then I misunderstood a thing or two.

Meanwhile I was hacking a bit. Maybe the following can be of use? Of course it uses another file (I love using files) that lists what should be replaced by what...

Code:
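# Load the replacement table: the first character of each line is the one
# to look for, the second (if any) is what to put in its place.
# Note: substr works on single characters, so this assumes a one-byte
# encoding such as Latin-1 for the file and the emails.
open(my $substitute, '<', 'replacelist.txt')
    or die "can not open replacelist: $!\n";   # 'or', because '||' binds to the file name
my %replacements;
while (my $line = <$substitute>)
{
    chomp $line;
    next if $line eq '';
    my $first  = substr($line, 0, 1);
    my $second = substr($line, 1, 1);   # empty string when the line has one character
    $replacements{$first} = $second;
}
close $substitute;

...
foreach my $key (keys %replacements)
{
    # \Q...\E takes characters such as + literally; /g replaces every occurrence
    $email =~ s/\Q$key\E/$replacements{$key}/g;
}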
 
#!/usr/bin/perl -w

# Spam words are written without spaces, because the emails below are
# stripped of whitespace and punctuation before matching.
@spam_words = (
    "viagra",
    "vicodin",
    "parishilton",
    "teenmovies",
    "humangrowthhormone",
    "xanax",
    "valium",
);
foreach $spam_word (@spam_words) {
    $filter = lc($spam_word);
    $filter =~ s/\s//g;
    push (@spam_filters, $filter);   # push the cleaned-up word, not the original
}

@emails = (
    'you can get v/i/a/g/r/a for only $10 per tablet!',
    'Low Everyday Prices on Vicódin',
    'you can get a date with P*a*r*i*s H*i*l*t*o*n!!!',
    'WITH H_U_M_A_N G_R_O_W_T_H H_O_R_M_O_N_E DIETARY THERAPY !!!',
    'Cc: 1/2 off valíum, xãnax. - Delivered Overnight',
    'Welcome to our F.r.e.e T.e.e.n M.o.vie.s Newsletter Issue # 9',
);

foreach $email (@emails) {
    # Strip whitespace and the punctuation used to break words up.
    # (The earlier pattern began with an empty alternative, which matched
    # the empty string at every position, so nothing was removed.)
    $email =~ s/[\s._*\/!]//g;
    # Fold accented characters to plain letters, every occurrence.
    $email =~ s|ç|c|g;
    $email =~ s|ó|o|g;
    $email =~ s|ã|a|g;
    $email =~ s|í|i|g;
    $email = lc($email);

    print "$email\n";
    print "-" x length ($email) . "\n";
    foreach $spam_filter (@spam_filters) {
        # No /g here: in a condition it makes the next match on $email
        # start where the previous one ended.
        if ($email =~ m/$spam_filter/i) {
            print "SPAM : $spam_filter\n";
        } else {
            print "o.k. : $spam_filter\n";
        }
    }
    print "\n";
}
 
Still, I think you can clean it up using, for instance, my hash, where the key is the character to look for and the value is its replacement. I had not yet given an example file for my idea, so here is one:
çc
óo
ãa
íi
+
-

 
The only problem is that once you get past the extra characters, you will have to implement a spelling dictionary,

i.e.

"viiagra",
"parris hilton",
"vicoddin",
"teen movvies",
"human growth horrmone",
"xXxanax",

At home I have a list of allowed emails; everything else goes right to the trash. At work, because of the nature of the biz, I just have to suffer through it all.
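
One possible way to cope with the doubled-letter variants listed above without a full spelling dictionary is to let every character of the spam word repeat. The sketch below is an assumption-laden variation on the thread's filter, not part of anyone's posted script; `build_loose_filter` is a made-up name, and a looser pattern will also raise the risk of false positives.

```perl
use strict;
use warnings;

# Like the thread's filter, but each character may be repeated ("x+"),
# with any run of non-letters still allowed between characters.
sub build_loose_filter {
    my ($word) = @_;
    return join '[^a-z]*', map { quotemeta($_) . '+' } split //, $word;
}

my %samples = (
    'viagra'               => 'viiagra',
    'paris hilton'         => 'parris hilton',
    'vicodin'              => 'vicoddin',
    'teen movies'          => 'teen movvies',
    'human growth hormone' => 'human growth horrmone',
    'xanax'                => 'xXxanax',
);

for my $word (sort keys %samples) {
    my $filter = build_loose_filter($word);
    my $verdict = $samples{$word} =~ /$filter/i ? 'SPAM' : 'o.k.';
    print "$verdict : $samples{$word}\n";
}
```

This only handles repeated characters; dropped or swapped letters would still need something closer to the dictionary or fuzzy matching mentioned above.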
 
That could also be fixed by using a delimiter in the file:
ç#c
ó#o
ã#a
í#i
+#
-#
ii#i
etc.
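
A minimal sketch of how such a delimited list might be parsed and applied follows. The sub names `parse_replacements` and `normalise` are hypothetical, lines without a `#` are simply ignored, and the longest-first ordering is a design choice made for this example so that multi-character entries such as `ii` are tried before single characters.

```perl
use strict;
use warnings;

# Turn "from#to" lines into [from, to] pairs; "to" may be empty.
sub parse_replacements {
    my (@lines) = @_;
    my @pairs;
    for my $line (@lines) {
        chomp $line;
        next unless $line =~ /#/;               # skip lines without a delimiter
        my ($from, $to) = split /#/, $line, 2;  # limit 2 keeps an empty "to"
        next unless length $from;
        push @pairs, [ $from, $to ];
    }
    # Longest "from" first, so multi-character entries are applied first.
    return sort { length($b->[0]) <=> length($a->[0]) } @pairs;
}

# Apply every replacement to a piece of text and return the result.
sub normalise {
    my ($text, @pairs) = @_;
    for my $pair (@pairs) {
        my ($from, $to) = @$pair;
        $text =~ s/\Q$from\E/$to/g;   # \Q...\E treats + and friends literally
    }
    return $text;
}

my @pairs = parse_replacements('ç#c', 'ó#o', 'ã#a', 'í#i', '+#', '-#', 'ii#i');
print normalise('Low Everyday Prices on Vicódin', @pairs), "\n";
print normalise('v+i-a+g-r-a and viiagra', @pairs), "\n";
```

In a real script the lines would come from the replacement file rather than a literal list, for example by passing the lines read from the file handle to `parse_replacements`.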
 