Filtering bad words

Status
Not open for further replies.

c4n

Programmer
Mar 12, 2002
110
SI
I want to filter bad words from the input fields in a script. I came up with some code that works, but I'm sure one of you can give me a better solution (my code would probably use a lot of server resources if the badword list is bad).

Thanks for any suggestions!

Perl code:
#!/usr/bin/perl

$input1 = "ASDkj aslkjflkaj asflkjlksaj BADWORD afasf";
$input2 = "sdgsdgj BADWORD2 petzi BADWORD afsht";
$input3 = "ljha BADWORD3 auzisf opuzip asf";

open (TXTFILE, "badwords.txt");

while (<TXTFILE>) {
chomp;
$input1 =~ s/\s$_\s/ /ig;
$input2 =~ s/\s$_\s/ /ig;
$input3 =~ s/\s$_\s/ /ig;
}
close(TXTFILE);

print "Content-type: text/html\n\n";
print "$input1<br>\n";
print "$input2<br>\n";
print "$input3\n";


File with bad words:
BADWORD
BADWORD1
BADWORD2
BADWORD3
etc...
 
Don't know if this is what you want, but...


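open (TXTFILE, "badwords.txt") or die "Cannot open badwords.txt: $!";
@badwords = <TXTFILE>;
close(TXTFILE);

&bad_word_check($input1);

sub bad_word_check{
foreach $badword (@badwords){
chomp $badword;
# test the input against the bad word, not the other way round;
# \Q...\E quotes regex metacharacters, \b restricts to whole words
if ($_[0] =~ /\b\Q$badword\E\b/i){
#whatever with bad word
}
else{
#this bad word was not found
}
}
}
1;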
 
Hi, thanks for the post.

You think this method would use less resources (RAM, CPU,...)? I don't think RAM "likes" storing all the data from a file into an array:

open (TXTFILE, "badwords.txt");
@badwords = <TXTFILE>;
close(TXTFILE);

"badword list is bad" from my 1st post is a typo :) I meant "badword list is long"

I need as efficient/quick code as possible as I will be working with LONG badword lists and will have to check quite a lot of text (inputs).

Thank you.
 
My first thought would be to store the bad words in a hash; this would allow you to check a particular word against the bad-word-list very quickly.

At the start of your program you would read the bad words into the hash from your text file.

Thereafter you can just check each word you need to against the hash.
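
A minimal sketch of the hash idea described above, not a definitive implementation. The helper names load_bad_words and filter_text are hypothetical, the word list is given inline rather than read from badwords.txt so the example is self-contained, and replacing each hit with "***" is an assumption about what the filter should do:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a lookup hash from a list of bad words.
# Keys are lowercased so that the later check is case-insensitive.
# In a real script the list would come from badwords.txt instead.
sub load_bad_words {
    my @words = @_;
    my %bad;
    $bad{ lc $_ } = 1 for @words;
    return \%bad;
}

# Hypothetical helper: visit each run of word characters once and
# replace it with "***" if it is a key in the hash.
sub filter_text {
    my ($text, $bad) = @_;
    $text =~ s/(\w+)/exists $bad->{ lc $1 } ? '***' : $1/ge;
    return $text;
}

my $bad = load_bad_words(qw(BADWORD BADWORD1 BADWORD2 BADWORD3));
print filter_text("sdgsdgj BADWORD2 petzi badword afsht", $bad), "\n";
# prints: sdgsdgj *** petzi *** afsht
```

The text is scanned once no matter how long the bad-word list is, because each word costs a single hash lookup instead of one substitution per bad word.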

This will work well except when your bad-word-list is too large to comfortably hold in memory with everything else. How large "too large" is will depend on how many other things your computer is doing at the time and on how much RAM your computer has.

There's no free lunch here, basically: doing the check you require will tie up disk resources, RAM resources, or both (if you read a large list from a text file each time you need it).

Duncan's comment is apt - try a couple of methods and measure them.
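
One way to do the measuring is Perl's standard Benchmark module. The sketch below is only an illustration: the word list and the test text are made up, and filter_loop and filter_hash are hypothetical stand-ins for the one-substitution-per-bad-word approach and the hash approach.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: 300 bad words and a longer piece of input text.
my @badwords = map { "BADWORD$_" } 1 .. 300;
my %bad      = map { lc($_) => 1 } @badwords;
my $text     = join ' ', ('some ordinary text BADWORD7 more text') x 50;

# Approach 1: one substitution per bad word.
sub filter_loop {
    my $t = shift;
    for my $w (@badwords) {
        $t =~ s/\b\Q$w\E\b/***/ig;
    }
    return $t;
}

# Approach 2: one pass over the text with a hash lookup per word.
sub filter_hash {
    my $t = shift;
    $t =~ s/(\w+)/exists $bad{ lc $1 } ? '***' : $1/ge;
    return $t;
}

# Run each approach for at least one CPU second and print a comparison table.
cmpthese(-1, {
    loop => sub { filter_loop($text) },
    hash => sub { filter_hash($text) },
});
```

The numbers depend heavily on the list size and the amount of input, so it is worth running this with data that resembles the real thing.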

Mike

Want to get great answers to your Tek-Tips questions? Have a look at faq219-2884

It's like this; even samurai have teddy bears, and even teddy bears get drunk.
 
Yes, looks like I will have to do just that - try various solutions and measure them. Either way I believe Perl is the best tool (in my knowledge base ;)) to do that.

Thanks
 
I know what you're battling: a forum or open posting system. You want to program in some morality.

You could use some client-side JavaScript to check a (smaller) hot list to reduce some of the overhead. I looked at our bad word list: 324 words, which in my mind is quite piddly.
To help train your users, kick the post back if it contains words you don't want and send them an error message. The bad part is f*cking sh*t: it is very hard to grep for all possible scenarios.
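
A rough sketch of rejecting a post, assuming the only disguise to catch is an asterisk standing in for a letter after the first one. The helpers masked_pattern and find_bad_words are hypothetical, and many other disguises (spacing, look-alike characters, misspellings) would still slip through:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a (non-empty) bad word into a pattern in which
# any character after the first may be replaced by an asterisk.
sub masked_pattern {
    my ($word) = @_;
    my ($first, @rest) = split //, $word;
    my $pat = quotemeta($first)
            . join '', map { '(?:' . quotemeta($_) . '|\*)' } @rest;
    # The lookarounds keep the match from starting or ending inside a longer word.
    return qr/(?<!\w)$pat(?!\w)/i;
}

# Hypothetical helper: return the bad words found in the text,
# whether written out in full or partly masked.
sub find_bad_words {
    my ($text, @words) = @_;
    return grep { $text =~ masked_pattern($_) } @words;
}

my @hits = find_bad_words("this is B*DWORD text", qw(BADWORD BADWORD2));
print @hits
    ? "Post rejected, please remove: @hits\n"
    : "Post accepted\n";
# prints: Post rejected, please remove: BADWORD
```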

We ended up purchasing a system that allowed the users to flag bad posts, a three-strikes-and-you're-out type of deal.

That's my 2 cents' worth.
 
Hi Arcnon,

Right, it will be an open posting system (submitting articles and comments on them).

Thanks for the "Red flag this post" idea, how come I didn't think of it (It's right under every post here :)

Regards
 