Tek-Tips is the largest IT community on the Internet today!

warning - I'm using profanity with regular expressions 2

Status
Not open for further replies.

jimoblak

Instructor
Oct 23, 2001
3,620
US
I'm working on a profanity filter and have a problem replacing the following variations of hell:

$pattern = array ("/ hell /i","/ hell./i","/hell,/i");
$replace = "****";
$output=preg_replace($pattern, $replace, $input);

The problem is that 'shell,' would be filtered to 's****' and 'hell' (if used at the start of the input string) would not be filtered.

Is there a way to protect this and other words like 'shittah' (a.k.a. Acacia tree wood), 'associate', and 'hello' from being unnecessarily filtered?
 
You could do it as you've done above:
/ hell /i
/ hell\./i
/ hell,/i
(you'd forgotten the leading space in the last pattern, and an unescaped . matches any character, so it needs a backslash)

OR
you could add in the word boundary part of the regex

/\bhell\b/

Though it looks to me like what you really want is more like
/\bhell[ ,.]/i
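
To make the difference between these patterns concrete, here is a small sketch that runs each one over a few sample strings. The sample inputs and the labels 'spaces', 'boundary', and 'class' are made up for illustration; only the patterns come from the thread.

```php
<?php
// The three pattern styles discussed above, keyed by illustrative labels.
$patterns = array(
    'spaces'   => '/ hell /i',      // literal spaces on both sides
    'boundary' => '/\bhell\b/i',    // zero-width word boundaries
    'class'    => '/\bhell[ ,.]/i', // boundary, then one trailing space, comma, or period
);

// Hypothetical test inputs (not from the original post).
$inputs = array('hell is first', 'a shell, hello', 'go to hell, now', 'what the hell');

$results = array();
foreach ($patterns as $name => $pattern) {
    foreach ($inputs as $input) {
        $results[$name][$input] = preg_replace($pattern, '****', $input);
        echo str_pad($name, 8), ' | ', $input, ' => ', $results[$name][$input], "\n";
    }
}
```

With these inputs, the 'spaces' pattern changes nothing, 'boundary' censors the word in three strings and leaves 'shell' and 'hello' alone, and 'class' eats the character after the word ("go to **** now") and misses the word at the very end of a string. That suggests \b on both sides is the safer choice of the three.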

By the way, there's no reason to set up a separate pattern for each word. Put the words in an array, build the regular expression from that array, and determine the number of asterisks with strlen, and you have your whole filter in one tidy loop.
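
A minimal sketch of that loop might look like the following. The function name filter_profanity and the two-word list are placeholders for illustration, not anything from the original post; preg_quote is added as a precaution in case a dictionary entry contains regex metacharacters.

```php
<?php
// Hypothetical word list; a real filter would load a much longer dictionary.
$badWords = array('hell', 'damn');

// Replace each listed word, matched as a whole word, with asterisks of the same length.
function filter_profanity($input, $badWords) {
    foreach ($badWords as $word) {
        // Escape the word so it is treated literally inside the pattern.
        $pattern = '/\b' . preg_quote($word, '/') . '\b/i';
        $input = preg_replace($pattern, str_repeat('*', strlen($word)), $input);
    }
    return $input;
}

echo filter_profanity('Hell, I said hello to the damn shell.', $badWords), "\n";
```

Because the pattern uses \b on both sides, 'hello' and 'shell' pass through untouched while 'Hell' and 'damn' are each replaced by four asterisks.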

-Rob
 
[Note: this post uses scatological language. It is done so in an illustrative manner.]

There is really no way to create a regular expression which will allow "shittah" but block "shithead". There are just too many variants on slang, spelling, etc.

I would filter whole words. This means, however, that during your filter's first weeks online you will constantly have to add bad words to your dictionary. And you'll have to remember to add variants, too. My example bad word from the previous paragraph can also, I suppose, be spelled with a hyphen.

By filtering whole words, an entry in your dictionary will look something like "/ hell /i".

Want the best answers? Ask the best questions: TANSTAAFL!
 
Correct me if I'm wrong, but wouldn't
/ hell /i

miss (obviously) words followed by a comma, period, or semicolon? That's simply remedied by using [ ,.;] instead of the space at the end. But more importantly, and harder to remedy, wouldn't it also miss words that start a line? (Hence a good place to use \b.)

-Rob
 
A star for sleipnir214's nice vocabulary (scatological) and one for skiflyer: '\b' will help in addition to building an exception list.
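
For what it's worth, here is one way '\b' and an exception list could fit together, as a rough sketch only. It assumes the filter matches any word that begins with a listed stem (so variants get caught) and then lets known-good words through. The names filter_with_exceptions, $stems, and $exceptions are invented for this example, and the closure syntax needs PHP 5.3 or later.

```php
<?php
// Hypothetical stems to block and harmless words to let through.
$stems = array('hell', 'ass');
$exceptions = array('hello', 'associate');

// Censor any word starting with a stem unless it is on the exception list.
function filter_with_exceptions($input, $stems, $exceptions) {
    $quoted = array();
    foreach ($stems as $stem) {
        $quoted[] = preg_quote($stem, '/');
    }
    // \b anchors the stem at the start of a word; \w* takes the rest of the word.
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\w*/i';

    return preg_replace_callback($pattern, function ($m) use ($exceptions) {
        $word = $m[0];
        if (in_array(strtolower($word), $exceptions, true)) {
            return $word; // known-good word, leave it alone
        }
        return str_repeat('*', strlen($word));
    }, $input);
}

echo filter_with_exceptions('Hello, my associate works in a hellhole.', $stems, $exceptions), "\n";
```

The trade-off is the one already described in this thread: stem matching catches more variants, but every innocent word sharing a stem has to be added to the exception list by hand.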

I was trying to avoid an extensive dictionary list but it seems inevitable.
 
Does anyone know where to find a sizable dictionary of profanities gathered from user input?

Thanks!
 