newbie wants to count nbr of substr in a text string. How?

Status
Not open for further replies.

JCHallgren

Technical User
Dec 17, 2004
47
US
I am VERY new to this! And have looked thru what i have avail here...not sure what I'm looking for...

I have a text string (comments from a guestbook) that i want to see how many times certain strings occur in it..
The key one is "http://" as more than 2 will be a sign of a spam post...is there an EASY way to scan $usrcoments for various strings and get a count of how many matches I get?

My book mentions Grep, but I have no clue how to use it...
I would like to ideally match if mixed case used also...
And having all other similar words in a list to use ("poker","drug",etc) would be GREAT!

THANKS!
 
This should get you started:
Code:
my $text = "guestbook comment";
my @list = ('http://', 'poker', 'drug');
foreach my $substr (@list) {
  my $cnt = $text =~ /($substr)/g;
  print "text contains $cnt occurrences of '$substr'\n";
}

Using a scalar to collect regex matches, returns the number of matches, rather than the list of matches.

Barbie
Leader of Birmingham Perl Mongers
 
Although you might be able to use map or grep, the following is easier to understand if you are new to perl.
Code:
use strict;
use warnings;

my @spamwords = ("poker", "http://", "drug", "enlargement", "viagra", "ad nauseum");
my $usercomments = "Make big money playing Internet Poker at http://www.scumbag.com. Genuine testimonial: 'I used to have a drug problem, but now I can afford it' (cannot be named for legal reasons), Essex";
my $spamindex = 0;

foreach my $word (@spamwords) {
    $spamindex += $usercomments =~ s/$word/$word/gi;
}

print "Spam rating is $spamindex: Bin it!\n" if $spamindex > 1;
The global case-insensitive search and replace s/$word/$word/gi returns the number of replacements made.
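A minimal sketch of the same idea wrapped in a sub, in case it helps to see the return value on its own. The name count_by_substitution and the sample comment are made up for illustration and are not from the posts above. It works on a copy of the text and wraps the word in \Q...\E so that characters such as "." or "/" in a spam word are treated literally.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): count case-insensitive
# occurrences of a literal word by replacing each match with itself.
sub count_by_substitution {
    my ($text, $word) = @_;    # $text is a copy, so the caller's string is untouched
    my $count = ($text =~ s/(\Q$word\E)/$1/gi);
    return $count || 0;        # s/// returns an empty string when nothing matched
}

my $comment = "Poker, poker and more POKER at http://example.com";
print count_by_substitution($comment, "poker"),   "\n";    # 3
print count_by_substitution($comment, "http://"), "\n";    # 1
print count_by_substitution($comment, "drug"),    "\n";    # 0
```

The "|| 0" is only there so the sub always hands back a plain number, which is easier to print or compare.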
 
OK, so you don't need to replace it to get the count.
Code:
$spamindex += $usercomments =~ /$word/gi;
works just as well... (thanks Barbie).
 
That's some quality creative spamming there Steve. You've worked in the industry haven't you? ;)
 
No, that's just a beginner's attempt, you need far more random and transposed (eg 'v1agr4') strings in there ;-)

I work in the industry, but on the side of the good guys, at MessageLabs.

Barbie
Leader of Birmingham Perl Mongers
 
Stevexff and Barbie: That was just PERFECT! I like code that I can understand! And it seems to solve ALL the issues I have!!!
So far, all the junk has been using "normal" words...just a BUNCH of them so that's why the count would be good...one or two words may be normal chat, but 25 are not!
 
Update: I tried it in my lil test pgm...doesn't seem to work right :( Only appears to count the FIRST occurrence of each bad word...
 
This works:
Code:
$| = 1;

$text = <<EOT;

Hello testing counting hello string.

Just to make sure count is correct, hEllo!

Hello, Hello, Hello!!!
EOT

$text2 = "Hello";

$cnt = $text =~ s/(\bhello\b)/$1/gi;

print STDOUT "$cnt\n";

$cnt += $text2 =~ s/(\bhello\b)/$1/gi;

print STDOUT "$cnt\n";

OUTPUT:
Code:
6
7

Matching does not return how many times it matches the string, but search and replace does.



Michael Libeson
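For comparison, here is a rough sketch of what a plain match does in scalar context, and one way to still get a count from it without modifying the string. The sample text and variable names are assumptions for illustration only. A scalar /g match reports success or failure, but it also remembers where it stopped, so a loop can step through every match.

```perl
use strict;
use warnings;

my $text = "Hello, hello, hEllo!";

# In scalar context a /g match only reports success or failure...
my $matched = $text =~ /\bhello\b/gi;
pos($text) = undef;    # reset the position left behind by the scalar /g match

# ...but each call resumes after the previous match, so a loop can count them.
my $count = 0;
$count++ while $text =~ /\bhello\b/gi;

print "matched: ", ($matched ? "yes" : "no"), ", count: $count\n";
```

With this sample text the loop finds three matches, while $matched on its own only says that at least one exists.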
 
In that case you may need to do the following:
Code:
my $text = $usercomments;
$spamindex += $text =~ s/$word//gi;
I've discovered that a global match doesn't work the way I expected. I have made a copy of the variable $usercomments in the event you want to keep the original.

Barbie
Leader of Birmingham Perl Mongers
 
Found via an IRC chat that adding in a " = () = " after $spamindex also works...thanks M Libeson also!
 
Awk
Code:
BEGIN {
    FS = ","
    spamwords = "poker, nauseum"
    usercomments = "Make big money playing Internet Poker at Genuine testimonial: 'I used to have a drug problem, but now I can afford drugs' (cannot be named for legal reasons), Essex"
    $0 = spamwords
    for (i = 1; i <= NF; i++)
        print $i, split(tolower(usercomments), junk, $i) - 1
}
 
Code:
$spamindex += () = $usercomments =~ /($substr)/g;

Apparently the match only returns true or false in scalar context. However, forcing it into list context with the empty list and then assigning that to a scalar gives the number of matches.

Barbie
Leader of Birmingham Perl Mongers
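Putting the list-context trick into the earlier loop might look like the sketch below. The sample comment, the example.com style hosts and the intermediate $count variable are assumptions added for illustration. \Q...\E is used so that the spam words are matched literally, and /i covers mixed case.

```perl
use strict;
use warnings;

my @spamwords    = ("poker", "http://", "drug");
my $usercomments = "Internet Poker! http://a.example http://b.example poker";
my $spamindex    = 0;

foreach my $word (@spamwords) {
    # The empty list () forces the match into list context; assigning that
    # list assignment to a scalar yields the number of matches.
    my $count = () = $usercomments =~ /\Q$word\E/gi;
    $spamindex += $count;
}

print "Spam index: $spamindex\n";    # 2 x poker + 2 x http:// + 0 x drug = 4
```

Unlike the search and replace version, this leaves $usercomments untouched, so no copy is needed.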
 
Barbie, don't worry...the guy on PerlMonks I got aid from at 3am, gave same initial answer...at that time, I had not noticed a reply here...so tried there for a 2nd opinion..

But now we BOTH know the way to do so!
 
So that means I was right the first time, with the search & replace. I feel much better now...[smile]
 