trying to open a file & count the instances of a word inside it

Status
Not open for further replies.

Mizugori

I thought I understood this but my code is not working, and I am not sure why. I am trying to learn Perl on my own but it is a little frustrating, as there is not an abundance of clear, step-by-step material like with some other languages.

For this small script I wanted to be able to open a text file, and count how many times a word is found in that text file. This started as a kind of joke with a friend of mine, who types waaaay too many acronyms in his emails (lol wtf gtfo brb lmao etc)

So, my program was first written to search for lol, or LOL, and I copied one of his emails into a .txt file. It did not work so I tried something simpler; I made a small .txt file and typed a few randomish words and tried to count how many times 'the' appeared in the file. I got it to work, but it counts 4 when there are in fact 5. Can anyone please explain to me why it is getting the wrong number??

Thanks!! here is my code, followed by the text file I am using it with now:

Code:
#!/usr/bin/perl

use strict;

print "Enter the name of your file, ie myfile.txt:\n";

my $val = <STDIN>;
chomp ($val);

my $cnt=0;

open (HNDL, "$val") || die "wrong filename";

while ($val = <HNDL>)
{
  if ($val =~ /the/i)
  {
        print $val;
        $cnt++;
  }
}

print "Number of instances of 'the' found: $cnt\n\n";

close (HNDL);

Code:
the this the
a the- this is the

the then thee
thine thou thoust then
 
The reason you're getting 4 is that you're counting matching lines, not occurrences. $cnt increases by 1 if "the" is found anywhere on the line, regardless of how many times it appears. Each of your 4 non-blank lines has "the" in it, so that's why your count is 4. On the last line, your regexp is matching the first three characters of "then", since you're not checking for word boundaries.
 
Well, all you're actually doing is counting the 'number of lines' that contain the string 'the', which is 4.

If you wanted to count the number of occurrences of the string 'the', including inside longer words such as 'then' and 'thee', you'd actually have 8.
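To make the three numbers in this thread concrete (4, 8 and 5), here is a small sketch that runs all three counts over the sample text from the first post. The sub names `count_matching_lines` and `count_matches` are made up for illustration; they are not from the original posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sample text from the original post (note the blank third line).
my @lines = (
    "the this the\n",
    "a the- this is the\n",
    "\n",
    "the then thee\n",
    "thine thou thoust then\n",
);

# Hypothetical helper: how many lines contain the pattern at least once.
sub count_matching_lines {
    my ($re, @text) = @_;
    return scalar grep { $_ =~ $re } @text;
}

# Hypothetical helper: total number of matches across all lines.
sub count_matches {
    my ($re, @text) = @_;
    my $total = 0;
    for my $line (@text) {
        # With /g, each test in scalar context continues after the
        # previous match, so this increments once per occurrence.
        $total++ while $line =~ /$re/g;
    }
    return $total;
}

print "lines containing 'the': ", count_matching_lines(qr/the/i, @lines), "\n";  # 4
print "substring matches:      ", count_matches(qr/the/i, @lines), "\n";         # 8
print "whole-word matches:     ", count_matches(qr/\bthe\b/i, @lines), "\n";     # 5
```

The first count is what the original script computes, the second is every appearance of the three letters, and the third is the whole-word count the poster was expecting.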

If you want your 5...
Code:
#!/usr/bin/perl

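use strict;
use warnings;

my $cnt = 0;
while (<DATA>)    # read the sample text below __DATA__ one line at a time
{
  # /g in a while() condition steps through every match on the line
  while (/\bthe\b/ig)
  {
        print;        # prints the whole line once per match
        print "\n";   # the line keeps its own newline, so this adds a blank line
        $cnt++;
  }
}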

print "Number of instances of 'the' found: $cnt\n\n";


__DATA__
the this the
a the- this is the

the then thee
thine thou thoust then

The /g (global) modifier means that each time the regexp is tested in the while() condition, it carries on from where the previous match ended, so the loop body runs once per match until it can't find another one.

The \b parts of the regexp denote word boundaries, so that 'then' and 'thee' are not matched.
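As a quick illustration of how /g and \b interact, here is a minimal sketch on a single made-up string (the test string is an assumption, not the poster's file). It uses a different but common way to count: a /g match in list context returns every match at once, and assigning that list through `()` to a scalar gives the number of matches.

```perl
use strict;
use warnings;

# Made-up test line: two standalone "the" plus one "then".
my $line = "a the- this is the then";

# List-context /g returns all matches; "= () =" turns that into a count.
my $whole_words = () = $line =~ /\bthe\b/gi;
my $substrings  = () = $line =~ /the/gi;

print "whole words: $whole_words\n";   # 2 ("the-" counts, "then" does not)
print "substrings:  $substrings\n";    # 3 (also matches inside "then")
```

Note that "the-" still counts as a whole word, because the hyphen is not a word character and so \b matches between "e" and "-".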
 
Thanks brigmar, but one problem: I need to open a text file, I don't want to have the data hard-coded into the script...

I changed my code to the following and now it says 3 instances found... ?

Code:
#!/usr/bin/perl

use strict;

print "Enter the name of your file, ie myfile.txt:\n";

my $val = <STDIN>;
chomp ($val);

my $cnt=0;

open (HNDL, "$val") || die "wrong filename";

while ($val = <HNDL>)
{
  if ($val =~ /\bthe\b/ig)
  {
        print $val;
        print "\n";
        $cnt++;
  }
}

print "Number of instances of 'the' found: $cnt\n\n";

close (HNDL);
 
The __DATA__ was just for test purposes, given that was the content of your text file.

Again, you're using if() when testing the regular expression, meaning it will only check once per line, so a line with two matches still adds just 1 to the count.

You need to make that a while() like I did.
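For reference, a file-reading version with the while() fix applied might look roughly like the sketch below. Several things here are my own assumptions rather than anything from the thread: the sub name `count_word_in_file`, the script name `count.pl`, taking the filename from the command line instead of prompting on STDIN, and the three-argument open with a lexical filehandle (a swapped-in idiom in place of the bareword HNDL).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count whole-word, case-insensitive occurrences
# of $word in the file at $path.
sub count_word_in_file {
    my ($path, $word) = @_;

    # Three-argument open with a lexical filehandle; $! holds the OS error.
    open(my $fh, '<', $path) or die "Cannot open '$path': $!";

    my $cnt = 0;
    while (my $line = <$fh>) {
        # while, not if: with /g each test resumes after the previous
        # match, so $cnt goes up once per occurrence on the line.
        # \Q...\E quotes any regexp metacharacters in $word.
        $cnt++ while $line =~ /\b\Q$word\E\b/gi;
    }
    close($fh);
    return $cnt;
}

# Usage (script name is hypothetical): perl count.pl myfile.txt [word]
if (@ARGV) {
    my ($file, $word) = @ARGV;
    $word = 'the' unless defined $word;
    my $n = count_word_in_file($file, $word);
    print "Number of instances of '$word' found: $n\n";
}
```

Run against the sample text file from the first post with the default word, this should report 5. The \b anchors assume the search word starts and ends with a word character, which holds for things like "lol" or "the".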
 
THANKS!!! works great now!
 