parsing a page

Status
Not open for further replies.

spydermonkey

Programmer
May 24, 2004
31
US
This is a repost of the CGI question, but no one's answered in a few days so I'm here to see if anyone here can help.

This IS reinventing the wheel but it's a learning process as this is my first attempted module. It mimics LWP in the way you use meta_gather($url) to extract the source code, but specifically just the meta tags.

This worked FINE when I just used $count, like $array[$count] = "$1::$2";, before I added all the IF tests. But after I added the tests to insert things in the proper order (as outlined with the # numbers in the code), no results are displayed, just a bunch of blank lines.

Can someone help me figure out the bug? PerlGoodies is a temp name btw, lol, it will be changed later.

ALSO, how do I export variables back into the script calling it? The @meta_results won't be printed in the module, it's for testing, but I need to pass it to the script somehow so the user can do whatever they want with this array. Can someone help me out with that?


Code:
package PerlGoodies;
use Exporter;
@ISA = 'Exporter';
@EXPORT_OK = qw(meta_gather);
use strict;


sub meta_gather($)
{

use LWP::Simple;
require HTTP::Status;


my($url) = @_;

my $p_content = get($url);

my @meta_results;
my $count = 0;


#1 = description
#2 = keywords
#3 = abstract
#4 = author
#5 = robots
#6 = distribution
#7 = language
#8 = rating
#9 = copyright
#10 = distributor

  while($p_content =~  /<meta\s+name=\"(.+?)\"\s+content=\"(.+?)\">/ig) 
 {
      $count++;


      if($1 =~ /^description$/i)
      {
         $meta_results[1] = "$2";
      }
      elsif($1 =~ /^keywords$/i)
      {
         $meta_results[2] = "$2";
      }
      elsif($1 =~ /^abstract$/i)
      {
         $meta_results[3] = "$2";
      }
      elsif($1 =~ /^author$/i)
      {
         $meta_results[4] = "$2";
      }
      elsif($1 =~ /^robots$/i)
      {
         $meta_results[5] = "$2";
      }
      elsif($1 =~ /^distribution$/i)
      {
         $meta_results[6] = "$2";
      }
      elsif($1 =~ /^language$/i)
      {
         $meta_results[7] = "$2";
      }
      elsif($1 =~ /^rating$/i)
      {
         $meta_results[8] = "$2";
      }
      elsif($1 =~ /^copyright$/i)
      {
         $meta_results[9] = "$2";
      }
      elsif($1 =~ /^distributor$/i)
      {
         $meta_results[10] = "$2";
      }
  }

foreach (@meta_results) { print "$_\n";}

return;
}

1;

__END__
 
The problem is that you are using $1 in every if/elsif test.

$1 is a special variable that holds the first captured subpattern from the last successful pattern match. "$1 =~ /^description$/i" is itself a pattern match, so as soon as one of those tests succeeds, $1 and $2 get blown away: the new pattern has no capturing parentheses, so both come back undefined and you end up storing an empty "$2".

Try adding this line as the first line inside the "while" loop:

Code:
my ($name, $content) = ($1, $2);

Then use $name instead of $1, and $content instead of $2, in each of the if/elsif statements.
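
To see why the copy helps, here is a minimal sketch (the sample tag and the variable names $clobbered and $kept are made up for illustration). A successful match resets the capture variables, while a copy taken beforehand is an ordinary variable and stays as it was.

```perl
use strict;
use warnings;

# Hypothetical sample input, not taken from a real page.
my $tag = '<meta name="description" content="A test page">';

my ($clobbered, $kept);

if ($tag =~ /<meta\s+name="(.+?)"\s+content="(.+?)">/i) {
    # Copy the captures before running any other pattern match.
    my ($name, $content) = ($1, $2);

    if ($1 =~ /^description$/i) {
        # This match succeeded and has no capture groups,
        # so $1 and $2 are undefined inside this block.
        $clobbered = $2;
    }

    if ($name =~ /^description$/i) {
        # The copies are not affected by the new match.
        $kept = $content;
    }
}

print defined $clobbered ? "clobbered: $clobbered\n" : "clobbered: (undef)\n";
print "kept: $kept\n";
```

With the sample tag above, $clobbered ends up undefined and $kept holds the content string.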
 
Also, I recommend using a hash instead of an array, and consolidating all the if/elsif statements using a look-up table. Something like this:

Code:
package PerlGoodies;
use Exporter;
@ISA = 'Exporter';
@EXPORT_OK = qw(meta_gather);
use strict;


sub meta_gather($)
{

  use LWP::Simple;
  require HTTP::Status;


  my($url) = @_;

  # get() returns undef if the page could not be fetched
  my $p_content = get($url);
  return unless defined $p_content;

  my %meta_results;
  my $count = 0;
  # look-up table of the meta names we care about
  my @metas = qw(description keywords abstract
                 author robots distribution
                 language rating copyright distributor);

  while($p_content =~  /<meta\s+name=\"(.+?)\"\s+content=\"(.+?)\">/ig)
  {
      # copy the captures before any other pattern match can reset them
      my ($name, $content) = ($1, $2);
      $count++;

      foreach my $meta (@metas) {
          # lc() keeps the comparison case-insensitive, like the original /i tests
          if (lc($name) eq $meta) {
              $meta_results{$meta} = $content;
          }
      }
  }

  # print the results (for testing only)
  foreach my $meta (keys %meta_results) {
    print "$meta = $meta_results{$meta}\n";
  }

  return;
}

1;

__END__
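
A hash can also act as the look-up table directly, which removes the inner loop. The following is only a sketch under assumptions: it parses an HTML string rather than fetching a URL, and the sub name parse_meta, the %wanted hash, and the sample markup are invented for illustration.

```perl
use strict;
use warnings;

# Names we care about; a hash gives a direct membership test.
my %wanted = map { $_ => 1 } qw(description keywords abstract
                                author robots distribution
                                language rating copyright distributor);

# Hypothetical helper: takes HTML text, returns a hash of the wanted meta tags.
sub parse_meta {
    my ($html) = @_;
    my %meta_results;

    while ($html =~ /<meta\s+name="(.+?)"\s+content="(.+?)">/ig) {
        # Copy the captures right away and normalise the name's case.
        my ($name, $content) = (lc($1), $2);
        $meta_results{$name} = $content if $wanted{$name};
    }
    return %meta_results;
}

# Made-up sample page.
my $html = <<'HTML';
<meta name="Description" content="A test page">
<meta name="generator" content="SomeTool">
<meta name="keywords" content="perl, meta">
HTML

my %meta = parse_meta($html);
print "$_ = $meta{$_}\n" for sort keys %meta;
```

Names that are not in %wanted (here, "generator") are skipped, and the sample "Description" tag is stored under the lower-case key.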
 
Thanks for your help, that did the trick: making new variables instead of using $1 and $2 directly. I don't get how assigning $1 to a variable makes it any different from using $1 itself if the variable gets overwritten on each pass anyway, but it works!

I was told to use a hash but this is a module and I thought an array would be nicer. That way, you could always call $array[1] for the description.

Either of you know how to "simply" export this variable so the person running the program can use @meta_results in the script to do whatever it is they want?

Thanks!
 
Your return line should be:

Code:
return @meta_results;

But the users will have to call the module's subroutine in a way that captures the returned list, for example by assigning the result of the call to an array.

- Rieekan
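
As a rough sketch of returning the array and capturing it in the caller: this keeps the array layout from the original post (index 1 for description, 2 for keywords, and so on), shortened to four slots. As assumptions made to keep it self-contained, the sub takes HTML text instead of a URL and is named meta_from_html, the sample tag is made up, and both packages sit in one file, so import is called by hand.

```perl
package PerlGoodies;
use strict;
use warnings;
use Exporter 'import';
our @EXPORT_OK = qw(meta_from_html);

# Slot numbers follow the original post (only the first four shown here).
my %slot = (description => 1, keywords => 2, abstract => 3, author => 4);

# Hypothetical stand-in for meta_gather that works on HTML text.
sub meta_from_html {
    my ($html) = @_;
    my @meta_results;
    while ($html =~ /<meta\s+name="(.+?)"\s+content="(.+?)">/ig) {
        my ($name, $content) = (lc($1), $2);
        $meta_results[ $slot{$name} ] = $content if exists $slot{$name};
    }
    return @meta_results;    # hand the list back to the caller
}

package main;

# With the module in its own file this line would instead be:
#   use PerlGoodies qw(meta_from_html);
PerlGoodies->import('meta_from_html');

# Assigning to an array puts the call in list context.
my @meta = meta_from_html('<meta name="author" content="Jane Doe">');
print "author: $meta[4]\n";
```

Slots that were not found on the page stay undefined, so a caller would probably want to check with defined before using an entry.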
 