
data extraction

Status
Not open for further replies.

rjseals

Hi, I am a Perl newbie trying to write a script that goes through an array of basketball teams and gets the season points-per-game average of each player on a team who averages more than 5 points per game.

Here is what I have so far (very limited):
Code:
#!C:\perl

use LWP::Simple;
$x=3;
my @teams = (68, 99, 103, 151);
while ($x>=0)
{
$data=get("http://sports.espn.go.com/ncb/teamstats?teamId=$teams[$x]&sort=avgPoints");
 if ($data =~ m|<title>ESPN.com: ([^<]*)Teamstats</title>|s) 
 {
	$teamname=$1;
	print "$teamname\n";
 } 
$x--;
}
This simply prints out the team name. The main data I want, however, comes from rows like the one below (this is one player of about 13; I pasted only one to keep the post as short as possible).

<tr class="oddrow" align=right><TD align =left>
<a href="/ncb/player/profile?playerId=11225">Craig Smith</a></TD><TD>31</TD><TD>35.4</TD><TD bgcolor=#C1C1C1>17.1</TD><TD>8.7</TD><TD>2.7</TD>
<TD>2.3</TD><TD>1.2/1</TD><TD>1.2</TD><TD>0.8</TD><TD>3.0</TD><TD>.578</TD><TD>.655</TD><TD>.091</TD><td>1.55</TD></tr><tr class="evenrow" align=right>


So basically I need the name and the points per game (the td with the bgcolor of #C1C1C1) for each player whose ppg is greater than 5.

Can someone point me in the right direction?

Thanks
 
Here is one way to do it:

Code:
use LWP::Simple;
my $q = new CGI;
print $q->header,'<plaintext>';
my @teams = (68, 99, 103, 151);
for my $n (@teams) {
   my $data = get("http://sports.espn.go.com/ncb/teamstats?teamId=$n&sort=avgPoints");
   my @data = split(/\n/,$data);
   foreach my $line (@data) {
      if ($line =~ m|^\s*<title>ESPN.com: (.+) Teamstats</title>|i) {
         my $teamname=$1;
         print "$teamname\n";
      }
      elsif ($line =~ m/^\s*<!--teamId=$n-->/) {
         $line =~ s/<[^>]+>/~/g;
         my @line_data = split(/~+/,$line);
         shift(@line_data);
         for (my $i=0;$i<$#line_data;$i+=15) {
            print "  $line_data[$i] - $line_data[$i+3]\n" if ($line_data[$i+3] > 5);
         }
         last;
      }
   }
   print "\n";
}

It's up to you to adjust the output to your requirements. I used <plaintext> just for test purposes. If you don't want the team averages displayed, try changing:

$i<$#line_data

to:

$i<$#line_data-15
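
To see how the tag stripping and the step of 15 fit together, here is a minimal sketch that runs offline on a hard-coded line instead of fetching the page. The first row is the Craig Smith row quoted in the question; the second row, the teamId in the comment, and the helper name players_over are made up for illustration. It assumes every row has exactly 15 cells, with the name at offset 0 and points per game at offset 3.

```perl
use strict;
use warnings;

# First row: the Craig Smith row from the question, joined onto one line.
# Second row: a made-up player, added only to exercise the "> 5" filter.
my $line = '<!--teamId=103--><tr class="oddrow" align=right><TD align =left>'
  . '<a href="/ncb/player/profile?playerId=11225">Craig Smith</a></TD>'
  . '<TD>31</TD><TD>35.4</TD><TD bgcolor=#C1C1C1>17.1</TD><TD>8.7</TD><TD>2.7</TD>'
  . '<TD>2.3</TD><TD>1.2/1</TD><TD>1.2</TD><TD>0.8</TD><TD>3.0</TD>'
  . '<TD>.578</TD><TD>.655</TD><TD>.091</TD><td>1.55</TD></tr>'
  . '<tr class="evenrow" align=right><TD align =left>'
  . '<a href="/ncb/player/profile?playerId=0">Example Player</a></TD>'
  . '<TD>10</TD><TD>5.0</TD><TD bgcolor=#C1C1C1>2.4</TD><TD>1.0</TD><TD>0.5</TD>'
  . '<TD>0.3</TD><TD>0.5/1</TD><TD>0.2</TD><TD>0.1</TD><TD>1.0</TD>'
  . '<TD>.400</TD><TD>.500</TD><TD>.000</TD><td>1.00</TD></tr>';

# Hypothetical helper: returns [name, ppg] pairs for players above $min.
sub players_over {
    my ($row_line, $min) = @_;
    (my $text = $row_line) =~ s/<[^>]+>/~/g;   # every tag becomes a "~"
    my @fields = split /~+/, $text;            # runs of "~" act as one separator
    shift @fields if @fields && $fields[0] eq '';   # drop the empty leading field
    my @hits;
    # Walk the flat field list one 15-cell row at a time.
    for (my $i = 0; $i + 14 <= $#fields; $i += 15) {
        my ($name, $ppg) = @fields[$i, $i + 3];
        push @hits, [$name, $ppg] if $ppg > $min;
    }
    return @hits;
}

printf "  %s - %s\n", @$_ for players_over($line, 5);
```

If the real page ever adds or removes a column, the offsets 15 and 3 would need to change to match.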
 
Thanks for your response. When I run the script I get the following error:

Can't locate object method "new" via package "CGI" (perhaps you forgot to load "CGI" ?

Is this a library I need to install?

Thanks again for your time and help.
 
Oh sorry, yes, you have to load CGI if you use my code example:

Code:
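use CGI;
use LWP::Simple;
my $q = new CGI;
# Send an HTTP header; <plaintext> makes the browser show the output as-is
print $q->header,'<plaintext>';
my @teams = (68, 99, 103, 151);
for my $n (@teams) {
   my $data = get("http://sports.espn.go.com/ncb/teamstats?teamId=$n&sort=avgPoints");
   next unless defined $data;   # get() returns undef if the fetch fails
   my @data = split(/\n/,$data);
   foreach my $line (@data) {
      if ($line =~ m|^\s*<title>ESPN.com: (.+) Teamstats</title>|i) {
         my $teamname=$1;
         print "$teamname\n";
      }
      # The line starting with this comment is treated as holding the stats rows
      elsif ($line =~ m/^\s*<!--teamId=$n-->/) {
         $line =~ s/<[^>]+>/~/g;              # replace every tag with ~
         my @line_data = split(/~+/,$line);   # one field per table cell
         shift(@line_data);                   # discard the field before the first name
         # 15 cells per row: name at offset 0, points per game at offset 3
         for (my $i=0;$i<$#line_data;$i+=15) {
            print "  $line_data[$i] - $line_data[$i+3]\n" if ($line_data[$i+3] > 5);
         }
         last;
      }
   }
   print "\n";
}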
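
For the original question (the name plus the bgcolor=#C1C1C1 cell), a single regex can also pull both values without counting columns. This is only a sketch of that alternative, not the method used in the replies above. The helper name ppg_by_player and the second sample row are made up; the first row is the one pasted in the question. It assumes every player row contains a highlighted cell. If one did not, the lazy match would run on into the next row.

```perl
use strict;
use warnings;

# First row as pasted in the question; the second row is a made-up player
# with a low average, added only to show the filter at work.
my $html = <<'HTML';
<tr class="oddrow" align=right><TD align =left>
<a href="/ncb/player/profile?playerId=11225">Craig Smith</a></TD><TD>31</TD><TD>35.4</TD><TD bgcolor=#C1C1C1>17.1</TD><TD>8.7</TD><TD>2.7</TD>
<TD>2.3</TD><TD>1.2/1</TD><TD>1.2</TD><TD>0.8</TD><TD>3.0</TD><TD>.578</TD><TD>.655</TD><TD>.091</TD><td>1.55</TD></tr><tr class="evenrow" align=right><TD align =left>
<a href="/ncb/player/profile?playerId=0">Example Player</a></TD><TD>10</TD><TD>5.0</TD><TD bgcolor=#C1C1C1>2.4</TD><TD>1.0</TD></tr>
HTML

# Hypothetical helper: returns [name, ppg] pairs for players above $min.
sub ppg_by_player {
    my ($page, $min) = @_;
    my @hits;
    # Capture the link text (player name), then lazily skip ahead to the
    # first highlighted cell and capture its number. /s lets "." cross
    # newlines, /i ignores the TD/td case mix, /g finds every player.
    while ($page =~ m{<a\s+href="/ncb/player/profile\?playerId=\d+">([^<]+)</a>.*?<TD\s+bgcolor=#C1C1C1>([\d.]+)</TD>}gis) {
        my ($name, $ppg) = ($1, $2);
        push @hits, [$name, $ppg] if $ppg > $min;
    }
    return @hits;
}

print "$_->[0] - $_->[1]\n" for ppg_by_player($html, 5);
```

Because it keys on the highlighted cell rather than a column position, this version should keep working if columns are added, but it would break if the site changed the sort column's colour.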
 