Web links

Status
Not open for further replies.
A fairly simple way is to use LWP::Simple, though it
does not give you much control.
Code:
#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Read the local HTML page into one string
open(my $page, '<', '/httpd/htdocs/index.html') or die "$!\n";
my $buf = do { local $/; <$page> };
close $page;

# Pull out each href and fetch it
while ($buf =~ /<a href="(.*?)">/gis)  {
  my $link = $1;
  my $content = get($link);   # get() returns undef if the fetch fails
  # peruse the content as needed
  }
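
The link-extraction loop above can be tried on its own, without fetching anything. This is only a sketch: the sample HTML and the `extract_links` helper name are made up for illustration. It also shows a limit of the pattern, which expects anchors written exactly as `<a href="...">`.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical helper: return every href captured by the pattern used above.
sub extract_links {
    my ($html) = @_;
    my @links;
    while ($html =~ /<a href="(.*?)">/gis) {
        push @links, $1;
    }
    return @links;
}

# Made-up sample page for illustration.
my $html = <<'HTML';
<p><a href="http://example.com/one">One</a></p>
<p><A HREF="http://example.com/two">Two</A></p>
<p><a href="http://example.com/three" class="nav">Three</a></p>
HTML

my @links = extract_links($html);
print "$_\n" for @links;
```

The first two links come out cleanly (the /i flag handles the uppercase tag). The third is captured as `http://example.com/three" class="nav`, because the pattern runs on to the first `">` it finds. For real pages a parser module such as HTML::LinkExtor is probably a sturdier choice.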

This method uses the HTTP and LWP modules and
gives a little more control, like a timeout.

Code:
#!/usr/local/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

my $browser = LWP::UserAgent->new();
$browser->timeout(5);   # give up on a request after 5 seconds

open(my $page, '<', '/path/to/your/file.html') or die "$!\n";
my $buf = do { local $/; <$page> };
close $page;

while ($buf =~ /<a href="(.*?)">/gis)  {
  my $link = $1;
  print "Checking $link.\n";
  my $request = HTTP::Request->new(GET => $link);
  my $response = $browser->request($request);
  if ($response->is_error()) 
     { printf "%s\n", $response->status_line; }
  my $contents = $response->content(); 
  
  # check the link content as you like
  # you may want to check differently from this
  # (under /x a literal space must be escaped, hence "Not\ Found")
  if ($contents =~ /Not\ Found|error|sorry|
				redirect|autoforward|frameset/ix)
	 { print "Bad Link, $&\n$link.\n\n"; }
}
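
To see what `is_error` and `status_line` report without touching the network, a response object can be built by hand. This is a sketch only: the status codes, header, and body text are made up, and a real response would come from `$browser->request(...)` as in the script above.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use HTTP::Response;

# Hand-built responses standing in for what a server would send back.
my $missing = HTTP::Response->new(404, 'Not Found');
my $ok      = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html' ],
    "Sorry, we are closed on Sundays.\n",
);

for my $response ($missing, $ok) {
    if ($response->is_error()) {
        # status_line is the numeric code followed by the reason text
        printf "error:   %s\n", $response->status_line;
    }
    else {
        printf "success: %s\n", $response->status_line;
    }
}

# A content-only check flags the good page because its text contains "sorry".
my $flagged = ($ok->content() =~ /sorry/i) ? 1 : 0;
print "content check flags the 200 page: $flagged\n";
```

The point of the last check is that matching words like "sorry" or "error" in the page body can report a perfectly good link as bad, so it may be worth trusting the status code first and treating the content match as a secondary hint.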
Hope this helps.

If you are new to Tek-Tips, please use descriptive titles, check the FAQs, and beware the evil typo.
 
I ran the second web link script and got the following error:


C:\Perl\bin>webCh.pl
Checking Can't locate object method "new" via package "http::Request" at C:\Perl\bin\webCh.pl line 19..



I checked and made sure I have the modules in the script:
Code:
use LWP::UserAgent;
use http::Request;
use http::Response;
 
Sorry, I had typed lowercase http instead of uppercase HTTP. With that corrected, the script now works!

Can you please (if anyone has the time) explain how it works? I am really lost on the Perl functions.
Also, why do some have => and some ->?


Code:
my $browser = LWP::UserAgent->new();   # WHAT IS new()
$browser->timeout(5);                  # WHAT IS timeout(5)

open(HTMLPAGE,"</path/to/your/file.html") or die "$!\n";
while (<HTMLPAGE>) { $buf .= $_; }
close HTMLPAGE;

while ($buf =~ /<a href="(.*?)">/gis) {
  my $link = $1;
  print "Checking $link.\n";
  # WHAT IS THE new(GET => $link) doing..is it fetching each link???
  my $request = HTTP::Request->new(GET => $link);
  my $response = $browser->request($request);   # WHAT IS THIS WHOLE LINE DOING??
  if ($response->is_error())
    { printf "%s\n", $response->status_line; }  # WHERE AND WHAT IS status line??
  $contents = $response->content();             # Please explain this whole line?

  # check the link content as you like
  # you may want to check differently from this
  if ($contents =~ /Not\ Found|error|sorry|redirect|autoforward|frameset/ix)
    { print "Bad Link, $&\n$link.\n\n"; }
}


Finally, is it possible to run this once a day (like a cron job), given that this is on an NT server?
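
On the `=>` versus `->` question above, here is a minimal sketch using a made-up `Toy::Request` package rather than the real HTTP::Request. `->` calls a method on a class or object (and also dereferences a reference), while `=>` is just a comma that quotes a bareword on its left, so `new(GET => $link)` passes the same list as `new('GET', $link)`.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a request class; not part of LWP.
package Toy::Request;

sub new {
    my ($class, $method, $url) = @_;     # arguments arrive as a plain list
    my $self = { method => $method, url => $url };
    return bless $self, $class;          # turn the hash into an object
}

sub describe {
    my ($self) = @_;
    return "$self->{method} $self->{url}";   # -> also dereferences a hash ref
}

package main;

my $link = 'http://example.com/';        # made-up URL

# Class->new(...) is a method call that builds an object.
my $with_arrow = Toy::Request->new(GET => $link);
my $with_comma = Toy::Request->new('GET', $link);

# $object->method() calls a method on that object.
print $with_arrow->describe(), "\n";
print $with_comma->describe(), "\n";
```

Both print lines show the same text, which is the point: `=>` only makes the argument list easier to read, while `->` is what actually invokes `new` and `describe`.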
 