
Help reading web page via Perl

Status
Not open for further replies.

markmorgan

Programmer
May 13, 2002
19
US
I am a Perl novice and am trying to get the following script to work:

Code:
#
use strict;
use warnings;
use LWP::Simple qw(get);

my $url = "http://www.a-valid-url.html";

my $page = get($url);
exit unless $page;

The get function appears to do something as it takes time to execute, but the result in $page is <undef> triggering the exit.

I am using the Open Perl IDE and Perl under Windows 2000. Perl is returning "Unknown Error" from the get I think.

If I pass in an invalid URL it returns "Invalid Argument", so it's not the URL.

Mark.
 
If you want to see what get is actually returning, try printing it out:

Code:
#
use strict;
use warnings;
use LWP::Simple qw(get);

my $url = "http://www.hostpipe.co.uk";

my $page = get($url);

print $page;

# exit unless $page;

I'm not quite sure where you got the exit unless $page from. The line
Code:
my $page = get($url);
fetches the source of the requested URL and assigns it to the $page variable (or assigns undef if the request fails). You can then use it as you wish.

I hope this is of some help. If you have any other problems, just ask.

Also, at the beginning of your Perl scripts, make sure you give the path to perl on the first (shebang) line. For example, mine in a Windows environment is simply #!perl; on Unix it is typically something like #!/usr/bin/perl. The exact path depends on the system, but it can make a difference in more complicated scripts and on different server setups.
 
The print statement gives "Use of uninitialized value in print" as the $page variable hasn't been assigned a value because the get failed.

The script I am trying to get working comes from here
Mark.
 
What's the URL you are trying to get?

Check out HTTP::SimpleLinkChecker on CPAN. It checks the status of a URL from Perl, e.g.

Code:
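use strict;
use warnings;
use LWP::Simple;              # also exports status_message() from HTTP::Status
use HTTP::SimpleLinkChecker;

my $src = "URLGOESHERE";

# check_link returns the HTTP response code for the URL
my $code = HTTP::SimpleLinkChecker::check_link($src);

# Guard against no code coming back at all, so the comparisons below do not warn
die "Error: $HTTP::SimpleLinkChecker::ERROR\n" unless defined $code;

print "Response Code: ".$code."\nStatus: ".status_message($code)."\n";
if ($code == 200)  {
	print "No Error\n";
}  else  {
	print "Error: $HTTP::SimpleLinkChecker::ERROR\n";
}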

I can only imagine the get is failing if the response code is not 200.
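
To see why a request failed rather than just getting undef back, one option is to use LWP::UserAgent directly and look at the response object. This is only a rough sketch: describe_response is a made-up helper name, and the URL is whatever you pass on the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# describe_response is a hypothetical helper (not part of LWP):
# it turns an HTTP::Response object into a one-line summary.
sub describe_response {
    my ($response) = @_;
    return $response->is_success
        ? 'OK: ' . length($response->content) . ' bytes'
        : 'Failed: ' . $response->status_line;   # e.g. "500 Internal Server Error"
}

# Only hit the network when a URL is given on the command line.
if (@ARGV) {
    my $ua       = LWP::UserAgent->new(timeout => 30);
    my $response = $ua->get($ARGV[0]);
    print describe_response($response), "\n";
}
```

Run it as perl script.pl followed by the URL. Unlike LWP::Simple's get, the response object keeps the status code and message, so a failure tells you something.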
 
To add to that, I would also consider using an if/else statement or a die rather than a bare exit. Maybe:

Code:
use LWP::Simple;

my $url = "URLGOESHERE";

# get returns undef on failure, so test for definedness
if (defined get($url))  {
  print "URL RETRIEVED\n";
}  else  {
  die "URL NOT RETRIEVED\n";
}
 
I personally like waiterm's idea. Just exiting doesn't help you figure anything out, at least not in the trial stages. Use an if/else to determine when something works and when it doesn't, then do more testing to see why.

The question was asked already, but what URL are you trying to snatch? If it's Google, good luck with that; it's not going to be this easy. Some sites do not allow you to get their source code, and some require you to set your own user agent string so the request doesn't look like it comes from a bot.

I love LWP::Simple, but nothing works all the time. Can you tell us the URL that you're trying to get?
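
For sites that reject the default libwww-perl agent string, something along these lines may help. It is only a sketch: build_agent is a made-up helper, and the agent string is an arbitrary example, not one any particular site is known to accept.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# build_agent is a hypothetical helper: it returns an LWP::UserAgent
# that identifies itself with the given User-Agent string.
sub build_agent {
    my ($agent_string) = @_;
    my $ua = LWP::UserAgent->new(timeout => 15);
    $ua->agent($agent_string);
    return $ua;
}

# Only hit the network when a URL is given on the command line.
if (@ARGV) {
    # Example agent string, chosen for illustration only.
    my $ua       = build_agent('Mozilla/5.0 (compatible; MyFetcher/0.1)');
    my $response = $ua->get($ARGV[0]);
    print $response->status_line, "\n";
}
```

Whether a site then serves the page is up to the site, so check its terms before working around a block.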
 
If you want to receive some kind of error message when part of your script fails, you can use $!, for example:
Code:
if...  {
  die "This died because $!";
}
The $! variable holds the error from the last failed system call, so it is only meaningful if the if{} comes straight after the call that failed. Bear in mind that get does not promise to set $! when a request fails, so the message may not tell you much.
 
I ran waiterm's idea of the SimpleLinkChecker above. Here are the results:


LWP::UserAgent::request: ()
LWP::UserAgent::simple_request: HEAD LWP::UserAgent::_need_proxy: Not proxied
LWP::protocol::http::request: ()
LWP::UserAgent::request: Simple response: Internal Server Error
LWP::UserAgent::request: ()
LWP::UserAgent::simple_request: GET LWP::UserAgent::_need_proxy: Not proxied
LWP::protocol::http::request: ()
LWP::UserAgent::request: Simple response: Internal Server Error
Response Code: 500
Status: Internal Server Error


Could it be a proxy server problem? This site uses a proxy server to access the internet. Any idea how to get Perl to use it?

Mark.
 
I don't know enough about proxies to be able to suggest a fix. However, there is a proxy module on CPAN, HTTP::Proxy; try it and see if it's any good. I think it's a standard PPM install, so you shouldn't have any trouble getting it. I'm a bit dubious about it being a proxy problem, though, as the line:
Code:
LWP::UserAgent::_need_proxy: Not proxied
suggests that the request is not going through a proxy at all!

Could you also post your code in this forum? It might help us spot any bugs that you have missed.

Rob
rob@hostpipe.co.uk
 
After a bit of digging I found that LWP can use an environment variable for the proxy server:

Code:
http_proxy=http://www.myproxyserver.com

I set the environment variable and added these lines to my code before the get:

Code:
use LWP::UserAgent;
my $ua = LWP::UserAgent->new;
$ua->env_proxy;

This allowed the get to use the proxy server. So, the get now works. Unfortunately the SimpleLinkChecker still flashes up the _need_proxy: Not proxied message.
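
For anyone who would rather not depend on the environment, here is a rough sketch of setting the proxy on a user agent directly. make_proxied_agent is a made-up helper name and the proxy address is a placeholder. As far as I can tell, LWP::Simple builds its own internal user agent and reads the proxy environment variables itself, so a separate $ua like the one above probably only matters if you make the request through that $ua.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# make_proxied_agent is a hypothetical helper: with a proxy URL it sets
# that proxy for http requests; without one it falls back to the
# environment (http_proxy and friends) via env_proxy.
sub make_proxied_agent {
    my ($proxy_url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 15);
    if (defined $proxy_url) {
        $ua->proxy(['http'], $proxy_url);
    } else {
        $ua->env_proxy;
    }
    return $ua;
}

# Usage: perl script.pl URL [PROXY_URL]
# The proxy address is whatever your site uses; none is assumed here.
if (@ARGV) {
    my ($url, $proxy_url) = @ARGV;
    my $ua       = make_proxied_agent($proxy_url);
    my $response = $ua->get($url);
    print $response->status_line, "\n";
}
```

Requests made through the returned object should then go via the proxy, and status_line shows what came back.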

Mark.
 