I want to save HTML files of Web pages back to the local machine.
First of all, I wrote this code:
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;
$ua = LWP::UserAgent->new;
$res = HTTP::Request->new('GET', "...");
$rep = $ua->request($res);
This code gives me "501 (Not Implemented) Protocol scheme 'http' is not supported". So I searched the Net and found this information:
(from the libwww-perl README)
If you want to access sites using the https protocol, then you need to install the Crypt::SSLeay or the IO::Socket::SSL module. The README.SSL file will tell you more about how libwww-perl supports SSL.
I told the admin about this, but he said:
"You are NOT trying to access using the https protocol, so this is irrelevant. It is fine if you call it really simply, it is only when you try to use the object-oriented form that it fails." Then he gave me the following code.
#!/usr/bin/perl
use strict;
use LWP::Simple;
my $doc = get "...";
print "Content-type:text/html\n\n";
print $doc;
This leaves me with some problems.
1) I don't know whether saving an HTML file locally has anything to do with the HTTP protocol or not.
2) His code works only with the homepage on my workplace's Web server, not with outside pages. It can't download any outside page. So I have two guesses about the cause:
- proxy
- HTTP protocol
If it's because of the proxy, how can I set the proxy for this program? Also, my workplace uses proxy autoconfiguration (.pac).
3) If the web page contains frames, the program won't get the right web page. The page it gets says "this page uses frames but your browser doesn't support them". The result is shown in the correct layout (frame style), but each frame says "404 page not found". So I think the program downloads only the frameset code and doesn't load the HTML files inside the frames.
My assumption is that, because my program isn't a browser, the remote web server can detect that the client doesn't support frames when it requests the page, and so it sends that no-frames page back to my machine instead. However, I don't know whether this assumption is right. If it is, how can I make the program get all the HTML files within the frames?
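To make the question concrete, here is a minimal sketch of what I am trying to end up with. It rests on assumptions of mine, not on anything confirmed: that each frame is a separate document whose relative src has to be resolved against the original page's URL (which might explain the 404s), and that an explicit proxy URL can stand in for the .pac setting, since as far as I know LWP does not evaluate .pac files itself. The function names frame_urls and save_page_with_frames, the pageNN.html file names, and the example URLs in the usage line are all hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI;

# Collect the absolute URL of every <frame>/<iframe> src found in $html,
# resolving relative links against $base (hypothetical helper).
sub frame_urls {
    my ($html, $base) = @_;
    my @urls;
    my $parser = HTML::LinkExtor->new(sub {
        my ($tag, %attr) = @_;
        return unless $tag eq 'frame' || $tag eq 'iframe';
        return unless defined $attr{src};
        push @urls, URI->new_abs($attr{src}, $base)->as_string;
    });
    $parser->parse($html);
    $parser->eof;
    return @urls;
}

# Fetch $url and each of its frames, saving them as page00.html,
# page01.html, ... in the current directory. $proxy is optional.
sub save_page_with_frames {
    my ($url, $proxy) = @_;
    my $ua = LWP::UserAgent->new;
    $ua->proxy(['http'], $proxy) if defined $proxy;

    my $top = $ua->get($url);
    die "Cannot fetch $url: ", $top->status_line, "\n" unless $top->is_success;

    my $html = $top->decoded_content;
    $html = $top->content unless defined $html;

    my $n = 0;
    # The top page is fetched a second time here, purely to keep the sketch short.
    for my $page ($url, frame_urls($html, $top->base)) {
        my $file = sprintf 'page%02d.html', $n++;
        my $res  = $ua->get($page, ':content_file' => $file);
        warn "Cannot fetch $page: ", $res->status_line, "\n" unless $res->is_success;
    }
    return $n;
}

# Usage: perl savepage.pl URL [PROXY_URL]
save_page_with_frames(@ARGV) if @ARGV;
```

It would be called as, for example, perl savepage.pl http://www.example.com/ http://proxy.example.com:8080/ where both URLs are placeholders. Nested framesets are not followed; that would need frame_urls to be applied again to each saved frame.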
Thank you all.