Save HTML files back

Status
Not open for further replies.

rubis

Programmer
Jun 21, 2001
54
GB
I want to save HTML files of Web pages back to the local machine.

First, I wrote this code:

use LWP::UserAgent;
use HTTP::Request;
use HTTP::Response;

# $res holds the request, $rep the response
my $ua  = LWP::UserAgent->new;
my $res = HTTP::Request->new('GET', "...");   # the URL is missing from the post
my $rep = $ua->request($res);

This code gives me "501 (Not Implemented) Protocol scheme 'http' is not supported". So I searched the Net and found this information:

(From the libwww-perl README)

If you want to access sites using the https protocol, then you need to install the Crypt::SSLeay or the IO::Socket::SSL module. The README.SSL file will tell you more about how libwww-perl supports SSL.

I told this to the admin but he said ...

"You are NOT trying to access using the https protocol, so this is irrelevant. It is fine if you call it really simply, it is only when you try to use the object-oriented form that it fails." Then he gave me the following code.

#!/usr/bin/perl

use strict;
use LWP::Simple;

my $doc = get "...";   # the URL is missing from the post
print "Content-type: text/html\n\n";
print $doc;

From this, I've got some problems.

1) I don't know whether saving an HTML file back has anything to do with the HTTP protocol or not.

2) His code works only with the homepage located on the Web server at my work, not with anything outside. It can't download any outside page. So I have two assumptions:
- proxy
- HTTP protocol
If it's because of the proxy, how can I set the proxy for this program? Also, my work uses proxy autoconfiguration (a .pac file).

3) If the web page contains frames, the program won't get the right web page. The page it gets says "this page uses frames but your browser doesn't support them". The result is shown in the correct layout (frame style), but inside each frame it says "404 page not found". So I think the program downloads only the frameset code, not the HTML files inside the frames.

My assumption is that, because my program isn't a browser, the remote web server can detect that the client doesn't support frames when it requests the page, and so it sends that page back to my machine instead. However, I don't know whether this assumption is right. If it is, how can I make the program get all the HTML files within the frames?
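For points 2 and 3 above, here is a minimal sketch rather than a tested solution. It assumes the object-oriented LWP form works on the machine, that the proxy address is passed in by hand (as far as I know LWP does not evaluate .pac files, so the host and port would have to be read out of the .pac file), and that a simple regex is good enough to find the frame tags. The names make_agent, frame_sources and save_with_frames, and the output file names, are made up for illustration. On point 3, the "this page uses frames" text is most likely the <noframes> part that a frameset page carries for every client, and a relative src shown from another address would explain the 404s, so the sketch resolves each src against the page's own URL and saves each frame separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build a user agent; if a proxy URL is given, send http requests through it.
# The proxy URL is an assumption: take the real host and port from the .pac file.
sub make_agent {
    my ($proxy_url) = @_;
    require LWP::UserAgent;    # loaded on demand, only needed when fetching
    my $ua = LWP::UserAgent->new;
    $ua->proxy('http', $proxy_url) if defined $proxy_url;
    return $ua;
}

# Return the raw src attribute of every <frame> and <iframe> tag.
# Regex-based, so it is only a rough parser; <frameset> and <noframes> are skipped.
sub frame_sources {
    my ($html) = @_;
    my @sources;
    while ($html =~ /<i?frame\b[^>]*?\bsrc\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi) {
        push @sources, defined($1) ? $1 : defined($2) ? $2 : $3;
    }
    return @sources;
}

# Save the page as "$prefix.html" and each frame as "${prefix}_frameN.html".
# Returns the number of frame tags found.
sub save_with_frames {
    my ($ua, $url, $prefix) = @_;
    require URI;               # used to turn relative src values into full URLs

    my $res = $ua->get($url);
    die "GET $url failed: ", $res->status_line, "\n" unless $res->is_success;

    open my $fh, '>', "$prefix.html" or die "Cannot write $prefix.html: $!\n";
    binmode $fh;
    print {$fh} $res->content;
    close $fh or die "Cannot close $prefix.html: $!\n";

    my $n = 0;
    for my $src (frame_sources($res->content)) {
        my $abs  = URI->new_abs($src, $res->base);
        my $file = sprintf '%s_frame%d.html', $prefix, ++$n;
        my $sub  = $ua->get($abs, ':content_file' => $file);
        warn "GET $abs failed: ", $sub->status_line, "\n" unless $sub->is_success;
    }
    return $n;
}

# Usage: perl savepage.pl URL [PROXY_URL]
if (@ARGV) {
    my ($url, $proxy) = @ARGV;
    my $count = save_with_frames(make_agent($proxy), $url, 'page');
    print "Saved page.html and $count frame file(s)\n";
}
```

Run it with the page address and, if needed, the proxy address as the two arguments. Nested framesets are not followed and the saved frameset still points at the original src values, so this only gets the files onto the local disk.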

Thank you all.
 
Yauncin,

Thanks for your suggestion, but I can't use it. The admin doesn't allow anybody to use it.
 