
How to download particular html site content

Status
Not open for further replies.

stebel

Programmer
Apr 20, 2005
4
PL
Hi

I need to download the HTML source of a site from a Perl script.

What I have so far is this code:

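use strict;
use warnings;
use IO::Socket;

# CGI header so the browser renders the output as HTML
print "Content-type: text/html\n\n";

# Open a TCP connection to the web server
my $sock = IO::Socket::INET->new(
    PeerAddr => '217.xxx.yyy.zzz',
    PeerPort => 'http(80)',
    Proto    => 'tcp',
) or die "Cannot connect: $@";
$sock->autoflush(1);

# Request the root document; the blank line ends the request
print $sock "GET / HTTP/1.0" . "\015\012" x 2;
my $document = join('', <$sock>);   # raw response, headers included
print "$document\n";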

However, it downloads only the main page. Let's say the server 217.xxx.yyy.zzz hosts file01.html and file02.html.

My question is: how do I download the content of file02.html?
If I put

$sock = IO::Socket::INET->new(PeerAddr => '217.xxx.yyy.zzz/file02.html',
PeerPort => 'http(80)',
Proto => 'tcp');

it does not work, which is not surprising, since I must specify a host there, right? But how can I specify a particular file?

Thanks for any suggestions
 
OK, never mind. The file path goes in the GET request line, not in PeerAddr:

print $sock "GET /file02.html HTTP/1.0" . "\015\012" x2;
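
For anyone who wants to stay with raw sockets, here is a minimal sketch of the same idea wrapped in reusable subs. The names build_request, split_response and fetch_raw are made up for this illustration, and the Host header is an addition that the original one-liner does not send (HTTP/1.0 does not require it, but name-based virtual hosts usually need it). It assumes plain HTTP on port 80 and a server that closes the connection after replying.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Build an HTTP/1.0 GET request for one path on one host.
# (Hypothetical helper; the Host header is an assumption, see above.)
sub build_request {
    my ($host, $path) = @_;
    $path = '/' unless defined $path && length $path;
    return join "\015\012",
        "GET $path HTTP/1.0",
        "Host: $host",
        '', '';               # two empty fields give the terminating blank line
}

# Split a raw response into status line, header lines, and body.
sub split_response {
    my ($raw) = @_;
    return ('', [], '') unless defined $raw && length $raw;
    my ($head, $body) = split /\015?\012\015?\012/, $raw, 2;
    my ($status, @headers) = split /\015?\012/, $head;
    return ($status, \@headers, defined $body ? $body : '');
}

# Connect, send the request, and return the whole raw response.
sub fetch_raw {
    my ($host, $path) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => 80,
        Proto    => 'tcp',
    ) or die "connect to $host failed: $@";
    print {$sock} build_request($host, $path);
    local $/;                 # slurp mode: read until the server closes
    my $raw = <$sock>;
    close $sock;
    return $raw;
}

# Usage (needs a reachable server, so it is left commented out):
# my ($status, $headers, $body) =
#     split_response(fetch_raw('217.xxx.yyy.zzz', '/file02.html'));
# print $body;
```

Splitting off the headers matters here because the original script prints the whole response, status line and headers included, into the page.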

 
Either wget (a free command-line tool) or LWP would save you a lot of effort here.

Wouldn't you rather write something like this?
Code:
use LWP::Simple;
my $doc = get 'http://217.xxx.yyy.zzz/file02.html';
print $doc, "\n";
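
One caveat with LWP::Simple is that get returns undef on any failure, with no indication of why. If the reason matters, a rough sketch along the following lines, using LWP::UserAgent from the same libwww-perl distribution, reports the status line instead. The sub name fetch_page and the 10-second timeout are arbitrary choices for this example, not anything from the thread.

```perl
use strict;
use warnings;

# Fetch a URL; returns (body, undef) on success or (undef, error) on failure.
# (fetch_page is a hypothetical helper name.)
sub fetch_page {
    my ($url) = @_;
    require LWP::UserAgent;   # loaded on first use; needs libwww-perl installed
    my $ua  = LWP::UserAgent->new(timeout => 10);
    my $res = $ua->get($url);
    return ($res->decoded_content, undef) if $res->is_success;
    return (undef, $res->status_line);
}

# Usage (needs a reachable server, so it is left commented out):
# my ($doc, $err) = fetch_page('http://217.xxx.yyy.zzz/file02.html');
# die "fetch failed: $err\n" unless defined $doc;
# print $doc, "\n";
```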

Yours,

fish

"As soon as we started programming, we found to our surprise that it wasn't as easy to get programs right as we had thought. Debugging had to be discovered. I can remember the exact instant when I realized that a large part of my life from then on was going to be spent in finding mistakes in my own programs."
--Maur
 