Okay, first, an exercise so you can see what we're dealing with. Open a cookies-enabled browser and go to this URL:
This should pop up an authentication dialogue box. Use the username 'numbski' and the password 'slipup'. There's nothing here to protect; it just keeps track of what I've downloaded and when, no big deal.
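For what it's worth, that dialogue is ordinary HTTP basic authentication, so a script can answer it either by setting authorization_basic on each request (as my script below does) or by registering the credentials once on the user agent. Here's a minimal sketch of the latter; the realm string is only a guess, the real one is whatever name the dialogue box shows:

Code:
use LWP::UserAgent;
use HTTP::Request;

my $ua = LWP::UserAgent->new;
# Register name/password for this host; the 'MAME.dk' realm name is a guess.
$ua->credentials('www.mame.dk:80', 'MAME.dk', 'numbski', 'slipup');
my $res = $ua->request(HTTP::Request->new(GET => 'http://www.mame.dk/login.phtml'));
print $res->status_line, "\n";   # should say "200 OK" once the credentials match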
Logging in will bring back a page telling you that you're successfully logged in. When you get this page you're assigned a cookie, and a META-REFRESH tag takes you back to the main page.
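Keep in mind that the cookie is picked up automatically once a cookie jar is attached to the user agent, but the META-REFRESH is not: LWP follows real HTTP redirects, not a refresh tag buried in the HTML, so a script has to dig the target out of the page itself. A rough sketch, assuming the tag is written in the usual <meta http-equiv="refresh" content="N;url=..."> form:

Code:
# $content holds the HTML of the "successfully logged in" page.
my ($refresh_url) = $content =~ /<meta[^>]*http-equiv=["']?refresh["']?[^>]*url=([^"'>\s]+)/i;
if (defined $refresh_url) {
    print "META-REFRESH points at: $refresh_url\n";
    # Fetch $refresh_url with the same $ua so the new cookie rides along.
}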
Now, go to this page:
Notice that the upper right-hand corner identifies you as 'numbski' (it remembered who you are) and that at the bottom of the page there are two links: 'Yes I do own it' and 'Oops, I am sorry'. Look at the source. The 'yes' link carries a randomly generated key akin to this:
?1005558755|d1cbe3583b4477c207af7c85d0e75104
Alright, tack that onto the end of your current URL, like so:
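In script terms, that whole step boils down to finding whichever href on the 'do you own it' page carries a ?digits|hash query string and gluing it back onto the rom's download URL. A bare-bones sketch, assuming $page1_source holds the first page's HTML and $rom the rom name (as they do in the script below); the pipe sometimes arrives URL-encoded as %7C, so the pattern allows both:

Code:
# Grab the first href whose query string looks like "digits|32 hex chars".
my ($key) = $page1_source =~ /href="[^"]*\?(\d+(?:\||%7C)[0-9a-fA-F]{32})"/;
die "No download key found on page 1\n" unless defined $key;
my $page2_url = "http://www.mame.dk/download/rom/$rom?$key";
print "Page 2 URL: $page2_url\n";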
That URL is our final destination. It has a META-REFRESH tag that will try to download the file automatically. Stop that and look at the page. There's a line like this:
download: if your download doesnt start automatically, click here
If all has gone well, the source of that link will look something like this:
Okay, I apologize, that was a mouthful. There was no getting around it though.
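The only part of that source the script actually needs is the href on the 'click here' anchor; the rest of the page is decoration. A quick-and-dirty way to fish it out, assuming the anchor text really is 'click here' as in the line quoted above, and that $page2_source holds the page HTML as in the script below:

Code:
# Pull the href off the "click here" anchor in the page-2 source.
my ($zip_url) = $page2_source =~ /<a\s+href="?([^">\s]+)"?[^>]*>click here<\/a>/i;
print defined $zip_url ? "Zip link: $zip_url\n" : "No 'click here' link found\n";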
Now, on to the script. I have written an LWP::Simple script that goes through that exact process and is supposed to download the zipfile at the end. It worked great for about two weeks; then they added the authentication step and required a cookie. That's where my nightmares began. It looked like it would be a simple fix: add the authentication, switch to LWP::UserAgent, and use cookies. But something is going horribly wrong. Instead of pulling the second download page like it's supposed to, it keeps pulling the first page over and over again. I have no idea why. If you add "|login=confirmed" to the end of the second URL, it will pull the second page correctly, but instead of the URL format you're supposed to get (as above), you get something like this:
If you attempt to download that file, it'll download an HTML document stating 'you were linked by a bandwidth thief'.
Basically, I just need someone to look over my script and see if I'm screwing something up. I have debugging enabled so you can see what's going on in the HTTP headers at each step of the way. Helllp!
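For what it's worth, one way to see whether the cookie and the redirects are behaving is to dump both after each request. A small sketch of that kind of instrumentation, reusing the $ua and $cookies objects the script below creates (a debugging aid, not part of the working code):

Code:
# Call this after any $ua->request(...) to see what actually came back.
sub dump_response {
    my ($res, $cookies) = @_;
    print "Final URL: ", $res->request->uri, "\n";
    print "Status:    ", $res->status_line, "\n";
    # Walk back through any redirects LWP followed on the way.
    for (my $prev = $res->previous; $prev; $prev = $prev->previous) {
        print "Redirected from: ", $prev->request->uri, " (", $prev->status_line, ")\n";
    }
    print "Cookie jar now holds:\n", $cookies->as_string, "\n";
}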
Begin code:
Code:
#!/usr/bin/perl
use Cwd;
use LWP::UserAgent;
use URI;
use URI::URL;
use HTML::Parse;
use HTML::Element;
use HTTP::Cookies;
use HTTP::Request;
use HTTP::Response;
use LWP::Debug '+';
system("cls");
##
#For now, I've disabled the actual login routines. Made the login a global.
#print "Username: ";
#$username=<>;
#print "Password: ";
#$password=<>;
#chomp($username);
#chomp($password);
$username="numbski";
$password="slipup";
print "Checking Authentication...\n\n";
$ua = new LWP::UserAgent;
#Hopefully creates a cookie jar that will catch mame.dk's cookie.
$cookies = HTTP::Cookies->new; # Create a cookie jar
$ua->cookie_jar($cookies); # Enable cookies
#Tell the site that we're IE5.5 on Windows 2000
$ua->agent("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)");
#Request the login page, give it our username and password.
$req = new HTTP::Request GET => 'http://www.mame.dk/login.phtml';
$req->authorization_basic($username,$password);
my $response = $ua->request($req);
if ($response->is_success){
    $login_page = $response->content;
    print "Login successful for $username.";
}
else{
    die "Could not get login page content.";
}
print "\n\nPress Enter to Continue...\n";
$enter=<>;
system("cls");
#Proceed to downloading the first page. 'Do you own this ROM?'
print "Downloading Page 1...\n\n";
$rom="pacman";
$page1 = new HTTP::Request GET => "http://www.mame.dk/download/rom/$rom/";
#Don't think this is needed after I'm logged in and have a cookie.
#$page1->authorization_basic($username,$password);
$res = $ua->request($page1);
if ($res->is_success){
    $page1_source = $res->content;
    print "Got page 1.\n\n";
}
else{
    die "Could not get page content.";
}
#Find the links in this page, strip out the link to page 2.
$parsed_page1 = HTML::Parse::parse_html($page1_source);
for (@{ $parsed_page1->extract_links() }) {
    $link=$_->[0];
    $url = new URI::URL $link;
    $full_url = $url->abs("http://www.mame.dk/download/rom/$rom");
    #Look for the URL with a question mark in it. That's the one we need.
    if($full_url=~/\?/){
        $page2_url=$full_url;
        chomp($page2_url);
        #Since the structure of the URL is weird, we need to split it
        #and add back in the rom name.
        @url_parts=split(/\?/,$page2_url);
        #@session_id=split(/\%7C/,@url_parts[1]);
        $page2_url="@url_parts[0]$rom?@url_parts[1]";
        print "\nHere's the link I found!\n $page2_url\n";
        &get_page2;
    }
}
sub get_page2{
    $enter=<>;
    system("cls");
    #'Download pacman.zip' page....we hope anyway.
    $page2 = HTTP::Request->new ( GET => $page2_url);
    $page2->authorization_basic($username,$password);
    $res2 = $ua->request($page2);
    if ($res2->is_success){
        $page2_source = $res2->content;
        print "Page 2 Complete.\n";
        print "Have a source looksie:\n\n";
        print $page2_source;
        print "\n\nIf all looks well here, try downloading the zipfile.\n";
        print "Press ENTER to continue.\n";
        $enter=<>;
        &download_zip;
    }
    else{
        die "Could not get page2 content";
    }
}
sub download_zip{
    @page2_content=split(/\n/,$page2_source);
    foreach $line(@page2_content){
        @line_parts=split(/</,$line);
        #The following is a very poor parsing routine. Will be replaced later.
        #It's effective for our purposes though.
        #Begin stripping the binary link out of the correct line.
        if(@line_parts[2] eq "TD valign=\"top\" class=\"stdtext\">if your download doesnt start automatically, "){
            print "Found our link.\n";
            @link_parts=split(/<a href=/,$line);
            $binary_link=@link_parts[1];
            chomp($binary_link);
            $binary_link=~s*"**g;
            $binary_link=~s*>**g;
            $binary_link=~s*click here</a**g;
            #This MUST be in the form http://roms(2).mame.dk/randomchars/randomchars/cur/$rom.zip
            #I've been getting http://roms(2).mame.dk/$rom.zip, try it in IE or Netscape to see what I mean.
            print "Link is $binary_link.\n";
            print "Downloading $rom from http://roms.mame.dk\n";
            $rom_filename="$rom.zip";
            #Create our binary request. Print out failures, if any.
            #This should save the zipfile to the same directory as this script.
            my $zipfile = new HTTP::Request('GET', "$binary_link");
            my $response = $ua->request($zipfile, "$rom_filename");
            if($response->is_error()){
                print $response->status_line."\n\n"
            }
            else{
                print "Download of $rom.zip complete.\n\n"
            }
        }
    }
    print "If you saw no text after downloading page 2, then it failed to get\n";
    print "the correct page 2.";
}