LWP and form structure question/problem

Status
Not open for further replies.

programingnoob

Technical User
Dec 31, 2002
71
CA
Hi,

I got this script from the net and I can't get it working, because I don't understand it at all. Can someone please explain what the script is doing?

[script]
## Perl modules that mimic the browser and build the request
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request;
use HTTP::Headers;
use HTTP::Date qw(time2str);
use URI;
use CGI; ## Perl module to read HTML form values (not used in this version of the script)

## The URL location of the HTML form (the URL was stripped from the original post)
my $statement_URL = "...";

## Create a new CGI object
my $query = CGI->new;

## Right now the values are hard-coded. This is the URL-encoded form body:
## name=value pairs joined with '&'
my $hdr = "index=books&field-keywords=kampf&rank=+amzrank&Go.x=6&Go.y=6";

## "Call" the HTML form with the above name=value pairs
my $server_response = browse($statement_URL, $hdr); ## Fire the URL

## Print the MIME header
print "Content-type: text/html\n\n";

## Print the output from the HTML form
print $server_response;


## Subroutine that does all the actual work
sub browse {
    my ($statement_URL, $hdr) = @_;

    ## MIME type of the submitted form data
    my $content_type = "application/x-www-form-urlencoded";
    ## The method of submitting the form (<FORM method=post ...>)
    my $method = "POST";

    ## Create the HTTP headers that we want to send to the web server where the form is located
    my $headers = HTTP::Headers->new(
        'Content-Type' => $content_type,
        'MIME-Version' => '1.0',
        'Date'         => time2str(time),
        'Accept'       => 'text/html',
    );

    my $ua = LWP::UserAgent->new;
    ## Mimic the Netscape browser ver 4.7: this is sent as the User-Agent
    ## header, which a CGI script on the server sees as HTTP_USER_AGENT
    $ua->agent("Mozilla/4.7 [en] (WinNT; U)");

    ## Create a new URL object
    my $url = URI->new($statement_URL);
    ## Build the HTTP request: method, URL, headers and form body
    my $request = HTTP::Request->new($method, $url, $headers, $hdr);

    ## Send the request and gather the response from the web server
    ## (the results of the HTML form submission)
    my $response = $ua->request($request);

    my $reply;
    if ($response->is_success) {             ## All is OK
        $reply = $response->content;         ## send the HTML output
    } else {
        $reply = $response->error_as_HTML(); ## send the error
    }
    return $reply;
}
[/script]

Even though this script is heavily commented, I got lost in the early part.

Please teach me,

Thanks!
 
The script is basically pretending to be a web browser. It sends an HTTP POST request to the German Amazon site, containing the fields of a pre-filled form, and prints the resulting page.

This is a very basic idea, and with some of the more recent additions to the LWP:: suite of modules on CPAN, it can be written a lot more simply.
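As a sketch of that simpler style (assuming the HTTP::Request::Common module that is installed alongside LWP, and a hypothetical example.com URL because the real form URL was stripped from the post), the `POST` helper builds the same urlencoded request from a list of field pairs, so the hand-made headers and body string are no longer needed:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Request::Common qw(POST);

# Hypothetical URL: the real form URL was stripped from the original post.
my $form_url = 'http://www.example.com/search';

# The same fields as the original $hdr string, as name => value pairs.
# POST() URL-encodes the values itself, so a literal space is written here
# where the hand-encoded string had '+'.
my @fields = (
    'index'          => 'books',
    'field-keywords' => 'kampf',
    'rank'           => ' amzrank',
    'Go.x'           => 6,
    'Go.y'           => 6,
);

# POST() returns an HTTP::Request whose Content-Type is
# application/x-www-form-urlencoded and whose body is the encoded fields.
my $req = POST $form_url, \@fields;

# Only go to the network when asked, so the request can be inspected offline.
if (@ARGV && $ARGV[0] eq '--send') {
    my $ua  = LWP::UserAgent->new(agent => 'Mozilla/4.7 [en] (WinNT; U)');
    my $res = $ua->request($req);
    print $res->is_success ? $res->decoded_content : $res->status_line . "\n";
}
else {
    print $req->as_string;
}
```

Run it with `--send` to actually submit the form. `$ua->post($form_url, \@fields)` is the one-call shortcut; it builds the request with the same `POST` helper internally.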

If you're interested in this sort of thing, I wholeheartedly recommend Sean Burke's book "Perl & LWP", which explains some neat little tricks for processing forms and screen scraping, so that the above could be rewritten to strip the HTML and print plain text.

Barbie
Leader of Birmingham Perl Mongers
 
Hi,
I bought the book, but I still can't find the answers to my questions. I am having problems trying to log in to web pages. I have written this script, but it doesn't work. ><




#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;
my $response =$browser->post(
' [
'username' => 'username',
'password' => 'password'
],
);
die "error" unless $response->is_success;

open OUT, ">results.txt";
print OUT $response;
close OUT;

exit;

This is my problem. What the program printed was:

HTTP::Response=HASH(0x1b88dd0) instead of a real web page...

How can I fix it?
 
Code:
print OUT $response;
#should be
print OUT $response->content;

----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
 
Thanks!!
I have a few more questions. Supposing that the form input names are all correct and my username and password work, will this script work? Am I doing this right? Also, does "print OUT $response->content;" mean that it is printing the "response" page (the page shown after my script tries to log in)????

Thanks again
 
Yes, if the information is correct, it should work (possibly barring redirects). If it is incorrect, you'll get whatever error page the website you were trying to log into supplies, but you will still get a page. is_success will return true, because it successfully obtained a page. It doesn't know what was on the page; all it knows is that it got one.
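A small offline sketch of both points, using a hand-made HTTP::Response in place of a real server reply (the page text and the "Invalid password" marker are hypothetical): is_success only reflects the 2xx status code, so the script has to inspect the content itself, and a login POST that answers with a redirect is only followed if POST is added to the user agent's redirectable methods:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Response;

# A hand-made response standing in for a server's "wrong password" page.
my $res = HTTP::Response->new(200, 'OK', ['Content-Type' => 'text/html'],
    '<html><body>Invalid password, please try again.</body></html>');

# is_success only looks at the status code (2xx), not at the page text.
print "is_success: ", ($res->is_success ? 'true' : 'false'), "\n";

# So the script has to look at the content. The marker text is hypothetical;
# a string that only appears on the real logged-in page is more reliable.
sub looks_logged_in {
    my ($response) = @_;
    return $response->is_success && $response->content !~ /Invalid password/i;
}
print "logged in: ", (looks_logged_in($res) ? 'yes' : 'no'), "\n";

# By default LWP::UserAgent only follows redirects for GET and HEAD,
# so a login POST answered with a 302 is returned as-is unless POST is added.
my $ua = LWP::UserAgent->new;
push @{ $ua->requests_redirectable }, 'POST';
print "follows redirects for: @{ $ua->requests_redirectable }\n";
```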

And if you're curious, $response in this case is the object containing the response. The data it caught is accessible through the ->content method, which is just one part of the response object; you have to ask the object specifically to give you the content. Perl objects are blessed references (HTTP::Response is a blessed hash reference), which is why printing the object itself gave you HTTP::Response=HASH(0x1b88dd0). That's what the response object looks like to Perl internally.
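To see the difference without touching the network, here is a minimal sketch with a hand-made response object standing in for what `$browser->post` returns (the page text is made up):

```perl
use strict;
use warnings;
use HTTP::Response;
use Scalar::Util qw(blessed reftype);

# A hand-made response object, like the one $browser->post returns.
my $response = HTTP::Response->new(200, 'OK', [], '<html>the page</html>');

# Interpolating the object prints its class, underlying type and address.
my $as_string = "$response";
print "$as_string\n";     # e.g. HTTP::Response=HASH(0x...)

# The object is a blessed hash reference...
print blessed($response), " / ", reftype($response), "\n";

# ...and the page itself has to be asked for through a method.
print $response->content, "\n";
```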

----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
 
Unfortunately, my program doesn't log in to the page =(
$response->content just gives me the original page, as if I didn't log in at all. It didn't give me a password error, or even a login success page. What should I do??

thanks!
 
I'm not sure; it works fine with a similar example on my end. A few things I can think of:

Check and make sure that the form element names really are 'username' and 'password'.
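One way to check the real field names, sketched under the assumption that the HTML::Form module is installed: parse the login page's form and list its inputs, action and method. The HTML below is a hypothetical stand-in; in practice it would come from `$ua->get($login_page_url)->decoded_content`. HTML::Form can also fill in the fields and build the POST request, including any hidden fields the page carries:

```perl
use strict;
use warnings;
use HTML::Form;

# Hypothetical login page HTML standing in for the real page source.
my $html = <<'HTML';
<form method="post" action="/ppsecure/post.srf">
  <input type="text" name="login">
  <input type="password" name="passwd">
  <input type="hidden" name="id" value="2">
  <input type="submit" value="Sign in">
</form>
HTML

# The base URI (hypothetical) resolves a relative action, as a browser would.
my ($form) = HTML::Form->parse($html, 'https://login.example.com/login.html');

printf "action: %s  method: %s\n", $form->action, $form->method;
for my $input ($form->inputs) {
    printf "  %-8s %s\n", $input->type, $input->name // '(no name)';
}

# Fill in the fields by their real names and let HTML::Form build the request.
$form->value(login  => 'user@example.com');
$form->value(passwd => 'secret');
my $req = $form->click;   # an HTTP::Request, ready for $ua->request($req)
print $req->as_string;
```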

Does the server-side script accept POST requests, or only GET? Maybe you have to put all the fields in the URL, as a GET request would.
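If the server only reads GET parameters, the fields have to be encoded into the URL instead. A minimal sketch with the URI module, assuming a hypothetical login URL since the real one was stripped from the posts:

```perl
use strict;
use warnings;
use URI;

# Hypothetical login URL; the real one was stripped from the posts.
my $url = URI->new('http://www.example.com/login');

# query_form() URL-encodes the pairs and appends them as ?name=value&...
$url->query_form(username => 'username', password => 'password');
print "$url\n";

# With a user agent, the request would then be:
#   my $response = LWP::UserAgent->new->get($url);
```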

It may not like the mix of GET and POST. Have you tried changing the request to one of these:
[tt]
# Either put id in the URL's query string...
my $response = $browser->post(
    '...?id=2',
    [
        'username' => 'username',
        'password' => 'password'
    ],
);

# ...or put it in the POST body with the other fields:
my $response = $browser->post(
    '...',
    [
        'id'       => 2,
        'username' => 'username',
        'password' => 'password'
    ],
);
[/tt]
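The two variants send different requests, and which one works depends on how the server-side script reads its parameters (some read only the query string, some only the body). This offline sketch builds both with HTTP::Request::Common, which is what `$browser->post` uses internally, against a hypothetical URL, and prints where `id` ends up:

```perl
use strict;
use warnings;
use HTTP::Request::Common qw(POST);

# Hypothetical URL standing in for the one stripped from the posts.
my $base = 'http://www.example.com/login.asp';

# Variant 1: id travels in the URL's query string, the rest in the body.
my $r1 = POST "$base?id=2", [ username => 'username', password => 'password' ];

# Variant 2: everything travels in the POST body.
my $r2 = POST $base, [ id => 2, username => 'username', password => 'password' ];

for my $r ($r1, $r2) {
    printf "URI:  %s\nBody: %s\n\n", $r->uri, $r->content;
}
```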
Also, be sure you have the trailing quote ' on the URL; it was missing from your post.

This is just what came to mind. Hope it helps.

----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
 
Hi,

I edited my new script, but it returns an empty page :(

#!/usr/bin/perl -w

use strict;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;
my $response =$browser->post(
' [
'id' => 2,
'login' => 'example@hotmail.com',
'passwd' => 'password'
],
);




open OUT, ">results.htm";
print OUT $response->content;
close OUT;

exit;

May I see your example?
 
It wouldn't help, it's exactly as yours is, just with different information.

From further research, three problems exist. First, and most important, the action of the login form on that page is not the same as the URL of the login page itself; it's something totally different:
Code:
https://login.passport.com/ppsecure/post.srf?lc=1033&id=10&tw=20&cbid=10&da=passport.com
Second, if you look at that action, it's an https URL, a secure site. I had to point Perl to a number of SSL .dll's for it to load correctly.
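On a current Perl install, a quick way to check whether LWP can fetch https URLs at all is to ask the user agent. A minimal sketch (the note that https support comes from the separate LWP::Protocol::https distribution applies to modern LWP versions, not necessarily the one used in this thread):

```perl
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;

# is_protocol_supported() reports whether a handler for the scheme is
# installed. On current LWP versions, https support comes from the separate
# LWP::Protocol::https distribution (which uses IO::Socket::SSL).
for my $scheme (qw(http https)) {
    printf "%-5s %s\n", $scheme,
        $ua->is_protocol_supported($scheme) ? 'supported' : 'NOT supported';
}
```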

Third, when I had an unsuccessful login, it sent me a short redirect page. To be useful to you, you'd have to parse out the URL it is redirecting to. The meta tag looks something like this:
Code:
<META HTTP-EQUIV="REFRESH" CONTENT="0; URL=http://login.passport.net/...blah...">
So this is my current test script:
Code:
use strict;
use warnings;
use LWP::UserAgent;

my $browser = LWP::UserAgent->new;
my $response = $browser->post(
	'https://login.passport.com/ppsecure/post.srf?lc=1033&id=10&tw=20&cbid=10&da=passport.com',
	[ login => 'asdf@hotmail.com', passwd => 'asdf' ]
);

die "error: ", $response->status_line unless $response->is_success;

print $response->content;
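One more thing that commonly makes a scripted login look like it "didn't take" (an assumption about this kind of site, not something tested here): login pages usually set a session cookie, and LWP::UserAgent ignores cookies unless it is given a cookie jar. A minimal sketch; the URLs and variable names in the commented flow are placeholders:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# cookie_jar => {} gives the agent an in-memory HTTP::Cookies jar, so a
# session cookie set by the login response is sent back on later requests.
my $ua = LWP::UserAgent->new(cookie_jar => {});

print ref($ua->cookie_jar), "\n";   # HTTP::Cookies

# Hypothetical flow (URLs and field values are placeholders):
#   my $login = $ua->post($login_action_url, [ login => $user, passwd => $pass ]);
#   my $page  = $ua->get($members_only_url);   # sent with the session cookie
```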
It's at least a few more ideas to think about. Wandering through that login page's source reminded me why I hate generated code of any kind: over seventeen thousand characters for a three-input form. Faster computers and higher bandwidth have made the world fat and sloppy.

----------------------------------------------------------------------------------
...but I'm just a C man trying to see the light
 