Tek-Tips is the largest IT community on the Internet today!

Web automation with Perl

Status
Not open for further replies.

RottPaws

Programmer
Mar 1, 2002
478
US
I need to script interaction with a few web sites. I've tried WWW::Mechanize, but a couple of the sites have JavaScript on their submit buttons, and another is written in Java.

Does anybody know of any module(s) that will help with automating JavaScript and/or Java web sites? Or does anybody have a strategy for tricking Mech into working with them?

For one of the JavaScript sites, I need to log in, upload a file, and check for confirmation that the upload succeeded. Then, after the file is processed, I need to log in again and download a results file.

For the Java one, I need to log in, fill in a form and submit it, and check for confirmation.

_________
Rott Paws

...It's not a bug. It's an undocumented feature!!!
 
Try using LWP::UserAgent and using regexps on the page sources to find relevant information and run the requests yourself. Otherwise you'll have a hard time finding a JavaScript-parsing Mechanize-style module.
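As a minimal sketch of that approach, the hypothetical `extract_form` helper below scrapes a form's `action` and its named `<input>` fields with regular expressions, assuming double-quoted attributes. It uses only core Perl so it runs offline; the actual request would go through `LWP::UserAgent`'s `post` method, shown only in a comment. The form markup is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: pull a form's action URL and its named <input>
# fields out of raw HTML with regexes (assumes double-quoted attributes).
sub extract_form {
    my ($src) = @_;
    my ($action) = $src =~ /<form\b[^>]*\baction="([^"]*)"/i;
    my %fields;
    while ($src =~ /<input\b([^>]*)>/gi) {
        my $attrs = $1;
        # Inputs without a name (e.g. JavaScript-only buttons) are not submitted
        my ($name) = $attrs =~ /\bname="([^"]*)"/i or next;
        my ($value) = $attrs =~ /\bvalue="([^"]*)"/i;
        $fields{$name} = defined $value ? $value : '';
    }
    return ($action, \%fields);
}

# Invented sample page: the "Log in" button only calls JavaScript.
my $page = <<'HTML';
<form name="login" action="/cgi-bin/login.pl" method="post">
  <input type="hidden" name="session" value="abc123">
  <input type="text" name="user">
  <input type="button" value="Log in" onClick="doLogin()">
</form>
HTML

my ($action, $fields) = extract_form($page);
$fields->{user} = 'rottpaws';

# With LWP::UserAgent you would then send the request yourself, e.g.
#   my $res = LWP::UserAgent->new->post($base_url . $action, $fields);
#   die "login failed" unless $res->is_success;
print "$action: ", join(', ', map { "$_=$fields->{$_}" } sort keys %$fields), "\n";
```

Skipping the JavaScript button entirely and posting the hidden fields yourself is the whole trick; checking for confirmation is then just another regex against the response body.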
 
The "OP"?

 
You could try Samie, which automates Internet Explorer from Perl.

No matter what you use, I think it will be difficult at first.


Let us know your results!

X
 
Yeah, I'd seen the JavaScript module before, but it doesn't map easily onto this case. That module seems to expect the Perl script to create all the JS objects first and then evaluate the script, so to make a browser-savvy JavaScript parser you'd have to define every browser object yourself (window, document, not to mention their methods and properties such as getElementById, getElementsByName, getElementsByTagName, location, open, etc.).

Making that module evaluate in-page JavaScript correctly would be more effort than it's worth. The better approach is probably to read the script with regular expressions and pull out what you're looking for.

For instance, if the page you're working with had this code:
Code:
<script>
function GotoNextPage(name) {
   window.location = "/pub/" + name + ".html";
}
</script>

<input type="button" value="Next" onClick="GotoNextPage('about')">

Code:
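# $src holds the page's source
my ($name) = $src =~ /onClick="GotoNextPage\('(.+?)'\)"/i;

# Capture everything between the first quote and the closing ";
# here that is: /pub/" + name + ".html
my ($path) = $src =~ /window\.location\s*=\s*"(.*?)";/i;

# Replace the JavaScript concatenation with the captured argument
$path =~ s/"\s*\+\s*name\s*\+\s*"/$name/g;   # $path is now /pub/about.html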

You'd have to do this on a per-page basis: examine how each page builds its URLs and requests, then write regular expressions that reproduce that logic.
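Since the rules end up being page-specific, one way to keep them manageable is a table of per-page rules, each pairing a regex that recognizes the page with a sub that rebuilds the URL. This is a minimal sketch under that assumption; `@rules` and `next_url` are hypothetical names, and the single rule is just the GotoNextPage example above.

```perl
use strict;
use warnings;

# Hypothetical per-page rule table: each entry knows how one site's
# JavaScript builds a URL, so you can reproduce it without a JS engine.
my @rules = (
    {
        # Pages whose buttons call GotoNextPage('x') and build "/pub/x.html"
        detect => qr/GotoNextPage\(/,
        build  => sub {
            my ($src) = @_;
            my ($name) = $src =~ /onClick="GotoNextPage\('(.+?)'\)"/i or return;
            my ($path) = $src =~ /window\.location\s*=\s*"(.*?)";/i or return;
            $path =~ s/"\s*\+\s*name\s*\+\s*"/$name/g;
            return $path;
        },
    },
    # ... add one entry per page layout you need to handle
);

# Try each rule in order; return the first URL a matching rule can build.
sub next_url {
    my ($src) = @_;
    for my $rule (@rules) {
        next unless $src =~ $rule->{detect};
        my $url = $rule->{build}->($src);
        return $url if defined $url;
    }
    return;    # no rule matched this page
}

my $page = <<'HTML';
<script>
function GotoNextPage(name) {
   window.location = "/pub/" + name + ".html";
}
</script>
<input type="button" value="Next" onClick="GotoNextPage('about')">
HTML

print next_url($page), "\n";    # /pub/about.html
```

When a site changes its markup, only the matching entry in `@rules` needs updating, which keeps the per-page regexes from spreading through the rest of the script.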

This is actually how I wrote a MySpace module: some pages had JavaScript on them, so I used regular expressions to find all the links and everything else I needed.
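A rough sketch of that kind of link harvesting, assuming links appear either as quoted `href` attributes or as quoted strings assigned to `location`/`location.href` or passed to `window.open`. `find_links` and the sample markup are made up for illustration; real pages will need their own patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: collect candidate links from a page, including
# ones that only appear inside JavaScript rather than in a plain href.
sub find_links {
    my ($src) = @_;
    my @links;

    # Ordinary anchors: <a href="..."> or <a href='...'>
    push @links, $2 while $src =~ /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1/gi;

    # Script-driven navigation (assumed patterns; adjust per site):
    # location = "...", location.href = "...", window.open("...")
    push @links, $2
        while $src =~ /(?:\blocation(?:\.href)?\s*=\s*|\bwindow\.open\(\s*)(["'])(.+?)\1/gi;

    # Drop javascript: pseudo-links and duplicates, keeping first-seen order
    my %seen;
    return grep { !/^javascript:/i && !$seen{$_}++ } @links;
}

my $html = <<'HTML';
<a href="/profile/123">Profile</a>
<a href="javascript:void(0)" onClick="window.open('/popup')">Popup</a>
<script>
  function goFriends() { location.href = "/friends/123"; }
  function goHome()    { window.location = "/home"; }
</script>
<a href='/profile/123'>Profile again</a>
HTML

print "$_\n" for find_links($html);
```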
 
Thanks everyone. I'll give it a shot and report back how it goes (or doesn't go) . . .
