LWP download: multiple downloads

Status
Not open for further replies.

tmffm (Technical User)
Jun 12, 2007
I use LWP to download some files from a site once a week. The problem is that after I have downloaded a certain number of files, everything after that comes back incomplete. I've inferred that this is probably the site's anti-bot protection kicking in (you know, where they want you to type in a phrase or some numbers to continue). They're obviously trying to prevent mass, automated downloads.

The interesting thing is that even if I release my IP address, I still can't get these files. Any suggestions on how I can get around this with LWP?
 
The most important thing you can do is respect their bandwidth. There are several ways to do this:

1. Save both the Last-Modified and ETag response headers so that you do not re-download a file that has not changed.
2. Throttle your requests so that you download at most one file per minute, or at some other reasonable interval.
3. Register your bot with the website and let them know what information you want to scrape. They may have an interface that gives direct access to the data without any parsing.
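The first two suggestions can be sketched in Perl roughly as follows. This is only a minimal sketch, not a definitive implementation: the helper names (`conditional_headers`, `throttle`, `fetch_if_changed`), the agent string, and the example URL are all made up for illustration, and how you persist the saved headers between weekly runs is left out. It also reads each file into memory before writing it, which may not suit very large downloads.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(sleep time);

# Build conditional-request headers from validators saved on a previous run.
# $saved is a hash ref that may hold 'etag' and/or 'last_modified'.
sub conditional_headers {
    my ($saved) = @_;
    my @headers;
    push @headers, 'If-None-Match' => $saved->{etag}
        if defined $saved->{etag};
    push @headers, 'If-Modified-Since' => $saved->{last_modified}
        if defined $saved->{last_modified};
    return @headers;
}

# Time of the previous throttle() call, shared between calls.
my $last_request;

# Sleep just long enough that consecutive calls are at least $min_gap
# seconds apart. Returns the number of seconds waited.
sub throttle {
    my ($min_gap) = @_;
    my $wait = 0;
    if (defined $last_request) {
        $wait = $min_gap - (time() - $last_request);
        $wait = 0 if $wait < 0;
        sleep($wait) if $wait > 0;
    }
    $last_request = time();
    return $wait;
}

# Fetch $url into $file unless the server answers 304 Not Modified.
# Updates $saved with the new validators after a successful download.
sub fetch_if_changed {
    my ($ua, $url, $file, $saved) = @_;
    my $res = $ua->get($url, conditional_headers($saved));
    return 'unchanged' if $res->code == 304;
    die "GET $url failed: " . $res->status_line . "\n" unless $res->is_success;

    open my $fh, '>:raw', $file or die "Cannot write $file: $!\n";
    print {$fh} $res->content;
    close $fh or die "Cannot close $file: $!\n";

    $saved->{etag}          = $res->header('ETag');
    $saved->{last_modified} = $res->header('Last-Modified');
    return 'downloaded';
}

# Example use (URL and file name are placeholders):
#   my $ua = LWP::UserAgent->new(agent => 'weekly-fetch/0.1');
#   my %saved;    # load and store this between runs, e.g. with Storable
#   throttle(60);
#   print fetch_if_changed($ua, 'http://www.example.com/a.zip', 'a.zip', \%saved), "\n";
```

If you would rather not write this by hand, `$ua->mirror($url, $file)` sends If-Modified-Since based on the local file's timestamp, and LWP::RobotUA enforces a delay between requests to the same host (its `delay` is given in minutes) and honours robots.txt.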

Do these things and you'll notice much better results.

- Miller
 
You can also check the Referer header your script sends and set it to a URL within the site rather than to an outside one.
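As a rough sketch of that, assuming LWP::UserAgent: the `make_agent` helper, the agent string, and the example.com URL below are hypothetical, so substitute a real page on the site you are downloading from.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Build a user agent that sends the given Referer on every request,
# along with a descriptive (made-up) agent string.
sub make_agent {
    my ($referer) = @_;
    my $ua = LWP::UserAgent->new(agent => 'weekly-fetch/0.1', timeout => 30);
    $ua->default_header(Referer => $referer);
    return $ua;
}

# Example use (placeholder URLs):
#   my $ua  = make_agent('http://www.example.com/files/');
#   my $res = $ua->get('http://www.example.com/files/a.zip');
#
# The header can also be set for a single request instead:
#   my $res = $ua->get($url, Referer => 'http://www.example.com/files/');
```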

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 