
fastest way to get multiple files via HTTP


martyncito (Programmer) - Jan 14, 2008
I am looking to tune a Perl script I have written that fetches images via HTTP and saves each file locally. The script makes an HTTP::Request for each URL and writes the files one by one - what would be the best way to speed up the process? I guess it would be more efficient to use threads, but I am new to Perl and not sure which library might be good to use - Async, HTTP::Async, LWP::Parallel...
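
For context, the current one-at-a-time approach is roughly the following - a minimal sketch assuming LWP::UserAgent and saving each image under its URL basename (the real script may differ):

use strict;
use warnings;
use LWP::UserAgent;
use File::Basename;

my $ua   = LWP::UserAgent->new;
my @urls = @ARGV;                # image URLs passed on the command line

for my $url (@urls) {
    my $file = basename($url);   # assumption: the URL basename is a usable local file name
    my $resp = $ua->get($url, ':content_file' => $file);
    warn "failed to get $url: ", $resp->status_line, "\n" unless $resp->is_success;
}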

Any tips would be greatly appreciated.

Many thanks in advance
 
I've always used Parallel::ForkManager. Easy to use, easy to learn, has never failed me :)

Travis - Those who say it cannot be done are usually interrupted by someone else doing it; Give the wrong symptoms, get the wrong solutions;
 
I wasn't aware of LWP::Parallel. That sounds ideally suited to the job and would probably require the least coding on your part.
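
The usage would be roughly like this - a sketch only, and the register/wait interface shown here is my assumption of how LWP::Parallel::UserAgent works:

use strict;
use warnings;
use LWP::Parallel::UserAgent;
use HTTP::Request;

my $pua = LWP::Parallel::UserAgent->new;
$pua->max_req(5);                              # assumed setting for simultaneous requests

for my $url (@ARGV) {
    $pua->register(HTTP::Request->new(GET => $url));   # queue each request
}

my $entries = $pua->wait(30);                  # block until all requests finish (30s timeout)
for my $entry (values %$entries) {
    my $resp = $entry->response;
    print $resp->request->url, " => ", $resp->code, "\n";
}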
 
You haven't told us the context in which you are mirroring these images. However, I would warn you that in most cases you actually want to slow down the rate at which you fetch from a website, not speed it up. You don't want to spam a website with 50 GET requests a second - that will likely get your IP range banned.
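
If you do need to be polite about it, even a simple pause between requests helps - a sketch, with an arbitrary half-second delay:

use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(sleep);

my $ua = LWP::UserAgent->new;
for my $url (@ARGV) {
    my $resp = $ua->get($url);       # fetch one URL at a time
    warn "failed $url\n" unless $resp->is_success;
    sleep 0.5;                       # arbitrary half-second pause between requests
}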

- Miller
 
travs69,

I have looked at ForkManager and it looks like this might do the trick. Basically, a list of image URLs is given to us as a text file and we are required to download them regularly (thousands of them). Unfortunately they don't tell us which have changed, so we have to do a "full refresh" each time. We estimate this process could take hours, hence we would like to run multiple downloads concurrently to get through them in the quickest time possible. In essence, the process is quite simple, e.g.:

for each file in filelist loop
    if no process slot is available then
        wait until the next slot becomes free
    end if
    fetch the file via HTTP in the free slot
end loop
wait for all worker processes to finish, then exit

We want the wait time to be kept to a minimum - i.e. if 5 workers are running and number 3 becomes free before number 1, we want to reuse slot 3 immediately.
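
For what it's worth, reading the list itself is the easy part - a sketch assuming one URL per line in the text file (the file name here is just an example):

use strict;
use warnings;

my $listfile = 'image_urls.txt';   # example name for the supplied list
open my $fh, '<', $listfile or die "cannot open $listfile: $!";
chomp(my @urls = <$fh>);
close $fh;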

Any further help/code examples would be greatly appreciated.

Thanks
 
ForkManager does all of that for you. Write your code so it gets one file at a time, like:
for my $image (@file) {
    # LWP get image
}

Then, once that is working, add in ForkManager:

use strict;
use warnings;
use Parallel::ForkManager;

my $MAX_PROCESSES = 5;                        # number of simultaneous downloads
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);

for my $image (@file) {
    $pm->start and next;                      # parent: move on to next URL; child: fall through
    # LWP get image
    $pm->finish;                              # child exits here
}
$pm->wait_all_children;

The wait_all_children call makes the parent program wait until all of the children have finished before continuing.
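
Put together with the actual download and file writing, it might look something like this - a sketch only, assuming LWP::UserAgent, a one-URL-per-line text file, and saving each image under its URL basename:

use strict;
use warnings;
use Parallel::ForkManager;
use LWP::UserAgent;
use File::Basename;

my $MAX_PROCESSES = 5;
my $pm = Parallel::ForkManager->new($MAX_PROCESSES);
my $ua = LWP::UserAgent->new(timeout => 30);

open my $fh, '<', 'image_urls.txt' or die "cannot open url list: $!";   # example file name
chomp(my @urls = <$fh>);
close $fh;

for my $url (@urls) {
    $pm->start and next;                       # parent forks a child and moves on
    my $file = basename($url);                 # assumption: basename is a usable local name
    my $resp = $ua->get($url, ':content_file' => $file);
    warn "failed $url: ", $resp->status_line, "\n" unless $resp->is_success;
    $pm->finish;                               # child exits
}
$pm->wait_all_children;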

- Travis
 
travs69,

You were right about it being easy to learn and easy to use! I had a go at implementing the solution yesterday, then logged on here today to find the code I wrote was practically identical to the example you just gave!

It doesn't make too much difference at the moment, but I'm hoping that under load testing I might see more of an improvement. I'm also going to try it on our QA environment, where the servers have multiple CPUs, which may make a difference.

Thanks for your help!
 
I would suggest cranking up MAX_PROCESSES only as far as it actually makes a difference. In other words, it is worth experimenting to see whether adding more processes helps much or not - if there is a lot of local processing, adding more processes might just slow things down. So I usually increase MAX_PROCESSES gradually to find a good balance between script run time and how hard the local machine has to work.
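
One way to run that experiment is just to time the whole thing at different settings - a sketch, where run_downloads() is a hypothetical sub wrapping the ForkManager loop above:

use strict;
use warnings;
use Time::HiRes qw(time);

for my $max (1, 2, 5, 10, 20) {
    my $start = time();
    run_downloads($max);            # hypothetical: runs the download loop with $max processes
    printf "MAX_PROCESSES=%-2d took %.1f seconds\n", $max, time() - $start;
}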

- Travis
 
