I have a script which performs some complex data manipulation on multiple files. I am looking to decrease the processing time, but have not been very successful.
THE SCRIPT
-Read 6 files directly into hashes. These files are about 14MB in size.
-Read the main file line by line using "while". This file is about 1.25GB.
-Write the output file line by line, creating a final output file that is 1.4GB in size.
-Inside the script I build several summary hashes & arrays for each book of business...the largest book of business builds hashes & arrays holding roughly 700K records, each record about 160 bytes in size. These summaries are written to the output file and cleared after each book of business (there are about 12K books of business). A rough sketch of this structure is below.
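Here is a stripped-down sketch of that structure. The file names, field layout, and summary logic are simplified placeholders rather than the real code, and it assumes the big file is already grouped by book of business:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # One of the 6 lookup files (~14MB total) read straight into a hash.
    my %lookup;
    open(my $lk, '<', 'lookup1.txt') or die "lookup1.txt: $!";
    while (my $line = <$lk>) {
        chomp $line;
        my ($key, $value) = split /\t/, $line, 2;   # assumed tab-delimited
        $lookup{$key} = $value;
    }
    close $lk;

    open(my $in,  '<', 'bigfile.txt') or die "bigfile.txt: $!";   # ~1.25GB input
    open(my $out, '>', 'output.txt')  or die "output.txt: $!";    # ~1.4GB output

    my $current_book = '';
    my (%summary, @detail);   # per-book summaries, cleared after each book

    while (my $line = <$in>) {
        chomp $line;
        my ($book, $code, $amount) = split /\t/, $line;   # assumed record layout

        # When the book of business changes, write out and clear its summaries.
        if ($book ne $current_book) {
            flush_book($out, \%summary, \@detail) if $current_book ne '';
            %summary      = ();
            @detail       = ();
            $current_book = $book;
        }

        # Accumulate summary data for this book (placeholder logic).
        $summary{$code} += defined $amount ? $amount : 0;
        push @detail, join("\t", $book, $code,
                           defined $lookup{$book} ? $lookup{$book} : '');
    }
    flush_book($out, \%summary, \@detail) if $current_book ne '';

    close $in;
    close $out;

    sub flush_book {
        my ($fh, $summary, $detail) = @_;
        print {$fh} "$_\n" for @$detail;
        print {$fh} "$_\t$summary->{$_}\n" for sort keys %$summary;
    }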
THE COMPUTER
-I originally ran this on a dual Xeon 450, 1GB of RAM, Win2K Server (the OS is not an option, so please save the flames). It took 3 hours and 17 minutes to complete. Of course, the script only uses one of the 2 processors - it maxed out the one it was using.
-I moved this to a new box. This one has a 3GHz P4, 2GB of RAM, Win2K Server...and a 533MHz FSB. The script ran in 2 hours and 13 minutes. Better.
I think I can get much better on the new box. The CPU is maxed, but the script only uses 300MB of RAM. So I decided to try to use more RAM and read the entire input file (the 1.25GB file) into memory using "foreach $line (<BIGFILE>)". This time the script used 1.4GB of RAM and the CPU was maxed. It improved the processing time by a whopping 4 minutes.
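For reference, the two read styles boil down to this (the file name and process() are placeholders for the real work). Since the CPU is pegged either way, the per-line work rather than disk I/O seems to be the bottleneck, which would explain why slurping the file barely helped:

    use strict;
    use warnings;

    sub process { my ($line) = @_; }   # stand-in for the real per-line work

    # Style 1: while -- reads one line at a time, so memory stays flat (~300MB here).
    open(my $fh, '<', 'bigfile.txt') or die "bigfile.txt: $!";
    while (my $line = <$fh>) {
        process($line);
    }
    close $fh;

    # Style 2: foreach -- <FH> is evaluated in list context, so Perl builds a list
    # of every line (the whole 1.25GB plus per-scalar overhead) before the first
    # iteration runs. The loop body then does exactly the same work as above.
    open(my $fh2, '<', 'bigfile.txt') or die "bigfile.txt: $!";
    foreach my $line (<$fh2>) {
        process($line);
    }
    close $fh2;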
Any suggestions, tips?
Sorry for such a long post...if you have made it here, thanks just for staying with me.
Scott