
Process time so different, why?

Status
Not open for further replies.

mlibeson

Programmer
Mar 6, 2002
311
US
Using a set of ASCII data files, I have written Perl code that processes the data to produce output.

The first time I wrote the code, the data was processed in 20 minutes.

Next I modified the same code. The only change was that I kept the hashes until all files had been processed, instead of clearing them after each data file. This extended the time to process the same data to several hours.

Any idea as to why this would happen?
 
It would depend on your hardware, memory, swap file size, the size of the data you're working with, and many other variables.

But with a slowdown of close to a factor of 10, there is a possibility of an error in the logic.

--Paul


Nancy Griffith - songstress extraordinaire,
and composer of the snipers anthem "From a distance ...
 
I am running under Windows 2000.

I have a gig of memory, and according to the system it is doing a small amount of swapping beyond the 1 gig. The CPU is constantly at 100%. I have processed similar files with other programs and they do seem to run better.

I am using hashes to keep track of data within the files. I am storing about 90 lists of email addresses with about 10000 email addresses each. For each unique email address, I am keeping track of the date it was first sent.

So I am not tracking much.
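To make the description above concrete, here is a minimal sketch of that kind of tracking. The record layout, the list names, and the hash names `%first_sent` and `%lists` are assumptions for illustration only; the actual code was not posted.

```perl
use strict;
use warnings;

# Hypothetical records: [list name, email address, send date as YYYY-MM-DD].
my @records = (
    [ 'list01', 'a@example.com', '2004-03-02' ],
    [ 'list02', 'a@example.com', '2004-02-15' ],
    [ 'list01', 'b@example.com', '2004-03-02' ],
);

my %first_sent;   # email address => earliest send date seen so far
my %lists;        # list name => { email address => 1 }

for my $rec (@records) {
    my ($list, $email, $date) = @$rec;
    $lists{$list}{$email} = 1;

    # YYYY-MM-DD dates sort correctly with a plain string comparison.
    if (!exists $first_sent{$email} || $date lt $first_sent{$email}) {
        $first_sent{$email} = $date;
    }
}

printf "%s first sent %s\n", $_, $first_sent{$_} for sort keys %first_sent;
```

At roughly 90 lists of 10000 addresses each, structures like these hold on the order of 900000 entries, which is worth keeping in mind when judging memory use.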

In the first version of the code, I processed the files the same exact way, except I cleared the hashes I used for tracking.

In the second version, instead of clearing the hashes, I kept processing the files.

In the end, I produced the output once all files were processed.

All I did was change when the hashes were cleared and when the output was created.
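The difference between the two versions can be sketched as follows. This is only an illustration of the structure described above, not the original code; the `%files` data and all variable names are made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the data files: file name => list of addresses.
my %files = (
    'file1.txt' => [ 'a@example.com', 'b@example.com' ],
    'file2.txt' => [ 'b@example.com', 'c@example.com' ],
);

# Version 1: report after each file, then clear the hash.
my @per_file_counts;
my %seen;
for my $name (sort keys %files) {
    $seen{$_}++ for @{ $files{$name} };
    push @per_file_counts, scalar keys %seen;
    %seen = ();    # the hash never holds more than one file's entries
}

# Version 2: keep everything until all files are processed.
my %all_seen;
for my $name (sort keys %files) {
    $all_seen{$_}++ for @{ $files{$name} };
}
my $total_unique = keys %all_seen;    # scalar context gives the key count

print "per file: @per_file_counts, overall: $total_unique\n";
```

In version 2 the hash grows with every file, so any step that walks the whole hash inside the per-file loop (a sort, a scan, a full report) gets slower with each file. Whether that is what happened here cannot be told without seeing the code.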
 
Try tying the hashes to files

That might speed it up
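A minimal sketch of a hash tied to a file, assuming the `SDBM_File` module that ships with Perl; the base name `first_sent` and the sample data are hypothetical. A temporary directory is used so the example cleans up after itself.

```perl
use strict;
use warnings;
use Fcntl;                      # provides O_RDWR and O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);
use File::Spec;

my $dir  = tempdir(CLEANUP => 1);
my $base = File::Spec->catfile($dir, 'first_sent');

# SDBM_File stores the data in "$base.pag" and "$base.dir" on disk.
my %first_sent;
tie(%first_sent, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666)
    or die "Cannot tie $base: $!";

# The tied hash is used like any other hash.
$first_sent{'a@example.com'} = '2004-02-15';
$first_sent{'b@example.com'} = '2004-03-02';

my @emails = sort keys %first_sent;
my $count  = @emails;
my $date   = $first_sent{'a@example.com'};
print "$count addresses, a\@example.com first sent $date\n";

# Release the files before the temporary directory is removed.
untie %first_sent;
```

A tied hash trades memory for disk access, so it is most likely to help when the in-memory hashes are pushing the machine into swap. If everything already fits in memory, it can be slower than a plain hash. SDBM also limits the size of each key/value pair, which is fine for an address and a date.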

--Paul

 
Thank you Paul. I will give that a try.

I did try running the code with some of the intense IO turned off, and it sped up quite a bit. So I think there may be an IO issue as well. The code outputs several files and a summary. When I reported only the summary, things sped up again.
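One way to check how much of the run time goes to output is to time the processing and the writing separately. The sketch below assumes a hypothetical `$write_details` switch and made-up data; it uses `Time::HiRes` and `File::Temp`, both of which ship with Perl.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use File::Temp qw(tempfile);

my $write_details = 1;    # hypothetical switch: 0 reports only the summary
my @emails = map { "user$_\@example.com" } 1 .. 1000;

# Temporary output file, removed when the program exits.
my ($fh, $path) = tempfile(UNLINK => 1);

my $t0 = time;
my %seen;
$seen{$_}++ for @emails;              # processing phase
my $t1 = time;

if ($write_details) {
    print {$fh} "$_\n" for sort keys %seen;   # output phase
}
close $fh or die "close failed: $!";
my $t2 = time;

printf "processing: %.4fs, output: %.4fs, unique: %d\n",
    $t1 - $t0, $t2 - $t1, scalar keys %seen;
```

Running once with the switch on and once with it off shows whether the detail files or the processing dominates.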
 
I had a similar problem with hashes, although in reverse...

I started by opening each file in turn, reading the contents into an array (before I discovered hashes), operating on the array, outputting, then closing the file. This took several hours.

Then I discovered hashes. I opened each file, copied the entire data file into a hash, closed the file, then operated on the hash and output as I went along. This took about 20 seconds...
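A minimal sketch of reading a file into a hash, along the lines described above. The line format (address, whitespace, date) and the name `%date_for` are assumptions; an in-memory filehandle stands in for a real data file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical file contents: one "address date" pair per line.
my $data = "a\@example.com 2004-02-15\nb\@example.com 2004-03-02\n";
open(my $in, '<', \$data) or die "Cannot open in-memory file: $!";

my %date_for;    # email address => date
while (my $line = <$in>) {
    chomp $line;
    my ($email, $date) = split ' ', $line, 2;
    $date_for{$email} = $date;
}
close $in;

# A lookup goes straight to the key; no scan over an array is needed.
print "b\@example.com => ", $date_for{'b@example.com'}, "\n";
```

One plausible reason for a gap like hours versus seconds is that finding an item in an array means scanning it, while a hash lookup does not, though the array version here may have differed in other ways too.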

My guess would point to your IO and general logic. Maybe some code would shed some light on the matter?

Rob Waite
 