File size comparison

Status
Not open for further replies.

lesliep70

Technical User
Jul 20, 2005
4
US
Hi,
I am new to Perl and need to write a script to compare two files of the same name in different directories. If the file sizes are different, I need to output the file name.
Any help would be greatly appreciated.

Les
 
Actually, since it's a comparison, there might be no need for a temp variable:

Code:
print "$filename\n" if (-s $first_file != -s $second_file);

-s returns the file size in bytes, which is the same as the 8th element (index [7]) of the list returned by stat
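
For the original question (same-named files in two directories), the one-liner might be wrapped up roughly as follows. This is only a sketch under some assumptions: both directories are flat, files that exist in only one of them are skipped, and the name size_differs and the two directory arguments are made up for illustration.

```perl
use strict;
use warnings;

# Return the names of plain files that exist in both directories
# but whose sizes differ. Subdirectories are ignored.
sub size_differs {
    my ($dir_a, $dir_b) = @_;
    opendir(my $dh, $dir_a) or die "Cannot open $dir_a: $!";
    my @names = sort grep { -f "$dir_a/$_" } readdir $dh;
    closedir $dh;

    my @different;
    for my $name (@names) {
        my $first_file  = "$dir_a/$name";
        my $second_file = "$dir_b/$name";
        next unless -f $second_file;    # no counterpart to compare with
        # -s gives a false value for an empty file, so fall back to 0
        my $size_a = (-s $first_file)  || 0;
        my $size_b = (-s $second_file) || 0;
        push @different, $name if $size_a != $size_b;
    }
    return @different;
}

# Hypothetical usage: perl sizecmp.pl dir1 dir2
if (@ARGV == 2) {
    print "$_\n" for size_differs(@ARGV);
}
```

Run with two directory names, it prints one file name per line for each pair whose sizes differ.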



 
This is going to sound geeky but isn't that just a little dangerous?

It would probably be better to do a fingerprint of each file and compare those ...

i.e.

LESLIE
LESL1E


... are the same byte count


Kind Regards
Duncan
 
Note that to use Milleniumlegend's code, you'll have to add `use File::stat' to the top of your script.
 
Duncan

It DOES sound geeky. But that's why we're all here, right? See thread219-1090996 about getting MD5 digests of files. It will be a lot slower, of course, so it may be best to do the byte count thing first, and only compare hashes if the sizes are equal.
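
A rough sketch of that size-first, hash-second idea, using the core Digest::MD5 module. The sub names md5_of and files_differ are made up for illustration and are not taken from the referenced thread.

```perl
use strict;
use warnings;
use Digest::MD5;

# Hex MD5 digest of a file's contents, read in binary mode.
sub md5_of {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    binmode $fh;
    my $digest = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $digest;
}

# Returns 1 if the two files differ, 0 otherwise. The cheap size
# test runs first; digests are only computed when the sizes match.
sub files_differ {
    my ($first_file, $second_file) = @_;
    my $size_a = (-s $first_file)  || 0;
    my $size_b = (-s $second_file) || 0;
    return 1 if $size_a != $size_b;
    return md5_of($first_file) ne md5_of($second_file) ? 1 : 0;
}
```

It could then be used in the same way as the size test, for example: print "$filename\n" if files_differ($first_file, $second_file);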
 
Leslie

It really is hard to determine what you are trying to achieve, but I think you would agree that assuming two files are the same simply because their byte counts match is pretty crazy, unless you can be CERTAIN that this, alone, is sufficient.


Kind Regards
Duncan
 
Oh, and I forgot to mention, I would totally agree with the method Steve advised should you choose to fingerprint each file.


Kind Regards
Duncan
 
"This is going to sound geeky but isn't that just a little dangerous?"

Not geeky, sounds like good advice, but not knowing what the OP is trying to do makes it impossible to know if just checking file sizes is sufficient. Just printing a filename out is not going to be "dangerous".
 
Good point Kevin. I didn't really mean dangerous in general; I meant dangerous should you choose to delete one of the files based on the two appearing to be the same, for example.


Kind Regards
Duncan
 
Surely, the solution to this problem would depend on why a difference in file size between the two files is important.
Are you checking for some sort of corruption, the latest version or some other parameter?

Keith
 
Guys,
Thank you so much for all the help. It really is awesome. Basically, I wanted to compare the two files and, if they are different sizes, flag them so that we can check for differences. What you've given allows me to do this.
 
The Text::Diff module may be useful in finding the differences between them. It's a cross-platform Perl implementation of some features of the unix `diff' command.
 
Hi Leslie

I don't think I have been clear enough. Checking the file size alone could EASILY give a false positive: you might well check the sizes, find they come up the same, and move on assuming the files are the same.

Code:
THIS LINE HAS THE SAME NUMBER OF CHARACTERS AS THE FOLLOWING LINE
THIS LINE HAS THE SAME NUMBER OF CHARCTERS AS THE LINE ABOVE THIS


Kind Regards
Duncan
 
The files are compiled Progress 4GL files. Even with a similar number of characters, any coding difference will generate a different sized compiled code. It is 'relatively' safe.
 
OK, it seems you guys were right. I really do need to 'fingerprint' the files. I will probably only do this if the sizes are similar. The question is, how? I've had a look at the MD5 post and it's all gobbledygook to me.
 
I only enquired about the size as it would be simpler to do a byte by byte compare each time. I must confess I do not know how big a compiled Progress 4gl file is.


Keith
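
If a straight byte-by-byte compare is wanted, one option is the core File::Compare module. The following is only a sketch; the sub name report_if_different and its arguments are made up for illustration.

```perl
use strict;
use warnings;
use File::Compare;

# compare() returns 0 if the files are identical, 1 if they differ,
# and -1 if an error occurred (for example, a file is missing).
sub report_if_different {
    my ($filename, $first_file, $second_file) = @_;
    my $result = compare($first_file, $second_file);
    die "Could not compare $first_file and $second_file: $!" if $result < 0;
    print "$filename\n" if $result == 1;
    return $result;
}
```

This avoids computing any digest, at the cost of reading both files whenever their contents have to be checked.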
 