INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

How to efficiently implement file comparison

(OP)
Let's say I have a baseline file like this:

CODE

4.1.27-4.1.2-amd64-1cccde6e81b2c42b
cmos2q installed and running
lm-sensors2q installed and running
lspci2q installed and running
mcelog2q installed and running
mpt2q not needed
pmbus2q installed and running
smartmon2q installed and running 

And a newly created file looks like this:

CODE

4.1.27-4.1.2-amd64-b2c2cfc7c5cc6d49
cmos2q installed and running
lm-sensors2q installed and running
lspci2q installed and running
mcelog2q installed and running
pmbus2q installed and running
smartmon2q installed and running
mpt2q not needed 

In theory, these two files are the SAME, because:
1) the first line can be ignored
2) the remaining lines are the same, even if they appear in a different order.

So, in this case, we cannot simply use File::Compare.
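For reference, File::Compare's `compare` returns 0 when two files are byte-for-byte identical, 1 when they differ and -1 on error, so it reports the two files above as different (the first lines differ, and so does the order of the other lines). A minimal wrapper; the name `identical_bytes` is made up for illustration:

```perl
use strict;
use warnings;
use File::Compare qw(compare);

# Hypothetical wrapper: true only if the two files are byte-for-byte identical.
# File::Compare::compare returns 0 (same), 1 (different) or -1 (error).
sub identical_bytes {
    my ($file_a, $file_b) = @_;
    my $rc = compare($file_a, $file_b);
    die "Cannot compare $file_a and $file_b\n" if $rc < 0;
    return $rc == 0;
}
```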

What I did was read both files into hashes, using each line as a hash key and leaving the first line out. Then I compare the hash keys in a loop. I've omitted the implementation because it's quite simple.
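The two-hash approach described above might look roughly like this. It is a sketch, not the original code (which wasn't posted); the function names are hypothetical, and it assumes trailing whitespace, including the newline, is not significant, so that a last line without a newline still matches.

```perl
use strict;
use warnings;

# Read every line except the first into a hash (line => 1).
# Assumption: trailing whitespace, including the newline, is not significant.
sub read_body_set {
    my ($file) = @_;
    open my $fh, '<', $file or die "Cannot open $file: $!\n";
    my $header = <$fh>;                   # first line is ignored
    my %lines;
    while (my $line = <$fh>) {
        $line =~ s/\s+\z//;
        $lines{$line} = 1;
    }
    close $fh;
    return \%lines;
}

# Hypothetical helper: true if both files have the same set of lines
# after the first one (order and duplicate copies are ignored).
sub same_line_sets {
    my ($file_a, $file_b) = @_;
    my $set_a = read_body_set($file_a);
    my $set_b = read_body_set($file_b);
    return 0 if scalar(keys %$set_a) != scalar(keys %$set_b);
    for my $line (keys %$set_a) {
        return 0 unless exists $set_b->{$line};
    }
    return 1;
}
```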

I sense there must be a smarter way to implement this, but I don't know how, so I am here to ask the experts for help.

Thanks!

RE: How to efficiently implement file comparison

First, you don't need to read both files into hashes, just one; then read the second file line by line and check whether each line is in the hash.
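A sketch of the one-hash variant (hypothetical function name, same assumption that trailing whitespace doesn't matter). Note the final check: looking up the second file's lines alone would miss a line that exists only in the first file, so each hash entry records whether it was seen while reading the second file.

```perl
use strict;
use warnings;

# Hypothetical helper: true if every line after the header of $file_b
# occurs in $file_a and vice versa (duplicate copies are not counted).
# Assumption: trailing whitespace, including the newline, is not significant.
sub same_line_sets_one_hash {
    my ($file_a, $file_b) = @_;

    # Load only the first file: line => "seen in file B yet?" flag
    open my $fa, '<', $file_a or die "Cannot open $file_a: $!\n";
    my $header_a = <$fa>;                 # skip the first line
    my %seen_in_b;
    while (my $line = <$fa>) {
        $line =~ s/\s+\z//;
        $seen_in_b{$line} = 0;
    }
    close $fa;

    # Stream the second file and look each line up in the hash
    open my $fb, '<', $file_b or die "Cannot open $file_b: $!\n";
    my $header_b = <$fb>;                 # skip the first line
    while (my $line = <$fb>) {
        $line =~ s/\s+\z//;
        return 0 unless exists $seen_in_b{$line};   # line only in file B
        $seen_in_b{$line} = 1;
    }
    close $fb;

    # A line never seen while reading file B exists only in file A
    return (grep { !$_ } values %seen_in_b) ? 0 : 1;
}
```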
Also, you should decide what to do with duplicate lines: one file could contain two identical lines where the other contains only one. Should these files be considered the same or different? With plain hash keys you won't even notice (unless you explicitly check for this case).
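If duplicate lines should count, one option is to store a count per line instead of a flag: increment while reading the first file, decrement while reading the second, and require every count to end at zero. A sketch under the same assumptions (hypothetical function name):

```perl
use strict;
use warnings;

# Hypothetical helper: true if the two files contain the same lines after
# their first line, in any order, with duplicate copies counted.
# Assumption: trailing whitespace, including the newline, is not significant.
sub same_line_multisets {
    my ($file_a, $file_b) = @_;
    my %count;

    open my $fa, '<', $file_a or die "Cannot open $file_a: $!\n";
    my $header_a = <$fa>;                 # skip the first line
    while (my $line = <$fa>) {
        $line =~ s/\s+\z//;
        $count{$line}++;                  # one more copy expected in file B
    }
    close $fa;

    open my $fb, '<', $file_b or die "Cannot open $file_b: $!\n";
    my $header_b = <$fb>;                 # skip the first line
    while (my $line = <$fb>) {
        $line =~ s/\s+\z//;
        # Missing from file A, or more copies in B than in A
        return 0 unless $count{$line};
        $count{$line}--;
    }
    close $fb;

    # A count still above zero means file A has a copy that B lacks
    return (grep { $_ > 0 } values %count) ? 0 : 1;
}
```

With this version, bodies of `x, x, y` and `x, y, y` are reported as different, whereas the hash-key (set) versions call them the same.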
If the files are quite similar, as in your example, I think the best procedure would be something like this:

1) an array of strings is initially empty
2) first compare the size of both files (after having skipped the first line if relevant) and exit with 0 if they are different
3) read the first line of both files
4) if the array is not empty, read the next line from the first file only and go to 8 (exit with 0 if the first file is at eof)
5) exit with 1 if both files are at eof, with 0 if only one file is at eof
6) read the next line from both files
7) if the two lines are the same, go to 6
8) check if the first string is in the array (go to 10 if the array is empty)
9) if it is in, discard the string from the array and go to 4
10) push the line of the second file onto the array of strings
11) read the next line from the second file only: exit with 0 if the second file was at eof
12) if the two lines at hand are the same, go to 4
13) go to 10
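A sketch of the procedure in Perl. It follows the same idea (walk both files in step, and park lines of the second file in an array until their partner turns up in the first file), but uses loops instead of numbered jumps, checks for end of file after every read, and leaves out the size pre-check of step 2. The function names are hypothetical, and trailing whitespace is again assumed not to matter.

```perl
use strict;
use warnings;

# Read one line, strip trailing whitespace (assumed not significant),
# and return undef at end of file.
sub _next_line {
    my ($fh) = @_;
    my $line = <$fh>;
    return undef unless defined $line;
    $line =~ s/\s+\z//;
    return $line;
}

# Hypothetical helper: 1 if the files match (ignoring the first line and
# the order of the others, counting duplicates), 0 otherwise.
sub same_files_streaming {
    my ($file_a, $file_b) = @_;
    open my $fa, '<', $file_a or die "Cannot open $file_a: $!\n";
    open my $fb, '<', $file_b or die "Cannot open $file_b: $!\n";

    # Step 3: skip the first line of both files
    _next_line($fa);
    _next_line($fb);

    my @pending;    # lines of file B read ahead but not yet matched

    LINE_A: while (defined(my $la = _next_line($fa))) {
        # Steps 8-9: did we already pass over this line in file B?
        for my $i (0 .. $#pending) {
            if ($pending[$i] eq $la) {
                splice @pending, $i, 1;
                next LINE_A;
            }
        }
        # Steps 10-12: read ahead in file B until the line turns up;
        # with an empty array this is just the lockstep comparison
        while (defined(my $lb = _next_line($fb))) {
            next LINE_A if $lb eq $la;
            push @pending, $lb;
        }
        return 0;    # file B ran out: this line of A is missing from B
    }

    # File A is exhausted: same only if B is too and nothing is left over
    return (!@pending && !defined(_next_line($fb))) ? 1 : 0;
}

if (@ARGV == 2) {
    print same_files_streaming(@ARGV) ? "same\n" : "different\n";
}
```

Saved as a script (the name is up to you), it prints `same` or `different` for two file names given on the command line. Looking a line up in `@pending` is a linear scan, so lines that are displaced far apart make it slower.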

It should work, but I'm not sure; you'll have to check it.
Of course, if the files are big and the equal but displaced lines can be far apart, this procedure could become slow. Using hashes wouldn't necessarily be more efficient, though.

http://www.xcalcs.com : Online engineering calculations
http://www.megamag.it : Magnetic brakes for fun rides
http://www.levitans.com : Air bearing pads

RE: How to efficiently implement file comparison

(OP)
Thank you, prex.
