×
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Contact US

Log In

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips Forums!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.

Students Click Here

Algorithm to compare binary files and create patch file.

Algorithm to compare binary files and create patch file.

Algorithm to compare binary files and create patch file.

(OP)
I have to write a utility to compare two versions of a binary file (new and old) and then to extract the differences between the two versions of the file. The extracted list will then be used to update the old file and make it the same as the new one (so basically it is a patch utility). I have written it but find that the patch files it creates are larger than those that I can obtain by using diff or some other shareware patch utilities I have tried. The cause seems to be a problem in the routine where I synchronize the two files after a difference is found, but I just cannot get my head around this logic. If anyone has any good algorithms or can help me out with this I really would appreciate it - my deadline is looming nearer!

RE: Algorithm to compare binary files and create patch file.

(OP)
There is Mike. The client is adamant that they do not want to rely on non-proprietary software.

RE: Algorithm to compare binary files and create patch file.

Piece of cake. Read both files into string variables and do a byte-by-byte comparison, building a difference string as you go.
Something like:
FiNum = Freefile
Open "Oldfile.ext" for Binary as #FiNum
Old$ = String$(LOF(FiNum),0)
Get #FiNum, 1, Old$
Close #FiNum
FiNum = Freefile
Open "Newfile.ext" for Binary as #FiNum
New$ = String$(LOF(FiNum),0)
Get #FiNum, 1, New$
Close #FiNum
Diff$ = ""
If Len(New$) = Len(Old$) then 'files are the same size
FoundDiff = False
For B = 1 to Len(New$)
If Mid$(New$,B,1) <> Mid$(Old$,B,1) then
Diff$ = Diff$ + Mid$(New$,B,1)
FoundDiff = True
Else
If FoundDiff then Exit For
End If
Next
End If

The same logic would apply if changes were expected in more than one area of the files. You would simply maintain an array of difference strings. Build an element starting with the first mismatch, stop at the first match, build the next element starting with the next mismatch, and so on.
The logic is slightly trickier when the files are different sizes (indicating insertions or deletions). You need to mark the location of the first difference and then use a separate loop to search for subsequent matches in the new file. If the bytes in oldfile match bytes in newfile later than their position in oldfile then there may have been an insertion. The same logic can find the location of deletions.

The assignment won't seem so complex after you straighten out the spaghetti. Be careful if you decide to use recursion.

RE: Algorithm to compare binary files and create patch file.

Just a thought - the unix diff program. (I know you're using VB; bear with me) There is probably source code out there for a version of diff (the linux distribution would be the first place I would check). Having said that, diff is for text files really.

Cmp - another unix utility - is for binary files and may be if use but exits when it finds the first difference.

Larry Wall's patch program may be interesting and the source may be available. Anecdotally patch is not a good example of readable code...

Mike

Mike Lacey
Mike_Lacey@Cargill.Com
Cargill's Corporate Web Site

RE: Algorithm to compare binary files and create patch file.

(OP)
I don't think so Alt255 - if it were that simple the money would be in the bank.

Many of the files that will be compared are several GB in size - you aren't going to read those in a single leap. So, we read the file in smaller chunks. After we come to the first difference between buffers (strings?) the following scenario exists:
i) the old file has simply been changed - easy, just read ahead until the bytes match again, writing the location and new byte value to the patch file,
ii) the new file has been inserted into - sounds easy, just advance through the new file buffer until the bytes match again or you have come to the end of the buffer in which case read next chunk and repeat,
iii) the new file has been deleted from, same as ii but move forward through the old file buffer,

----- BUT -----

when we come to the first difference we don't know whether it is a change, insert or delete. Should we move ahead through the old buffer or the new or perhaps both. What if the files don't match until the next read of the new file, or the old file, what if they only match up again after 30 reads of the new file. This is the part of the logic that isn't such a "piece of cake".

One other point, when using strings remember that when you compare each element of the string you are actually comparing 2 bytes, not one. This is obviously overcome by using arrays of bytes.

Mike, I had thought about the linux source, but the limitations you mention are the problem. The patch program sounds promising so I will try to track down the code for that.

Thanks guys, please keep the thoughts coming, I need all the help I can get.

RE: Algorithm to compare binary files and create patch file.

(OP)
Geeeeezzzz guys,

I was certain that EdEric or another bright spark would come up with a winner for this one. Still "deperately seeking Susan - sorry solution".

Regards

Brad.

RE: Algorithm to compare binary files and create patch file.

There are lots of ways to compare small files. After all, one can live with a little inefficiency when faced with a two-second comparison. You complicated the issue when you noted the "several GB" size of the files. Had you mentioned that in your original post, I wouldn't have bothered to stick my foot in my mouth.
When you find a fast, efficient solution to your problem, make sure you post it here. We'll nominate you for the Nobel Peace Prize for Computer Sciences.
I don't mean to seem uncaring -- I care! I want to see the answer (probably more than most). But this is the sort of quagmire that loves to suck in unwary programmers, leaving only their nose and eyes exposed above the muck.

Red Flag This Post

Please let us know here why this post is inappropriate. Reasons such as off-topic, duplicates, flames, illegal, vulgar, or students posting their homework.

Red Flag Submitted

Thank you for helping keep Tek-Tips Forums free from inappropriate posts.
The Tek-Tips staff will check this out and take appropriate action.

Reply To This Thread

Posting in the Tek-Tips forums is a member-only feature.

Click Here to join Tek-Tips and talk with other members! Already a Member? Login


Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close