Compare two files and remove duplicates

mrr · Nov 28, 2001

In the following example I want to compare two files and throw out any duplicate lines, as well as any lines
whose fields 3 and 4 are within 10 of the matching line in the other file.
File 1 would be as follows:

1111 1000 -10 100
1111 1001 1 999
2222 1 1000 2000
2222 2 50 100

File 2 would be as follows:

1111 1000 0 100
1111 1001 1 1000
2222 1 1000 2000
2222 2 0 1000 *** This is the only line that would remain after compare and delete.

Thanks for any assistance.

mrr · Nov 28, 2001

An afterthought on the above example:
File 1 may have more lines that are not in file 2.
Also, I would like the script to print only the lines
from file 2 that have not been deleted or matched within the specified range when comparing fields 3 and 4.
Thanks

CaKiwi · Nov 28, 2001

Are the files sorted on fields 1 and 2 as in your example data? CaKiwi

mrr · Nov 28, 2001

Yes, I use the sort -u command to remove any duplicate lines
from each file.
Thanks

CaKiwi · Nov 28, 2001

Try this

Code:

{
 a = $0
 a1 = $1
 a2 = $2
 a3 = $3
 a4 = $4
 if ((getline < "file1") <= 0) exit
 while ($1 < a1 || ($1 == a1 && $2 < a2))  # skip file1 lines whose key sorts before the file2 key
 {
   if ((getline < "file1") <= 0) exit
 }
 d3 = a3-$3
 d4 = a4-$4
 if (d3<0) d3 = -d3
 if (d4<0) d4 = -d4
 if (d3 >= 10 && d4 >= 10) print a
}

Run by entering

awk -f this-file file2

Hope this helps.
CaKiwi

mrr · Nov 28, 2001

CaKiwi,
I've tried to run this script and get no output from the
above 2 files. This does compare 2 different files, does it not?

I'm also running nawk, if that matters.

Thanks for the help.

CaKiwi · Nov 28, 2001

You need to change file1 in the script to the name of your first file.
CaKiwi
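
Rather than editing the script, the name of the first file can also be passed in as an awk variable. The following is only a sketch: compare.awk and the variable name ref are hypothetical, the files are recreated from the sample data in the first post, and the script is a cut-down version of the comparison that simply pairs line N of file 2 with line N of file 1.

```shell
# Work in a scratch directory so nothing existing is overwritten.
work=$(mktemp -d)

# Sample data from the first post.
cat > "$work/file1" <<'EOF'
1111 1000 -10 100
1111 1001 1 999
2222 1 1000 2000
2222 2 50 100
EOF
cat > "$work/file2" <<'EOF'
1111 1000 0 100
1111 1001 1 1000
2222 1 1000 2000
2222 2 0 1000
EOF

# compare.awk (hypothetical name): reads the reference file named by the
# variable "ref" (hypothetical) instead of a hard-coded "file1".
cat > "$work/compare.awk" <<'EOF'
{
 a = $0; a3 = $3; a4 = $4          # save the current file2 line
 if ((getline < ref) <= 0) exit    # read the next line of the reference file
 d3 = a3 - $3; d4 = a4 - $4
 if (d3 < 0) d3 = -d3
 if (d4 < 0) d4 = -d4
 if (d3 >= 10 && d4 >= 10) print a
}
EOF

# -v sets ref before the program starts; nawk accepts -v as well.
result=$(awk -v ref="$work/file1" -f "$work/compare.awk" "$work/file2")
echo "$result"
```

With the sample data this prints the single line 2222 2 0 1000.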

aigles · Nov 28, 2001

Hi mrr,

What result would you expect from comparing these two files?

File 1:

1100 110 11 1
1111 1000 -10 100
1111 1001 1 999
2222 1 1000 2000
2222 2 50 100

File 2:

1000 100 10 1
1100 50 11 1
1111 1000 0 100
1111 1001 1 1000
2222 1 1000 2000
2222 1 2000 3000
2222 2 0 1000
3333 3 3 3

Jean Pierre.

mrr · Nov 28, 2001

Hello Aigles,
Thanks for responding.

I would expect to see only the first and last records
printed from file 1, since records 2-4 would either be duplicates in file 2 or within
the tolerance of 10 on fields 3 & 4.
Thanks again.

CaKiwi · Nov 28, 2001

I assumed that file 1 would always have more lines than file 2 and that every line in file 2 would have a line in file 1 with the same first and second fields. If file 2 can have lines with no equivalent in file 1 and vice versa then it is a more difficult problem. CaKiwi
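
One way to relax those assumptions is to load file 1 into an array keyed on fields 1 and 2 and then filter file 2 against it, so the two files no longer have to line up. This is only a sketch under assumptions that were not confirmed in the thread: keys are unique within file 1, a file 2 line whose key is missing from file 1 is kept, and "within 10" means both field 3 and field 4 differ by 10 or less. The names tol, f3, f4 and abs are made up for illustration, and the data is the second test set posted above.

```shell
work=$(mktemp -d)

# Test data posted by aigles.
cat > "$work/file1" <<'EOF'
1100 110 11 1
1111 1000 -10 100
1111 1001 1 999
2222 1 1000 2000
2222 2 50 100
EOF
cat > "$work/file2" <<'EOF'
1000 100 10 1
1100 50 11 1
1111 1000 0 100
1111 1001 1 1000
2222 1 1000 2000
2222 1 2000 3000
2222 2 0 1000
3333 3 3 3
EOF

kept=$(awk -v tol=10 '
function abs(x) { return x < 0 ? -x : x }

# First file: remember fields 3 and 4 under the key built from fields 1 and 2.
NR == FNR { f3[$1, $2] = $3; f4[$1, $2] = $4; next }

# Second file: decide line by line whether to keep it.
{
    key = $1 SUBSEP $2
    # Assumption: a key that never appears in the first file is kept.
    if (!(key in f3)) { print; next }
    # Keep the line only when field 3 or field 4 differs by more than tol.
    if (abs($3 - f3[key]) > tol || abs($4 - f4[key]) > tol) print
}
' "$work/file1" "$work/file2")
echo "$kept"
```

Under these assumptions the sketch prints five lines of file 2: 1000 100 10 1, 1100 50 11 1, 2222 1 2000 3000, 2222 2 0 1000 and 3333 3 3 3. Whether unmatched keys should be printed or dropped is a choice the thread leaves open; dropping them only requires replacing the print in the "key missing" branch with a bare next.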
