Compare two files and remove duplicates

Status
Not open for further replies.

mrr

Technical User
May 3, 2001
67
US
In the following example I want to compare the two files and throw out any duplicate lines, and also any lines
whose fields 3 and 4 are within 10 of the same fields on the matching line in the other file.
file 1 would be as follows:

1111 1000 -10 100
1111 1001 1 999
2222 1 1000 2000
2222 2 50 100

file 2 would be as follows:

1111 1000 0 100
1111 1001 1 1000
2222 1 1000 2000
2222 2 0 1000 *** This is the only line that would remain after compare and delete.

Thanks for any assistance.
 
After-thought on the above example.
File 1 may have more lines that are not in file 2
and also, I would like the script to only print out lines
from file 2 that have not been deleted or matched within the range specified comparing fields 3,4.
Thanks
 
Are the files sorted on fields 1 and 2 as in your example data? CaKiwi
 
Yes, I use the sort -u command to remove any duplicate lines
from each file.
Thanks
 
Try this
Code:
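{
 # Save the current file2 record; getline below overwrites $0 and the fields
 a = $0
 a1 = $1
 a2 = $2
 a3 = $3
 a4 = $4
 if ((getline < "file1") <= 0) exit
 # Skip file1 records whose key (fields 1 and 2) sorts before the file2 key
 while ($1 < a1 || ($1 == a1 && $2 < a2))
 {
   if ((getline < "file1") <= 0) exit
 }
 # Absolute differences of fields 3 and 4
 d3 = a3-$3
 d4 = a4-$4
 if (d3<0) d3 = -d3
 if (d4<0) d4 = -d4
 if (d3 >= 10 && d4 >= 10) print a
}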
Run by entering

awk -f this-file file2

Hope this helps.
CaKiwi
 
CaKiwi,
I've tried to run this script and get no output from the
above 2 files. This does compare 2 different files, does it not?

I'm also running nawk, if that matters.

Thanks, for the help.
 
You need to change file1 in the script to the name of your first file.
CaKiwi
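
One way to avoid editing the script for each pair of files might be to pass the first file's name in with awk's -v option. The following is only a sketch: the variable name ref and the file names first.txt, second.txt and compare.awk are made up for illustration, and the skip loop assumes the intent is to step past file 1 records whose fields 1 and 2 sort before the current file 2 record. It uses the sample data from the first post.

```sh
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Sample data from the first post (file names are hypothetical).
printf '%s\n' '1111 1000 -10 100' '1111 1001 1 999' \
              '2222 1 1000 2000'  '2222 2 50 100'  > first.txt
printf '%s\n' '1111 1000 0 100'   '1111 1001 1 1000' \
              '2222 1 1000 2000'  '2222 2 0 1000'  > second.txt

cat > compare.awk <<'EOF'
{
  # Save the current record of the main input; getline overwrites the fields.
  a = $0; a1 = $1; a2 = $2; a3 = $3; a4 = $4
  # ref is set on the command line with -v (an assumed name, not from the thread).
  if ((getline < ref) <= 0) exit
  # Assumed intent: skip reference records that sort before the current key.
  while ($1 < a1 || ($1 == a1 && $2 < a2))
    if ((getline < ref) <= 0) exit
  d3 = a3 - $3; if (d3 < 0) d3 = -d3
  d4 = a4 - $4; if (d4 < 0) d4 = -d4
  if (d3 >= 10 && d4 >= 10) print a
}
EOF

result=$(awk -v ref=first.txt -f compare.awk second.txt)
echo "$result"
```

With this data the only line printed should be 2222 2 0 1000. nawk also accepts -v, so the same command line should work there.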
 
Hi mrr,

What result would you expect from comparing these two files:

File 1:

1100 110 11 1
1111 1000 -10 100
1111 1001 1 999
2222 1 1000 2000
2222 2 50 100

File 2:

1000 100 10 1
1100 50 11 1
1111 1000 0 100
1111 1001 1 1000
2222 1 1000 2000
2222 1 2000 3000
2222 2 0 1000
3333 3 3 3
Jean Pierre.
 
Hello Aigles,
Thanks for responding.

I would expect to see only the first and last records
printed from file 1, since records 2-4 would either be duplicates in file 2 or within
the tolerance of 10 on fields 3 & 4.
Thanks again.
 
I assumed that file 1 would always have more lines than file 2 and that every line in file 2 would have a line in file 1 with the same first and second fields. If file 2 can have lines with no equivalent in file 1 and vice versa then it is a more difficult problem. CaKiwi
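
For the more difficult case, where either file can have lines with no equivalent in the other, one possible approach is to load the reference file into arrays keyed on fields 1 and 2 and then test each line of the other file against them. This is a sketch under stated assumptions, not a tested solution for the real data: the names tolerance.awk and tol are made up, "within 10" is taken to mean that fields 3 and 4 both differ by 10 or less, and a line with no key match in the reference file is kept. It uses Jean Pierre's sample data and prints the file 1 records that have no match in file 2.

```sh
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
dir=$(mktemp -d) && cd "$dir" || exit 1

# Jean Pierre's sample data.
printf '%s\n' '1100 110 11 1' '1111 1000 -10 100' '1111 1001 1 999' \
              '2222 1 1000 2000' '2222 2 50 100' > file1
printf '%s\n' '1000 100 10 1' '1100 50 11 1' '1111 1000 0 100' \
              '1111 1001 1 1000' '2222 1 1000 2000' '2222 1 2000 3000' \
              '2222 2 0 1000' '3333 3 3 3' > file2

cat > tolerance.awk <<'EOF'
# First file on the command line: remember every reference record,
# grouped by key (fields 1 and 2). A key may occur more than once.
NR == FNR {
  key = $1 SUBSEP $2
  n[key]++
  f3[key, n[key]] = $3
  f4[key, n[key]] = $4
  next
}
# Second file: print a record unless some reference record with the same
# key has fields 3 and 4 both within tol (assumed meaning of "within 10").
{
  key = $1 SUBSEP $2
  keep = 1
  for (i = 1; i <= n[key]; i++) {
    d3 = $3 - f3[key, i]; if (d3 < 0) d3 = -d3
    d4 = $4 - f4[key, i]; if (d4 < 0) d4 = -d4
    if (d3 <= tol && d4 <= tol) { keep = 0; break }
  }
  if (keep) print
}
EOF

# Reference file first, then the file whose surviving lines are printed.
result=$(awk -v tol=10 -f tolerance.awk file2 file1)
echo "$result"
```

With this data the output should be the first and last records of file 1 (1100 110 11 1 and 2222 2 50 100). Swapping the two file names on the command line prints the surviving file 2 lines instead. Because the reference file is held in memory, neither file needs to be sorted.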
 