Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Rhinorhino on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Duplicate Records

Status
Not open for further replies.

Bonediggler1

Technical User
Joined
Jul 2, 2008
Messages
156
Location
US
Hi

I am trying to isolate and then delete duplicate records from a large table. In this case every field in a record has to be the same as another record for it to be a duplicate.

I need to check whether the following logic is correct. It seems to be, but given the large number of records I am dealing with it is hard to validate...

Basically to identify duplicates (triplicates etc) I use the ROW_NUMBER() function and PARTITION BY every field in the table to generate a new ROWNUM field in a new query.

Using the generated ROWNUM field I can then delete every record with a value more than one.

Does this make sense?

Thank you!!
 
That's one of the standard techniques.

Another one
Code:
;with cteDups as (select min(ID), Field1, Field2, Field3 
from myTable group by Field1, Field2, Field3 having count(*) > 1)

delete T from myTable T inner join cteDups C on
T.Field1 = C.Field1 and T.Field2 = C.Field2 and T.Field3 = C.Field3 where T.ID > C.ID --(assuming sequential int IDs)

PluralSight Learning Library
 
Ok thank you.

I thought my idea seemed fairly obvious, but as I could not find any literature on the web to support it I started having doubts. Also nothing in the standard MS documentation of the ROW_NUMBER() function seems to mention using more than one column in the PARTITION BY clause.

 
You can use as many columns in the partition clause as you need.

BTW, I was lucky my post did get through as tek-tips seemed to had difficulties this morning.

PluralSight Learning Library
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top