Duplicate Records

Bonediggler1 · Mar 22, 2010

Hi

I am trying to isolate and then delete duplicate records from a large table. In this case every field in a record has to be the same as another record for it to be a duplicate.

I need to check whether the following logic is correct. It seems to be, but given the large number of records I am dealing with it is hard to validate...

Basically to identify duplicates (triplicates etc) I use the ROW_NUMBER() function and PARTITION BY every field in the table to generate a new ROWNUM field in a new query.

Using the generated ROWNUM field I can then delete every record with a value more than one.

Does this make sense?

Thank you!!

markros · Mar 22, 2010

That's one of the standard techniques.

Another one

Code:

;with cteDups as (select min(ID), Field1, Field2, Field3 
from myTable group by Field1, Field2, Field3 having count(*) > 1)

delete T from myTable T inner join cteDups C on
T.Field1 = C.Field1 and T.Field2 = C.Field2 and T.Field3 = C.Field3 where T.ID > C.ID --(assuming sequential int IDs)

PluralSight Learning Library

Bonediggler1 · Mar 22, 2010

Ok thank you.

I thought my idea seemed fairly obvious, but as I could not find any literature on the web to support it I started having doubts. Also nothing in the standard MS documentation of the ROW_NUMBER() function seems to mention using more than one column in the PARTITION BY clause.

markros · Mar 22, 2010

You can use as many columns in the partition clause as you need.

BTW, I was lucky my post did get through as tek-tips seemed to had difficulties this morning.

PluralSight Learning Library

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Duplicate Records

Bonediggler1

Technical User

markros

Programmer

Bonediggler1

Technical User

markros

Programmer

Similar threads

Part and Inventory Search

Sponsor