URGENT Help on large duplication removal

MikeJones · Mar 20, 2002

Hi,

I have a problem where I need to remove a large amount of duplicate records from a large table.

There are around 300 - 400K duplicate records in the table. The table containts about twice that in terms of total records (800K)

Not all records are duplicated and some are duplicated many times.

In order to spot the differeneces I have to group by most of the columns.

My problems are this...

It will take a long time to run a query with group by to spot the duplicates.
I don't want to run into the snapshot to old problem when I start deleteing rows that my query is trying to hold a consistent read on

This is running on HP-UX 64 bit on 8.1.7

Thanks in advance.

Mike.

Dagon · Mar 20, 2002

delete from dup_tab d1
where rowid <
(select max(rowid) from dup_tab d2
where d1.key1 = d2.key1
and d1.key2 = d2.key2...)

Speed this up by putting an index on (key1,key2...).

Guest_imported · Mar 20, 2002

Hi,
delete from emp a
where rowid not in(select max(rowid) from emp b
where a.empno=b.empno) and
rownum < 2;

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

URGENT Help on large duplication removal

MikeJones

Programmer

Dagon

MIS

Guest_imported

New member

Similar threads

Part and Inventory Search

Sponsor