Finding duplicate values

mahesh736 · Oct 22, 2002

I am trying to figure out a way of finding out if any duplicates exist in the data values of a particular column.
In short, we should be able to define an Unique key constraint on that column.
eg. Table A has colA and colB, then I want to find if duplicate values exist in colB.

Looping thru' all the records is a way, but I want to avoid it and achieve the results using SELECT query.

Thanks

karluk · Oct 22, 2002

One possibility is

select colB, count(*) from TableA
group by colB
having count(*) > 1;

You will only get rows returned from this query if there are duplicates in colB.

mahesh736 · Oct 23, 2002

Thank You. It works fine.
But does the 'HAVING' clause use indexes. If not, I may run into performance issues as I might be processing about million records.

Thanks..

fmorel · Oct 23, 2002

I think that there is no way to use indexes for your problem.
A full table scan is necessary : Oracle has to search the whole table to see if there are duplicates.
If there is no "where" clause, no index will be used.

Regards

karluk · Oct 23, 2002

If you have a non-unique index defined on colB, I think my query would probably use it. However, there are other possible approaches that you might want to consider. One way is to simply try to create the unique constraint using an "exceptions into" clause. That will either successfully create your constraint, or put the duplicate rows into an exceptions table. Please see thread186-62406 for details.

fmorel · Oct 24, 2002

I don't think any index can be used with Karluk query, except if the table is a IOT.

Anyway, acces via index takes longer than full table scan when all rows have to be read.

As Karluk said, "exceptions into" clause is a very smart way to delete duplicate rows, but it won't take fewer time than your query.

Regards

sem · Oct 24, 2002

I think that utilizing index will help in this situation. There's some overhead when using index to SELECT values, if you need most part of the table, but in this case you need to GROUP them, so you need sorting anyway.

fmorel · Oct 24, 2002

I don't understand how and why it would help to use and index when you have to check all rows of a table ?

sem · Oct 24, 2002

It eliminates the need in SORTING. It may also provide FFI access.

mahesh736 · Oct 25, 2002

Do you think the foll. query will be more efficient:

select a.colB, count(*) from TableA a
where exists
(select b.colB from TableA b where a.colB = b.ColB
and a.rowid <> b.rowid )
group by a.colB;

TableA is Indexed on colB.. Will the above query use this index?

Thanks

sem · Oct 25, 2002

This query WIIL use the index, but it's UNLIKELY to be more efficient.

karluk · Oct 25, 2002

I really think you need to do some research on your end. Specifically, do an "explain plan" on whatever query you want to try and see what access path Oracle intends to use.

As I said before, I think the query I posted earlier will use an index on colB, if it is available. One caution, however. If colB allows nulls, you should exclude them from the query. Oracle doesn't include null values in indexes so, if it has to check for nulls, it will have to scan through the entire table. That would make the query

select colB, count(*) from TableA
where colB is not null
group by colB
having count(*) > 1;

fmorel · Oct 25, 2002

You may also tune the database parameters for memory and sort area.
It really makes a difference when sorts can be performed in mempry instead of on disks.

karluk · Oct 25, 2002

It's true in general that adjusting memory can be a useful tuning tool. However, it's extremely likely to be productive here. You would have to allocate enough memory to do a sort on a million row table. It's not likely your server has that large an amount of spare memory available.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Finding duplicate values

mahesh736

Programmer

karluk

MIS

mahesh736

Programmer

fmorel

Programmer

karluk

MIS

fmorel

Programmer

sem

Programmer

fmorel

Programmer

sem

Programmer

mahesh736

Programmer

sem

Programmer

karluk

MIS

fmorel

Programmer

karluk

MIS

Similar threads

Part and Inventory Search

Sponsor