Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations wOOdy-Soft on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Finding duplicate values

Status
Not open for further replies.

mahesh736

Programmer
Jul 20, 2001
9
US
I am trying to figure out a way of finding out if any duplicates exist in the data values of a particular column.
In short, we should be able to define an Unique key constraint on that column.
eg. Table A has colA and colB, then I want to find if duplicate values exist in colB.

Looping thru' all the records is a way, but I want to avoid it and achieve the results using SELECT query.

Thanks
 
One possibility is

select colB, count(*) from TableA
group by colB
having count(*) > 1;

You will only get rows returned from this query if there are duplicates in colB.
 
Thank You. It works fine.
But does the 'HAVING' clause use indexes. If not, I may run into performance issues as I might be processing about million records.

Thanks..
 
I think that there is no way to use indexes for your problem.
A full table scan is necessary : Oracle has to search the whole table to see if there are duplicates.
If there is no "where" clause, no index will be used.

Regards
 
If you have a non-unique index defined on colB, I think my query would probably use it. However, there are other possible approaches that you might want to consider. One way is to simply try to create the unique constraint using an "exceptions into" clause. That will either successfully create your constraint, or put the duplicate rows into an exceptions table. Please see thread186-62406 for details.
 

I don't think any index can be used with Karluk query, except if the table is a IOT.

Anyway, acces via index takes longer than full table scan when all rows have to be read.

As Karluk said, "exceptions into" clause is a very smart way to delete duplicate rows, but it won't take fewer time than your query.

Regards
 
I think that utilizing index will help in this situation. There's some overhead when using index to SELECT values, if you need most part of the table, but in this case you need to GROUP them, so you need sorting anyway.
 
I don't understand how and why it would help to use and index when you have to check all rows of a table ?
 
It eliminates the need in SORTING. It may also provide FFI access.
 
Do you think the foll. query will be more efficient:

select a.colB, count(*) from TableA a
where exists
(select b.colB from TableA b where a.colB = b.ColB
and a.rowid <> b.rowid )
group by a.colB;


TableA is Indexed on colB.. Will the above query use this index?

Thanks

 
This query WIIL use the index, but it's UNLIKELY to be more efficient.
 
I really think you need to do some research on your end. Specifically, do an &quot;explain plan&quot; on whatever query you want to try and see what access path Oracle intends to use.

As I said before, I think the query I posted earlier will use an index on colB, if it is available. One caution, however. If colB allows nulls, you should exclude them from the query. Oracle doesn't include null values in indexes so, if it has to check for nulls, it will have to scan through the entire table. That would make the query

select colB, count(*) from TableA
where colB is not null
group by colB
having count(*) > 1;
 
You may also tune the database parameters for memory and sort area.
It really makes a difference when sorts can be performed in mempry instead of on disks.
 
It's true in general that adjusting memory can be a useful tuning tool. However, it's extremely likely to be productive here. You would have to allocate enough memory to do a sort on a million row table. It's not likely your server has that large an amount of spare memory available.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top