Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations MikeeOK on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Duplicate records

Status
Not open for further replies.

Recht

MIS
Sep 30, 2004
20
US
My brain is mush today, looking for some help deleting duplicate records from a SQL 2000 database.

Basically the database name is ODR

The table name is pro123

The column name that should contain the unique data is idnumber.

I had to merge some data into this database from a recovery, so I know there are duplicate rows now. I just need to keep one copy of the row.

Select IDNUMBER
from [odr]..[pro123]
Group by IDNUMBER
having count(IDNUMBER) > 1)

I.E. I have 4 copies of the row data with idnumber 10, I just need to remove 3 of those rows.

Thanks in advance,

Recht
 
Maybe I don't understand your issue well, but would the DISTINCT clause remove the duplicate records?

SELECT DISTINCT idnumber FROM [ODR]...

dz
dzaccess@yahoo.com
 
That would give me a list of all of the distinct rows in the database, but I would still need to get rid of the duplicates...
 
Hi Recht,

I guess I don't understand then. How could you have duplicates if all the rows are distinct? Do you want to delete all records where a specific field contains duplicates even if the record isn't duplicated (i.e. other fields in the record are different)? If so, how would you decide which record to delete?

dz
dzaccess@yahoo.com
 
In this case the duplicates contain all of the same data, so it doesnt matter which rows I delete as long as there is only one row for each idnumber.

A little more background might help.

A user ran a delete against the database.

Over 700,000 rows were deleted and the database became corrupted.

Extracted out the all of the rows I could find from the corrupt database into a new database (ODR). In the original database the idnumber filed was unique, but when I recovered the rows, I had to shut off the identity and primary keys for the tables. So unfortunately I got duplicate rows where originally there were none.

Then I went through the transaction log and recovered all of the transactions I could find there into the same database I recovered the rows into from the corrupt database by converting all of the deleted transactions into inserts.

So I am left with a new database, that has rows from a corrupt database as well as rows recovered from the tranaction log. Some of those rows have the same values for the IDNUMBER.

Recht



 
Sorry for the poor typing,

"In the original database the idnumber filed was unique"

should read "In the original database the idnumber field was unique
 
Recht,

You said "the duplicates contain all of the same data"

If this is true, the SELECT clause will remove the duplicates. You can run a Make Table Query. Once you are confident that the data is complete, delete the source table and use the newly created table. Or save the source file in case you ever need it again.

If I am still misunderstanding you, please post two or three records of your data (all fields aren't necessary) so I can try to follow along better.

Thanks,

dz
dzaccess@yahoo.com
 
IDNUMBER proctime imp costlab pricelab
----------- ------------ -------- -------- ------------
262801 .000000 .0 6.00 6.00
262801 .000000 .0 6.00 6.00

This is a sample of one of the cases where the data is exactly the same.

I just want to delete one of these rows, so the other remains.

There are like 400,000 cases where this is the case, sometimes there are 3 rows with the same IDNUMBER and exactly the same data.

IDNUMBER proctime imp costlab pricelab
----------- ------------ --------- ------------ --------
1182126 1.000000 .0 25.50 25.50
1182126 1.000000 .0 25.50 25.50
1182126 1.000000 .0 25.50 25.50
 
Assuming those are the only five fields, the following SQL will create a new table that contains only one record (one copy of the duplicates in the source)

Code:
SELECT DISTINCT pro123.idNumber, pro123.proctime, pro123.imp, pro123.costlab, pro123.pricelab INTO newtable FROM pro123

OR

SELECT DISTINCT pro123.* INTO tblNew
FROM pro123;

Either should work. The first one specifies the fields individually. The second one gets all the fields. This will work as long as the data in all fields are the same. If your source table has the following, for example:

IDNUMBER proctime imp costlab pricelab
----------- ------------ --------- ------------ --------
1182126 1.000000 .0 25.50 25.50
1182126 2.000000 .0 25.50 25.50
1182126 1.000000 .0 25.50 25.50


the result after running either query above would be:

IDNUMBER proctime imp costlab pricelab
----------- ------------ --------- ------------ --------
1182126 1.000000 .0 25.50 25.50
1182126 2.000000 .0 25.50 25.50

dz
dzaccess@yahoo.com
 
Thank you that helps, is there any way to do this without copying the files into a new or temp table?

I was thinking about maybe modifying something like this?

Declare @min int,
@IID int

SET @min = 1

WHILE @min <> 0 BEGIN
Set rowcount 1
SELECT @IID = (Select Top 1 iID from #Test Group by iID having Count(iID) > 1)
Delete #Test where iID = @IID
Set @min = (Select Top 1 Count(iID) from #Test Group by iID having Count(iID) > 1)
-- Select @Min

END

Set rowcount 0
 
Why don't you want to create a new table? It is the easiest way to do what you want and you don't even have to write any code. You can run the query in the Query Designer; then open the new table and see if it has what you want. You could figure out an algorithm to delete the duplicate records with code, but why go to the trouble of debugging it and making sure that you didn't make a mistake when a simple query gets what you want?

Best regards,


dz
dzaccess@yahoo.com
 
Here is what I ended up using with the original database/table names, thanks for all the help. I made all of the T-SQL statements in a red font in the copy I sent out.

If anyone has any suggestions or comments please don't hesitate to make them!

--BACKUP THE DATABASE!!! Do this before you do anything else
--Mount the 1435530-R database
--Go into design view for the table
--Find the column that has the identity set to on (IDNUMBER)
--Set the identity to off
--Remove the Primary Key for the same column
--Then hit save

--You will need to do the following steps one by one.
--Everything that is in RED is a SQL statement that is to be executed.

--You will need to change the database name 1435530ODR2 in all of the following statements
--to the name of your database

--You will also need to change the table name AAPROCQT for each table you want to do this with.

--GET A ROW COUNT FROM THE CURRENT DATABASE/TABLE

SELECT COUNT(*) FROM 1435530ODR2..AAPROCQT

--GET A ROW COUNT FROM THE RECOVERED DATABASE

SELECT COUNT(*) FROM 1435530-R..AAPROCQT

--Copy all of the recovered rows into the existing database/TABLE

INSERT INTO [1435530ODR2]..AAPROCQT
SELECT * FROM [1435530-R]..AAPROCQT

--VERIFY THE RECOVERED ROWS WERE COPIED IN

SELECT COUNT(*) FROM [1435530ODR2]..[AAPROCQT]

--THIS SHOULD EQUAL THE COUNT FROM THE EXISTING DATABASE
--PLUS THE COUNT FROM THE RECOVERED DATABASE

--COPY THE NON DUPLICATE/DISTINCT ROWS FROM THE DATABASE
--TO A TEMP DATABASE

SELECT DISTINCT [1435530ODR2]..[AAPROCQT].*
INTO [ODRTEMP]..[AAPROCQT]
FROM [1435530ODR2]..[AAPROCQT]

--GET A COUNT OF THE NUMBER OF DISTINCT ROWS

SELECT COUNT(*) FROM [ODRTEMP]..[AAPROCQT]

--CHECK TO SEE IF THERE ARE ANY ROWS THAT HAVE DUPLICATE
--IDNUMBERS BUT DIFFERENT DATA

SELECT IDNUMBER FROM [ODRTEMP]..[AAPROCQT]
GROUP BY IDNUMBER
HAVING COUNT(IDNUMBER) > 1

--LIST ANY ROWS THAT HAVE A DUPLICATE IDNUMBER BUT DIFFERENT DATA

SELECT * FROM [ODRTEMP]..[AAPROCQT] WHERE IDNUMBER IN
(SELECT IDNUMBER FROM [ODRTEMP]..[AAPROCQT]
GROUP BY IDNUMBER
HAVING COUNT(IDNUMBER) > 1)
ORDER BY IDNUMBER

--Need to go through all of the remining duplicates and either change
--the IDNUMBER or delete the ones that shouldnt be there

--DELETE ALL THE ROWS FROM THE EXISTING DATABASE

DELETE FROM [1435530ODR2]..[AAPROCQT]

--COPY THE NON DUPLICATE ROWS BACK TO THE EXISTING DATABASE

INSERT [1435530ODR2]..[AAPROCQT] SELECT * FROM [ODRTEMP]..[AAPROCQT]

--VERIFY THE ROWS WERE COPIED BACK CORRECTLY

SELECT COUNT(*) FROM [1435530ODR2]..[AAPROCQT]

--Turn the IDENTITY back on for the table/column
--Turn on the PRIMARY KEY for the table/column

--Rinse and repeat for each table
 
Stumbled across my old documentation and also found this one:

/*

** This file will delete dupe rows from a table based on a column list. ie., the compound value of the chosen column list is used to determine dupe status and subsequent delete status. We construct some dynamic SQL and use the ROWCOUNT function in determining how many rows to delete.

**

** Revision History:

** -----------------------------------------------------------------------

** Date Name Description

** -----------------------------------------------------------------------

** 07/24/03 DVDS Create

*/


-- declare all variables!

DECLARE @iErrorVar int,

@vchFirstname varchar(30),

@vchLastName varchar(30),

@vchEmailAddress varchar(255),

@iReturnCode int,

@vchAddress1 varchar(100),

@iCount int,

@chCount char(3),

@nvchCommand nvarchar(4000)



-- set initial environment

SET ROWCOUNT 0

SET NOCOUNT ON



-- Build cursor to find duplicated information

DECLARE DelDupe CURSOR FOR

SELECT COUNT(*) AS Amount,

vchFirstName,

vchLastName,

vchAddress1

FROM TestTable

GROUP BY vchFirstName,vchLastName,vchAddress1

HAVING COUNT(*) > 1



OPEN DelDupe

FETCH NEXT FROM DelDupe INTO @iCount,

@vchFirstName,

@vchLastName,

@vchAddress1

WHILE (@@fetch_status = 0)

BEGIN


-- Calculate number of rows to delete for each grouping by subtracting

-- 1 from the total count for a given group.

SELECT @iCount = @iCount - 1

SELECT @chCount = CONVERT(char(3),@iCount)


-- now build the rowcount and delete statements.

SELECT @nvchCommand = N'SET ROWCOUNT ' + @chCount +

'DELETE IndTest ' +

' WHERE vchFirstName = ' + CHAR(34) + @vchFirstName + CHAR(34) +

' AND vchLastName = ' + CHAR (34) + @vchLastName + CHAR(34) +

' AND vchAddress1 = ' + CHAR(34) + @vchAddress1 + CHAR(34)


-- print the statement. For your viewing pleasure only.

PRINT @nvchCommand


-- execute the statement.

EXEC sp_executesql @nvchCommand

SELECT @iErrorVar = @@Error

IF @iErrorVar <> 0

BEGIN

RETURN

END

FETCH NEXT FROM DelDupe INTO @iCount,

@vchFirstName,

@vchLastName,

@vchAddress1

END

CLOSE DelDupe

DEALLOCATE DelDupe

RETURN
 
Hey, Recht:

I'm glad that you got it to work. What a mess, huh?

Have a nice day.



dz
dzaccess@yahoo.com
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top