Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations bkrike on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Delete Duplicates save One! 6

Status
Not open for further replies.

TheAceMan1

Programmer
Sep 23, 2003
11,174
US
Howdy Everyone! . . . . .

Was asked by a Private PreSchool organization to bring a DB up to date. The DB is flatline (MicrosoftWorks) and [blue]has a great many duplicates[/blue] as far as table normalization is concerned.

What I need is to be able to do is [purple]delete all duplicates save the one of each group![/purple], before I can transfer data to new tables. Data entry personel will correct all records once this is done.

I've added and extra field to specific tables for flaging duplicates, and constructed Delete Queries (individual per table) with subquery to set the flg for duplicates. This works fine, and if I execute the queries, I can delete all duplicates.

[blue]How do I leave the one of each group?[/blue]

Calvin.gif
See Ya! . . . . . .
 
The only way I have been able to accomplish this is via code. The code itself is quite simplistic. Using ye venerable find duplicates wizzard (at least as a starting point) create the recordset of the duplicates. Given this as the record source rstSrc), use a loop to move through this as the source. In a module, declare the dups as rstSrc and the target (with the dups) as rstDest. Loop through rstSrc. for each rec in rstSrc, do FindFirst for the rec in rstDest. Then, while nomatch is false, do findnext and delete. On nomatch being true, start over with the next src rec.




MichaelRed


 
TheAceMan1 said:
has a great many duplicates as far as table normalization is concerned.
Here I am little confused..
Otherwise you can have(You might be knowing it) a combinedPK in a new table except the autonumber field to the combinedPK and append the records to the new table.
Or I am not understanding the question...Sorry

Zameer Abdulla
Jack of Visual Basic Programming, Master in Dining & Sleeping
Visit Me
 
ZmrAbdulla . . . . .

Microsoft Works is a flatline DB. Just one huge table (practically a spreadsheet). I have this table imported into access.

Now, unique name record count is 12K (48K with duplicates), so anything manual is no go . . .

In any case found solution here by [blue]PHV[/blue] & [blue]grande[/blue]: thread701-1007255

Calvin.gif
See Ya! . . . . . .
 
I feel mine also a similar solution. The new table will have all the fields(except the auto number) combined into a PK and paste append the records from the old table to new. It won't allow to paste duplicate records.

In fact I don't know anything about "MS Works"

Zameer Abdulla
Jack of Visual Basic Programming, Master in Dining & Sleeping
Visit Me
 
ZmrAbdulla . . . . .

Ihave tried both and get the same results. Great!

But I'd still like to do this with a single query. Current testing with nested subqueries . . . . almost there . . .



Calvin.gif
See Ya! . . . . . .
 
To All . . . . .

It was nice being back in SQL school. It took some time but I got it. Interested in your view on the following:

The following SQL [blue]deletes duplicates[/blue], [purple]leaving one record out of each group of duplicates[/purple], in a single table.

Requirements: an [blue]ID field[/blue] and/or [blue]field containing the duplicates[/blue].

To use the SQL, copy & paste into WordPad the [blue]packed view[/blue]. Then replace the following (selecting all for each):
[ol][li][blue]YourTableName[/blue][/li]
[li][green]YourIDFieldName[/green][/li]
[li][purple]YourDuplicateFieldName[/purple][/li][/ol]
When replacements are done you have two choices . . .
[ol][li]Make a query as follows:[/li]
[ol a][li] Open a new query. Do not select any tables.[/li]
[li]Select SQL View and paste.[/li][/ol]
[li]Ue the packed SQL where you like![/li][/ol]

[tt][purple]********************
* Operational View *
********************[/purple][/tt]
DELETE [blue]YourTableName[/blue].[green]YourIDFieldName[/green], [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
FROM [blue]YourTableName[/blue]

WHERE ((([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) Not In

(SELECT Max([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) AS [green]YourIDFieldName[/green]
FROM [blue]YourTableName[/blue]
GROUP BY [blue]YourTableName[/blue].YourDuplicateFieldName
HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1))))

AND (([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]) In

(SELECT [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
FROM [blue]YourTableName[/blue]
GROUP BY [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))));

[tt][Purple]***************
* Packed View *
***************[/purple][/tt]
DELETE [blue]YourTableName/blue.[green]YourIDFieldName[/green], [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple] FROM [blue]YourTableName[/blue] WHERE ((([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) Not In (SELECT Max([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) AS [green]YourIDFieldName[/green] FROM [blue]YourTableName[/blue] GROUP BY [blue]YourTableName[/blue].YourDuplicateFieldName HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))) AND (([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]) In (SELECT [blue]YourTableName/blue].[purple]YourDuplicateFieldName[/purple] FROM [blue]YourTableName[/blue] GROUP BY [blue]YourTableName/blue].[purple]YourDuplicateFieldName[/purple] HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))));


Calvin.gif
See Ya! . . . . . .
 
To All . . . . .

It was nice being back in SQL school. It took some time but I got it. Interested in your view on the following:

The following SQL [blue]deletes duplicates[/blue], [purple]leaving one record out of each group of duplicates[/purple], in a single table.

Requirements: an [blue]ID field[/blue] and/or [blue]field containing the duplicates[/blue].

To use the SQL, copy & paste into WordPad the [blue]packed view[/blue]. Then replace the following (selecting all for each):
[ol][li][blue]YourTableName[/blue][/li]
[li][green]YourIDFieldName[/green][/li]
[li][purple]YourDuplicateFieldName[/purple][/li][/ol]
When replacements are done you have two choices . . .
[ol][li]Make a query as follows:[/li]
[ol a][li] Open a new query. Do not select any tables.[/li]
[li]Select SQL View and paste.[/li][/ol]
[li]Ue the packed SQL where you like![/li][/ol]

[tt][purple]********************
* Operational View *
********************[/purple][/tt]
DELETE [blue]YourTableName[/blue].[green]YourIDFieldName[/green], [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
FROM [blue]YourTableName[/blue]

WHERE ((([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) Not In

(SELECT Max([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) AS [green]YourIDFieldName[/green]
FROM [blue]YourTableName[/blue]
GROUP BY [blue]YourTableName[/blue].YourDuplicateFieldName
HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1))))

AND (([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]) In

(SELECT [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
FROM [blue]YourTableName[/blue]
GROUP BY [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))));

[tt][Purple]***************
* Packed View *
***************[/purple][/tt]
DELETE [blue]YourTableName/blue.[green]YourIDFieldName[/green], [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple] FROM [blue]YourTableName[/blue] WHERE ((([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) Not In (SELECT Max([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) AS [green]YourIDFieldName[/green] FROM [blue]YourTableName[/blue] GROUP BY [blue]YourTableName[/blue].YourDuplicateFieldName HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))) AND (([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]) In (SELECT [blue]YourTableName/blue].[purple]YourDuplicateFieldName[/purple] FROM [blue]YourTableName[/blue] GROUP BY [blue]YourTableName/blue].[purple]YourDuplicateFieldName[/purple] HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))));
Thats it! . . . . Your Thoughts! . . .

Calvin.gif
See Ya! . . . . . .
 

AceMan's SQL statement not only deserves a star, but it should also shine in its intended beauty:

Code:
DELETE [blue][blue]YourTableName[/blue][/blue].[green][green]YourIDFieldName[/green][/green], [blue]YourTableName[/blue].[purple][purple]YourDuplicateFieldName[/purple][/purple] FROM [blue]YourTableName[/blue] WHERE ((([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) Not In (SELECT Max([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) AS [green]YourIDFieldName[/green] FROM [blue]YourTableName[/blue] GROUP BY [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple] HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))) AND (([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]) In (SELECT [blue][blue]YourTableName[/blue][/blue].[purple]YourDuplicateFieldName[/purple] FROM [blue]YourTableName[/blue] GROUP BY [blue][blue]YourTableName[/blue][/blue].[purple]YourDuplicateFieldName[/purple] HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))));

Note: Execution on large tables may take some time. My test table had about 19,000 records.

Thanks, AceMan.


TomCologne
 
How are ya TomCologne . . . . .

Thanks for the [blue]cleanup & pinky.[/blue]

I drudgingly completed this about 3:30am after over 2hours of testing when it hit me what was needed. I think I [blue]passed out[/blue] after I hit submit the second time! [sleeping]

Decided to make it an FAQ (faq701-5721). [purple]Not approved yet.[/purple] Believe others can make use of it as an exhaustive internet search produced poor results.

[blue]Glad to be of service.[/blue]

[purple]You take care! . . . . . Ya Hear![/purple]

Calvin.gif
See Ya! . . . . . .
 
Just a though, doesn't the following suffice ?
DELETE FROM YourTableName
WHERE YourIDFieldName Not In (SELECT Max(YourIDFieldName) FROM YourTableName GROUP BY YourDuplicateFieldName);




Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Haaaaaay PHV . . . . .

My God! . . . . all the time I spent in this thread, I was looking for you! (hoping you would pick it up). I have a great tendency to complicate SQL and just knew there was a simpler answer!

What am I to do with you! . . . . [blue]you dog you![/blue] ;-)

Calvin.gif
See Ya! . . . . . .
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top