Delete Duplicates save One! 6

TheAceMan1 · Feb 25, 2005

Howdy Everyone! . . . . .

Was asked by a Private PreSchool organization to bring a DB up to date. The DB is flatline (MicrosoftWorks) and [blue]has a great many duplicates[/blue] as far as table normalization is concerned.

What I need is to be able to do is [purple]delete all duplicates save the one of each group![/purple], before I can transfer data to new tables. Data entry personel will correct all records once this is done.

I've added and extra field to specific tables for flaging duplicates, and constructed Delete Queries (individual per table) with subquery to set the flg for duplicates. This works fine, and if I execute the queries, I can delete all duplicates.

[blue]How do I leave the one of each group?[/blue]

See Ya! . . . . . .

MichaelRed · Feb 26, 2005

The only way I have been able to accomplish this is via code. The code itself is quite simplistic. Using ye venerable find duplicates wizzard (at least as a starting point) create the recordset of the duplicates. Given this as the record source rstSrc), use a loop to move through this as the source. In a module, declare the dups as rstSrc and the target (with the dups) as rstDest. Loop through rstSrc. for each rec in rstSrc, do FindFirst for the rec in rstDest. Then, while nomatch is false, do findnext and delete. On nomatch being true, start over with the next src rec.

MichaelRed

ZmrAbdulla · Feb 26, 2005

TheAceMan1 said:
has a great many duplicates as far as table normalization is concerned.

Here I am little confused..
Otherwise you can have(You might be knowing it) a combinedPK in a new table except the autonumber field to the combinedPK and append the records to the new table.
Or I am not understanding the question...Sorry

Zameer Abdulla
_{Jack of Visual Basic Programming, Master in Dining & Sleeping}
Visit Me

TheAceMan1 · Feb 26, 2005

ZmrAbdulla . . . . .

Microsoft Works is a flatline DB. Just one huge table (practically a spreadsheet). I have this table imported into access.

Now, unique name record count is 12K (48K with duplicates), so anything manual is no go . . .

In any case found solution here by [blue]PHV[/blue] & [blue]grande[/blue]: thread701-1007255

See Ya! . . . . . .

ZmrAbdulla · Feb 26, 2005

I feel mine also a similar solution. The new table will have all the fields(except the auto number) combined into a PK and paste append the records from the old table to new. It won't allow to paste duplicate records.

In fact I don't know anything about "MS Works"

Zameer Abdulla
_{Jack of Visual Basic Programming, Master in Dining & Sleeping}
Visit Me

TheAceMan1 · Feb 26, 2005

ZmrAbdulla . . . . .

Ihave tried both and get the same results. Great!

But I'd still like to do this with a single query. Current testing with nested subqueries . . . . almost there . . .

See Ya! . . . . . .

TheAceMan1 · Feb 26, 2005

To All . . . . .

It was nice being back in SQL school. It took some time but I got it. Interested in your view on the following:

The following SQL [blue]deletes duplicates[/blue], [purple]leaving one record out of each group of duplicates[/purple], in a single table.

Requirements: an [blue]ID field[/blue] and/or [blue]field containing the duplicates[/blue].

To use the SQL, copy & paste into WordPad the [blue]packed view[/blue]. Then replace the following (selecting all for each):
[ol][li][blue]YourTableName[/blue][/li]
[li][green]YourIDFieldName[/green][/li]
[li][purple]YourDuplicateFieldName[/purple][/li][/ol]
When replacements are done you have two choices . . .
[ol][li]Make a query as follows:[/li]
[ol a][li] Open a new query. Do not select any tables.[/li]
[li]Select SQL View and paste.[/li][/ol]
[li]Ue the packed SQL where you like![/li][/ol]

[tt][purple]********************
* Operational View *
********************[/purple][/tt]
DELETE [blue]YourTableName[/blue].[green]YourIDFieldName[/green], [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
FROM [blue]YourTableName[/blue]

WHERE ((([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) Not In

(SELECT Max([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) AS [green]YourIDFieldName[/green]
FROM [blue]YourTableName[/blue]
GROUP BY [blue]YourTableName[/blue].YourDuplicateFieldName
HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1))))

AND (([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]) In

(SELECT [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
FROM [blue]YourTableName[/blue]
GROUP BY [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))));

[tt][Purple]***************
* Packed View *
***************[/purple][/tt]
DELETE [blue]YourTableName/blue.[green]YourIDFieldName[/green], [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple] FROM [blue]YourTableName[/blue] WHERE ((([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) Not In (SELECT Max([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) AS [green]YourIDFieldName[/green] FROM [blue]YourTableName[/blue] GROUP BY [blue]YourTableName[/blue].YourDuplicateFieldName HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))) AND (([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]) In (SELECT [blue]YourTableName/blue].[purple]YourDuplicateFieldName[/purple] FROM [blue]YourTableName[/blue] GROUP BY [blue]YourTableName/blue].[purple]YourDuplicateFieldName[/purple] HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))));

See Ya! . . . . . .

TheAceMan1 · Feb 26, 2005

To All . . . . .

It was nice being back in SQL school. It took some time but I got it. Interested in your view on the following:

The following SQL [blue]deletes duplicates[/blue], [purple]leaving one record out of each group of duplicates[/purple], in a single table.

Requirements: an [blue]ID field[/blue] and/or [blue]field containing the duplicates[/blue].

To use the SQL, copy & paste into WordPad the [blue]packed view[/blue]. Then replace the following (selecting all for each):
[ol][li][blue]YourTableName[/blue][/li]
[li][green]YourIDFieldName[/green][/li]
[li][purple]YourDuplicateFieldName[/purple][/li][/ol]
When replacements are done you have two choices . . .
[ol][li]Make a query as follows:[/li]
[ol a][li] Open a new query. Do not select any tables.[/li]
[li]Select SQL View and paste.[/li][/ol]
[li]Ue the packed SQL where you like![/li][/ol]

[tt][purple]********************
* Operational View *
********************[/purple][/tt]
DELETE [blue]YourTableName[/blue].[green]YourIDFieldName[/green], [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
FROM [blue]YourTableName[/blue]

WHERE ((([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) Not In

(SELECT Max([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) AS [green]YourIDFieldName[/green]
FROM [blue]YourTableName[/blue]
GROUP BY [blue]YourTableName[/blue].YourDuplicateFieldName
HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1))))

AND (([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]) In

(SELECT [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
FROM [blue]YourTableName[/blue]
GROUP BY [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]
HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))));

[tt][Purple]***************
* Packed View *
***************[/purple][/tt]
DELETE [blue]YourTableName/blue.[green]YourIDFieldName[/green], [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple] FROM [blue]YourTableName[/blue] WHERE ((([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) Not In (SELECT Max([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) AS [green]YourIDFieldName[/green] FROM [blue]YourTableName[/blue] GROUP BY [blue]YourTableName[/blue].YourDuplicateFieldName HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))) AND (([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]) In (SELECT [blue]YourTableName/blue].[purple]YourDuplicateFieldName[/purple] FROM [blue]YourTableName[/blue] GROUP BY [blue]YourTableName/blue].[purple]YourDuplicateFieldName[/purple] HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))));
Thats it! . . . . Your Thoughts! . . .

See Ya! . . . . . .

TomCologne · Feb 27, 2005

AceMan's SQL statement not only deserves a star, but it should also shine in its intended beauty:

Code:

DELETE [blue][blue]YourTableName[/blue][/blue].[green][green]YourIDFieldName[/green][/green], [blue]YourTableName[/blue].[purple][purple]YourDuplicateFieldName[/purple][/purple] FROM [blue]YourTableName[/blue] WHERE ((([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) Not In (SELECT Max([blue]YourTableName[/blue].[green]YourIDFieldName[/green]) AS [green]YourIDFieldName[/green] FROM [blue]YourTableName[/blue] GROUP BY [blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple] HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))) AND (([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]) In (SELECT [blue][blue]YourTableName[/blue][/blue].[purple]YourDuplicateFieldName[/purple] FROM [blue]YourTableName[/blue] GROUP BY [blue][blue]YourTableName[/blue][/blue].[purple]YourDuplicateFieldName[/purple] HAVING (((Count([blue]YourTableName[/blue].[purple]YourDuplicateFieldName[/purple]))>1)))));

Note: Execution on large tables may take some time. My test table had about 19,000 records.

Thanks, AceMan.

TomCologne

TheAceMan1 · Feb 27, 2005

How are ya TomCologne . . . . .

Thanks for the [blue]cleanup & pinky.[/blue]

I drudgingly completed this about 3:30am after over 2hours of testing when it hit me what was needed. I think I [blue]passed out[/blue] after I hit submit the second time! [sleeping]

Decided to make it an FAQ (faq701-5721). [purple]Not approved yet.[/purple] Believe others can make use of it as an exhaustive internet search produced poor results.

[blue]Glad to be of service.[/blue]

[purple]You take care! . . . . . Ya Hear![/purple]

See Ya! . . . . . .

PHV · Jul 7, 2005

Just a though, doesn't the following suffice ?
DELETE FROM YourTableName
WHERE YourIDFieldName Not In (SELECT Max(YourIDFieldName) FROM YourTableName GROUP BY YourDuplicateFieldName);

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886

TheAceMan1 · Jul 7, 2005

Haaaaaay PHV . . . . .

My God! . . . . all the time I spent in this thread, I was looking for you! (hoping you would pick it up). I have a great tendency to complicate SQL and just knew there was a simpler answer!

What am I to do with you! . . . . [blue]you dog you![/blue] ;-)

See Ya! . . . . . .

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Delete Duplicates save One! 6

TheAceMan1

Programmer

MichaelRed

Programmer

ZmrAbdulla

Technical User

TheAceMan1

Programmer

ZmrAbdulla

Technical User

TheAceMan1

Programmer

TheAceMan1

Programmer

TheAceMan1

Programmer

TomCologne

Technical User

TheAceMan1

Programmer

PHV

MIS

TheAceMan1

Programmer

Similar threads

Part and Inventory Search

Sponsor