Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations MikeeOK on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Delete similar records

Status
Not open for further replies.

pmccue

Instructor
Jul 29, 2006
18
How can I delete similar records in Access?
There's the info:
In a large table (~165k reocords) are many occurances of two or more records where the LastName, FirstName, and DOB fields all match, ie, these are not duplicate records where all fields match.
I want to delete similar records, leaving one unique record in the table (unique LastName, FirstName, DOB fields).
Would need an sql example for syntax as all my attempts using sql have so far failed.

IT4EVR suggested the following script, but it hangs when executed on my system:

DELETE *
FROM Table1 AS a
WHERE a.ID NOT IN(SELECT MIN(b.ID) FROM Table1 AS b WHERE a.LastName = b.LastName AND a.FirstName = b.FirstName AND a.DOB=b.DOB);

Is there another method of completing this task?

Thanks,
Phil....
 
I really don't see how that SQL you were given would work, but that does not mean it wouldn't, just that I don't see how. But here is what I would do. It will take a few steps.
1. Create Summary query with all matching LName, FName, and DOB. Include a count "Count(*)" as an additinal field, then set the criteria on this field to >1. This gives you a list of all records with "matches."
2. Using this query from #1 as your source, create a temporary table and fill it with these records from the query.
3. Add a temporary field to the table. Name it "DeleteMe"
4. For every record in the table that matches a row in the temporary table you built in step one, mark "DeleteMe" as True.
5. Delete all records in the table with "DeleteMe" marked as true. (This could be done in one step (3, 4 & 5) if you would like.)
6. Append all the records in your temporary table back into your "real" table.
7. Delete the temporary table.

HTH,
Vic
 
Phil,
I looked at your first thread and this time I did see how the SQL should work. I don't know how long you let it try to finish before you killed it, but I would think that it just needed too much time. The way I described should go rather quickly, as it really is taking small bites.

Vic
 
Vic,

I'm good to go and have competed through your step 3 (above). The temp table has 1995 records. Some records appear 2 times, some 3 ...up to 5 occurances. I then created DeleteMe as a text field, but how do I script or complete your Step 4?

Phil....
 
Why not simply duplicate the structure of your table, create an unique index on (LastName, FirstName, DOB) and then append the data ?

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Phil,

Here is the SQL that will do that. You will need to change tblOriginal and tblTemporary to your two table names, and the FName, LName, and DOB to match your field names.

Vic

UPDATE tblOriginal INNER JOIN tblTemporary ON (tblOriginal.DOB = tblTemporary.DOB) AND (tblOriginal.LName = tblTemporary.LName) AND (tblOriginal.FName = tblTemporary.FName) SET tblOriginal.DeleteMe = "True";
 
This looks good but in the above script you

SET tblOriginal.DeleteMe = "True"

But isn't the DeleteMe field created in tblTemporary table? This is what I did per your step #3. Please advise.

Phil....
 
PHV,

...not yet but I appreciate your continued monitoring of this effort.

Phil.....
 
Phil,

I'm sorry, I was not clear enough. The DeleteMe field would go into the Original table. That is where you want to delete records.

Vic
 
Vic,

OK, got the original table with all 'true' fields deleted. So this table is missing one record from each similar record set.

Here's the prob: The duplicates table (temporary table) creates 1 record for similar records only if I include just the fields in question: (LastName, FirstName, DOB). Access adds a count field that lists the number of similar fields found. Each record in teh temporary table is unique. If I try to include all fields into the temporary table for the later append, the count field goes away and I'm left with not a count of similar records, but all the duplicate records. Thus the append will fail as the structures of the original table and temporary table are different (39 fields vs 4 fields). I'm using Access Wizard for this effort.

Can you script how to do step #1 so I get one record as well as all fields in the temporary table to later append back into the original table?

Thanks,
Phil....
 
Phil,

Code:
SELECT tblOriginal.FName, 
tblOriginal.LName, 
tblOriginal.DOB, 
Min(tblOriginal.field1) AS field1, 
Min(tblOriginal.field2) AS field2, 
Min(tblOriginal.field3) AS field3
FROM tblOriginal
GROUP BY tblOriginal.FName, 
tblOriginal.LName, 
tblOriginal.DOB;

There is the code sample. The three fields that you are looking for matches will be in the "SELECT" statment at the top. Then be the only fields in the "Group By" section at the bottom. All the other fields in your table will be in the "SELECT" statement at the top. You can select then as Min(Minimum), or Max(Maximum), First, or Last. That will be your choice.

Vic
 
Vic,

Thanks for the latest script. I know something about Access but have had no formal training in in sql scripting, which I see is far more powerful and flexible than simply using Access built in wizards, etc.

One thing that's been a pain about this effort is that some of the records in the original table are not duplicates, where all fields values match, but are 'similar' where some field values match, like LastName, FirstName, DOB, but the other fields in the similar records may not match.

I tried your most recent script, and it worked fine, but let me restate the goal and see if you feel we are on the right track:

The original table has ~165K records. In the table there are some records (~1.5K) that have the same LastName, FirstName and DOB field values, but the remaining fields in each of the similar records have different values. That is to say, some records are distinguished as similar because 3 fields have the same values although the rest of the field values are different. There are cases where 2 records are similar, and cases where 3, 4 and up to 5 records have the same 3 field values (LastName, FirstName, DOB). I need to eliminate similar records where these 3 fields match saving, say, the first occurance of each matching record.

Rather than trying to eliminate all matching records in the original table as you suggest, then trying to append one occurance of the matching records back into the original table (which does not seem to be possible since the structures differ) maybe what might work is an approach where matching records are removed from the original table except for the first occurance Is this workable? If so, how?

Thanks for all your continued support,
Phil....
 
The two tables should match in structure except for the DeleteMe field in the original table. At which case, if you are appending records from the Temp table into the original table, (which is what you want), then by specifying all the fields in Temp in the query, it should work.

If you can't get that to work, I would suggest trying PHV's suggestion.
Why not simply duplicate the structure of your table, create an unique index on (LastName, FirstName, DOB) and then append the data ?

Before doing any of this, make a backup of the table in a safe place.
The steps in his suggestion (with slight modification) would be to
1. Make a copy of the original table structure and name it some sort of Temp.
2. Modify the Temp table structure to make the LName, FName and DOB fields the Primary Key for the table.
3. Copy all the records from the original table into the Temp table. (Because only one record can be added for matching LName, FName, and DOB, this will cause all "matching" records to not be copied.)
4. Delete all the records from the original table.
5. Copy all the records from the Temp table back into the original table. In the Append query, specify each field in the original table, do not use the "*" to signify All.
 
My code above works but because it's a subquery scanning 165,000 rows, it also jammed my system. Perhaps it is scanning all 165k rows, then doing it again for a match.

Interesting enough, my delete query was similar to one proposed in a SQL book by a respected author. I am not sure if this query runs as slow for other RDBMS's.

But this post serves to remind us all of the importance of selecting the right indexes BEFORE we enter or export any data into it. Too often people, including myself, are allowing duplicate records due to improper indexing.

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top