Identifying Duplicates (Not the usual)

Rozza · Aug 26, 2003

Hi,

I use some software to identify duplicate records based on inital and surname for people who live at the same address.
There is a field in the database called dup_code which stores the identifying value e.g if two records were identified as duplicates they would both be given a dup code of 1. The next preceeding duplicate records would then be given a dup_code of two and so on.
The problem is this: for example the software identifies S Andrews, Sarah Andrews, Sharon Andrews as duplicates because they all have the same initial. What I want to be able to do in this scenario is say that none of them are duplicates as we cannot determine whether S Andrews is the same as Sarah or Sharon. On the other hand if the following scenario arose where S Andrews and Sarah Andrews were marked as duplicates you could assume that these are duplicates and can be left as so. In the first scenario I just want to update the dup_code to 0 as 0 means that the records are not duplicates.
The software automatically detects that Sharon Andrews and Sharon Andrews are duplicates and so there is nothing more I need to do with scenario's such as this.

The table which stores the customers name and addresses is called customers, the field names concerned are:

Customers.first_name
Customers.last_name
Customers.dup_code

Can anyone shed any light on this?

Cheers

Paul

SQLSister · Aug 26, 2003

I'm not sure exactly what you want from us here, but I think your whole method of identifying duplicates is flawed. Poeple living at the same address can often have the same name, my brother and my father for instance, had the same name and for at least part of their lives lived at the same address. You need some other field such as date of birth to truly identify duplicate names.

Rozza · Aug 26, 2003

Granted that it won't be 100% accurate, but the customers who supply the data have numerous duplicate records in their data so I need to check for this.

I'll try and be more descriptive about what I want, here goes:

The software that I use actually identifies the duplicates for me based on Initial, Surname, Property and Postcode so that I don't have to worry about doing that. The way that it identifies duplicate records is by entering a value in a dup_code field. The problem is that it's not that clever when you have for example the following records all at the same address:

S Andrews
Sarah Andrews
Sharon Andrews

The software will mark all these records as duplicates because the first letter of the "Forename" field is "S". The software identifies the records as duplicates in the following way:

Forename Surname Dup_Code

S Andrews 1
Sarah Andrews 1
Sharon Andrews 1
Paul Rowling 2
P Rowling 2
Ozzie Osborne 0

In this example any record that does not have a duplicate record is always given a dup_code of 0. All other records are given a unique dup_code which will match with an associated duplicate record.
Because there is no way to identify if S Andrews is a duplicate with Sarah or Sharon, I want to update their dup_code to 0, but leave all others intact.
Does this make sense now........my head hurts!!!

You may ask why I want to do this? Well, the DB that I have stores customer info in a customer table and transactions (such as what the customer bought, when they bought it and how much) in a transactions table. Therefore I want to be able to identify duplicate customer records, transfer the transactions of the duplicate person I want to delete to the person I want to keep and then delete duplicate person. In the instance of the "Andrews" example above, I don't know which person to transfer the transactions to and so because I cannot do this accurately I will just take a hit and leave them as they are.

Cheers

Paul

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Identifying Duplicates (Not the usual)

Rozza

Technical User

SQLSister

Programmer

Rozza

Technical User

Similar threads

Part and Inventory Search

Sponsor