Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Rhinorhino on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

How do I use fuzzy logic or soundex to scrutenise a single table?

Status
Not open for further replies.

eo

MIS
Joined
Apr 3, 2003
Messages
809
I have a table of customer names & surnames. The business requirement is for me to highlight from those two columns any possible matches via a report, that the users can then go an investigate.

But I am not sure how to use soundex or fuzzy logic to highlight such potential matches.

Any advise?

EO
Hertfordshire, England
 
SELECT name, surname
FROM Table1
WHERE name LIKE '%Paul%'
OR surname LIKE '%Andrews%'

The % symbol is the wild card, so in the example above it will return any results that contain the name paul or surname andrews i.e. mrpaul OR pauls OR mrpauls.

If you want to return results that only start with paul you could remove the first % - 'Paul%'

OR

WHERE left(name,4) = 'Paul'
would give the same result i.e. anything starting with paul

/Nice
 
Look at:
and/or

You will probably want to look more to the difference function instead of the soundex by itself.

=======================================
People think it must be fun to be a super genius, but they don't realize how hard it is to put up with all the idiots in the world. (Calvin from Calvin And Hobbs)

Robert L. Johnson III
CCNA, CCDA, MCSA, CNA, Net+, A+, CHDP
VB.NET Programmer
 
Or if you want to use fuzzy logic, then create an SSIS package and use the fussy grouping task in the dataflow to identify potential matches and place them in a table for examination. This will help you find closer matches such as Simpson and Simpsn. You can specify how close a match you want to find. It concerns me that you are only looking for name matches as names as not unique. You would have better results if you could add address or phone or email to the matching. Otherwise you may be investingating the millions of John Smiths in the US.

"NOTHING is more important in a database than integrity." ESquared
 
I opted for Fuzzy grouping in SSIS, seemed to be the best solution

EO
Hertfordshire, England
 
Turns out I cannot use fuzzy grouping in SSIS as the client only has SQL Sever 2008 Std!!

Back to the drawing board.

I have to work out how to report if a combination of Surname, Forename and Title would potentially indicate to a similar match.

This is not something I can hardcode such as the one person suggested above...

SELECT name, surname
FROM Table1
WHERE name LIKE '%Paul%'
OR surname LIKE '%Andrews%'

...as there are potentially millions of name, surnames!!

Any suggestions

EO
Hertfordshire, England
 
Have you considered SQL Server Full Text Indexing? Its somewhat complicated to set up, (I'm certainly not an expert on it), but possibly what you're looking for...

Here's an article that may be of some use

Rhys

"The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it"
Terry Pratchett
 
Just thought that with 'potentially millions of name, surnames' this may be a no go without some filter, (i.e., initial letter of surname), as you're likely to have to have a cartesian join between the table and itself matching each of the rows against every other row to compare them.

Rhys

"The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it"
Terry Pratchett
 
You could try something like the below if you can filter down;

Code:
SELECT 
	a.TABLE_ID_FIELD	AS 'ID'
	,a.Name				AS 'Name'
	,a.Surname			AS 'Surname'
	,b.TABLE_ID_FIELD	AS 'PossibleMatch_ID'
	,b.Name				AS 'PossibleMatch_Name'
	,b.Surname			AS 'PossibleMatch_Surname'
FROM
	dbo.TABLE_NAME a
JOIN
	dbo.TABLE_NAME b  ON a.TABLE_ID_FIELD <> b.TABLE_ID_FIELD
WHERE
(
	a.Surname LIKE 'a%'
	AND
	b.Surname LIKE 'a%'
)
AND
	DIFFERENCE(LTRIM(RTRIM(a.Name)), LTRIM(RTRIM(b.Name))) = 4
AND
	DIFFERENCE(LTRIM(RTRIM(a.Surname)), LTRIM(RTRIM(b.Surname))) = 4

Rhys

"The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it"
Terry Pratchett
 
Try number2

Is this what you were looking for?

This code goes off and finds names that are a fuzzy match across 2 columns.

Code:
DECLARE @temp TABLE (F1 VARCHAR(20), f2 VARCHAR(20))

INSERT INTO @temp VALUES('Paul','Andrews')
INSERT INTO @temp VALUES('Andrew','Paula')
INSERT INTO @temp VALUES('William','Wills')
INSERT INTO @temp VALUES('Ann','Ann')

SELECT F1 
FROM @temp
WHERE SOUNDEX(F1) IN (SELECT SOUNDEX(F2) FROM @temp)
 
Hey eo,

Did any of the suggestions work or did you find another solution? Would you mind posting the solution if you did?

/Nice
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top