Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Search

Status
Not open for further replies.

transparent

Programmer
Joined
Sep 15, 2001
Messages
333
Location
GB

I want to write a stored procedure that I can pass in a list of names (comma seperated) and compare them to the list of names I store in my database.

So far I have a function which successfully creates a temporary table from a list of comma deliminated names (that the user has typed in)

So for example fred, ted, paul, carl -> tbllNames

My sql so far is:



I have a stored proceedure that runs the following sql:

SELECT dbo.Artists.ID, dbo.Artists.Name
FROM dbo.Artists

inner
join #tblNames
on
#tblNames.string like '%' + dbo.Artists.Name +'%'

This works great, but if the user types in a name i.e. harry as hary, I wont get a match.

So its not a great search.

I want to be able to match similar/mis-spelt names.

Is there a way in which I can achieve this level of functionality?
 
You'll have to write in the error checking your self.

I'd recommend a table with a list of incorrectly spelled names and the correct spelling of the name. Something like this.

Hary, Harry
Harie, Harry
Harrie, Harry
...

Then turn your code into something like this.
Code:
update a
set #tblNames.String = CorrectSpelling.CorrectName
from #tblNames a
join CorrectSpelling on a.String = CorrectSpelling.IncorrectName

SELECT     dbo.Artists.ID, dbo.Artists.Name
FROM         dbo.Artists 

inner
  join #tblNames
    on 
   #tblNames.string  like '%' + dbo.Artists.Name +'%'

Denny
MCSA (2003) / MCDBA (SQL 2000)

--Anything is possible. All it takes is a little research. (Me)

[noevil]
(Not quite so old any more.)
 
Have you looked at the soundex or difference functions

either one should work.

Rob
 
Second thought probably won't as you are looking inside a string.

Unless it was the only entry in a field soundex and difference probably won't help


Rob
 
You could take just the first three characters of the input string as the search criteria rather than then entire string.

Dale
 
You could try the Levenshtein distance. I found a SQL Server UDF that will calculate the difference for you. I tried it myself and it works pretty well. Here's the link...


I suspect that calculating the Levenshtein distance may be a bit slow, but it can certainly help clean up data. I found it useful, maybe you will too.



-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
select difference('Stuart','Stewart')

= 4 which is a good match

select difference('Smith','Smyth')
= 4 good match

select difference('Rob','Bob')
= 3 Close

select difference('Seattle','Washington')
= 0 no even close
 
I have actually implemented this in C#.

I may use the sql version if its faster.

Will do some tests.

Cheers for everybodies thoughts
 
My apologies. I did find a Levenshtein UDF on the internet, but I accidentally pointed you to the wrong one. Unfortunately, I cannot seem to find the one I used, so I cannot give credit where credit is due. Here is the one I found and implemented.

Code:
CREATE FUNCTION Levenshtein_distance(@s1 nvarchar(3999), @s2 nvarchar(3999))
RETURNS int
AS
BEGIN
  DECLARE @s1_len int, @s2_len int, @i int, @j int, @s1_char nchar, @c int, @c_temp int,
    @cv0 varbinary(8000), @cv1 varbinary(8000)
  SELECT @s1_len = LEN(@s1), @s2_len = LEN(@s2), @cv1 = 0x0000, @j = 1, @i = 1, @c = 0
  WHILE @j <= @s2_len
    SELECT @cv1 = @cv1 + CAST(@j AS binary(2)), @j = @j + 1
  WHILE @i <= @s1_len
  BEGIN
    SELECT @s1_char = SUBSTRING(@s1, @i, 1), @c = @i, @cv0 = CAST(@i AS binary(2)), @j = 1
    WHILE @j <= @s2_len
    BEGIN
      SET @c = @c + 1
      SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j-1, 2) AS int) +
        CASE WHEN @s1_char = SUBSTRING(@s2, @j, 1) THEN 0 ELSE 1 END
      IF @c > @c_temp SET @c = @c_temp
      SET @c_temp = CAST(SUBSTRING(@cv1, @j+@j+1, 2) AS int)+1
      IF @c > @c_temp SET @c = @c_temp
      SELECT @cv0 = @cv0 + CAST(@c AS binary(2)), @j = @j + 1
    END
    SELECT @cv1 = @cv0, @i = @i + 1
  END
  RETURN @c
END

I hope you find it useful.

-George

Strong and bitter words indicate a weak cause. - Fortune cookie wisdom
 
Cheers it works well although is a little slow.

I implemented this in C# and it seems a lot faster.

I'm going to run a series of benchmark tests and compare my results.

Will keep you informed


 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top