Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Better SoundEx implementation? (Very interesting!) 13

Status
Not open for further replies.

RebLazer

Programmer
Jun 7, 2002
438
US
I need to programmatically check if two names are really the same using SoundEx. The code below works - but I'm sure there's a better implementation out there. If you know of one, please let me in on it!

Thanks so much,
Lazer



The code in green calls the stored procedure in red:[tt]

declare @hold1 varchar(4)
declare @hold2 varchar(4)
exec soundexalpha 'robert', @hold1 out
exec soundexalpha 'raburt', @hold2 out
if (@hold1=@hold2) select 'Y'
else select 'N'


ALTER proc dbo.SoundexAlpha
@instring varchar(50),
@soundex varchar(50)=NULL output

/* Extended Soundex
Copied from 'The Guru's Guide to Transact-SQL'
by Ken Henderson pp42-45

Original Soundex algorithm patented by Russel & O'Dell in 1918

Translation to Transact-SQL by Ken Henderson 5/16/98

Example:
declare @soundex varchar(4)
exec SoundexAlpha 'Rodgers', @soundex OUT
@soundex will have the extended soundex of 'rodgers'

*/
as

declare @workstr varchar(10)

set @instring=upper(@instring)
--use @soundex as a work buffer (omitting 1st char)
set @soundex=right(@instring,len(@instring)-1)

--replace vowels with 'A'
set @workstr='EIOUY'
while (@workstr<>'') begin
set @soundex=replace(@soundex,left(@workstr,1),'A')
set @workstr=right(@workstr,len(@workstr)-1)
end

-- reaffix 1st char
set @soundex = left(@instring,1) + @soundex

--translate common prefixes
if (left(@soundex,3)='MAC') set @soundex='MCC'+right(@soundex,len(@soundex)-3)
if (left(@soundex,2)='KN') set @soundex='NN'+right(@soundex,len(@soundex)-2)
if (left(@soundex,1)='K') set @soundex='C'+right(@soundex,len(@soundex)-1)
if (left(@soundex,2)='PF') set @soundex='FF'+right(@soundex,len(@soundex)-2)
if (left(@soundex,3)='SCH') set @soundex='SSS'+right(@soundex,len(@soundex)-3)
if (left(@soundex,2)='PH') set @soundex='FF'+right(@soundex,len(@soundex)-2)

--remove 1st char again
set @instring=@soundex
set @soundex=right(@soundex,len(@soundex)-1)

--translate other infixes
set @soundex=Replace(@soundex,'DG','GG')
set @soundex=Replace(@soundex,'CAAN','TAAN')
set @soundex=Replace(@soundex,'D','T')
set @soundex=Replace(@soundex,'NST','NSS')
set @soundex=Replace(@soundex,'AV','AF')
set @soundex=Replace(@soundex,'Q','G')
set @soundex=Replace(@soundex,'Z','S')
set @soundex=Replace(@soundex,'M','N')
set @soundex=Replace(@soundex,'KN','NN')
set @soundex=Replace(@soundex,'K','C')

set @soundex=Replace(@soundex,'AHA','~~~')
set @soundex=Replace(@soundex,'H','A')
set @soundex=Replace(@soundex,'~~~','AHA')

set @soundex=Replace(@soundex,'AW','A')
set @soundex=Replace(@soundex,'PH','FF')
set @soundex=Replace(@soundex,'SCH','SSS')

--truncate ending A or S
if (right(@soundex,1)='A' or right(@soundex,1)='S')
set @soundex=left(@soundex,len(@soundex)-1)

--translate ending NT to TT
if(right(@soundex,2)='NT') set @soundex=
left(@soundex,len(@soundex)-2)+'TT'

--remove all A
set @soundex=replace(@soundex,'A','')

--reaffix 1st char
set @soundex=left(@instring,1)+@soundex

--remove repeating chars
declare @c int
declare @d varchar(2)

set @c=65
while (@c<91) begin
set @d=char(@c)+char(@c)
while(charindex(@d,@soundex)<>0)
set @soundex=replace(@soundex,@d,char(@c))
set @c=@c+1
end

--return 4 bytes (with space pad if necessary)
set @soundex=left(@soundex,4)
if (len(@soundex)<4) set @soundex=@soundex+space(4-len(@soundex))

return 0


GO
 
If you've never tried Soundex, give the above code a try - it's pretty cool.

But I need cooler code... :) :) :)

Thanks very much!
Lazer
 
SQL server 2000 has function SOUNDEX()
Can u use it ?

SELECT CASE WHEN SOUNDEX( 'robert' ) = SOUNDEX( 'raburt' ) THEN 'Y' ELSE 'N' END

Zhavic

---------------------------------------------------------------
In the 1960s you needed the power of two Comodore64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.
 
Or just try to look on algorytm in another programing languages:
For example:


Zhavic

---------------------------------------------------------------
In the 1960s you needed the power of two Comodore64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.
 
Zhavic,

I am aware of the built in SOUNDEX function - but I am also using &quot;extended Soundex&quot; - so that won't do...

Thank you very much!

Anyone know of any other T-SQL implimentations?

Thanks!
Lazer
 
Ok, than what you mean with 'better implementation' of this ?

You mean more efficient programed SOUNDEX function
or another algorytm, that brings better results ?

Zhavic

---------------------------------------------------------------
In the 1960s you needed the power of two Comodore64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.
 
Zhavic,

Thanks for your patience and persistence. I am looking for an implementation that produces better (i.e. more accurate) results.

With my current implementation this is a match:
Robert
and
Roaeouyiiooabaeriouyioet

That is pathetic…

I appreciate you (and anyone else's!) help very much!
Lazer
 
This algorythm may work for you:

1) Translate first characters of name: */
/* MAC => MCC KN => NN K => C */
/* PH => FF PF => FF SCH => SSS */
/* (2) Translate last characters of name: */
/* EE => Y IE => Y */
/* DT,RT,RD,NT,ND => D */
/* (3) First character of key = first character of name. */
/* (4) Translate remaining characters by following rules, */
/* incrementing by one character each time: */
/* a. EV => AF else A,E,I,O,U => A */
/* b. Q => G Z => S M => N */
/* c. KN => N else K => C */
/* d. SCH => SSS PH => FF */
/* e. H => If previous or next is nonvowel, previous */
/* f. W => If previous is vowel, previous */
/* Add current to key if current <> last key character */
/* (5) If last character is S, remove it */
/* (6) If last characters are AY, replace with Y */
/* (7) If last character is A, remove it

-----------------------------------------------------
It is on

Example:
for 'Robert' you take 'RABAD'
and for ' Roaeouyiiooabaeriouyioet' you take 'RAYABARAYAT'

( but I get this results with pen and paper, may be they are wrong )

I will write this to T-SQL and post it tomorrow ( now is 23:46 there )
You can than try it, and if it will not be good, I will look for another one
( I am programmer, and I like to program anything )

Zhavic


---------------------------------------------------------------
In the 1960s you needed the power of two Comodore64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.
 
Zhavic,

Wow, your efforts are just great! (A star for you!) I will certainly look forward to trying out your code!

Thank you very much!
Lazer

By the way, what country are you in?
 
Hi Lazer, I am back from sleeping and typing some code.
Thanks for star, I am glad I can help
I am from Slovakia ( in the middle of Europa )

There is very good article that shows some of SOUNDEX codding rulles:

This is that algorithm from my previous post,
its name is 'New York State Identification and Intelligence System (NYSIIS) Phonetic Encoder'

---------------------------------------------------
Code:
CREATE FUNCTION dbo.SoundexAplhaFunction( @instring VARCHAR(50) )
RETURNS VARCHAR(50)
AS
BEGIN

    DECLARE @cKey VARCHAR(50),
            @cChars VARCHAR(3),
            @cResult VARCHAR(3),
            @cVowels VARCHAR(10)

    DECLARE @i integer

    /* vowels */
    SELECT @cVowels = 'AEIOU'

    /* trim all spaces */
    SELECT @instring = REPLACE( @instring, ' ', '' )
    SELECT @instring = UPPER( @instring )


    /* (1)  Translate first characters of name:       */
    /*      MAC => MCC     KN  => NN       K   => C   */
    /*      PH  => FF      PF  => FF       SCH => SSS */
    SELECT @instring = CASE
                            WHEN LEFT( @instring, 3 ) = 'MAC' THEN 'MCC' + SUBSTRING( @instring, 4, LEN( @instring ) )
                            WHEN LEFT( @instring, 2 ) = 'KN' THEN 'NN' + SUBSTRING( @instring, 3, LEN( @instring ) )
                            WHEN LEFT( @instring, 1 ) = 'K' THEN 'C' + SUBSTRING( @instring, 2, LEN( @instring ) )
                            WHEN LEFT( @instring, 2 ) = 'PH' THEN 'FF' + SUBSTRING( @instring, 3, LEN( @instring ) )
                            WHEN LEFT( @instring, 2 ) = 'PF' THEN 'FF' + SUBSTRING( @instring, 3, LEN( @instring ) )
                            WHEN LEFT( @instring, 3 ) = 'SCH' THEN 'SSS' + SUBSTRING( @instring, 4, LEN( @instring ) )
                            ELSE @instring    -- do nothing
                       END

    /* (2)  Translate last characters of name: */
    /*      EE  => Y       IE  => Y          */
    /*      DT,RT,RD,NT,ND => D                */
    SELECT @instring = CASE
                            WHEN RIGHT( @instring, 2 ) IN ( 'EE', 'IE' ) THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'Y'
                            WHEN RIGHT( @instring, 2 ) IN ( 'DT', 'RT', 'RD', 'NT', 'ND' ) THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'D'
                            ELSE @instring    -- do nothing
                       END


    /* (3)  First character of key = first character of name. */
    SELECT @cKey = LEFT( @instring, 1 )

    /* (4)  Translate remaining characters by following rules,            */
    /*      incrementing by one character each time:                      */
    /*      a.  EV => AF else A,E,I,O,U => A                              */
    /*      b.  Q => G    Z => S    M => N                                */
    /*      c.  KN => N  else K => C                                      */
    /*      d.  SCH => SSS    PH => FF                                    */
    /*      e.  H => If previous or next is nonvowel, previous            */
    /*      f.  W => If previous is vowel, previous                       */
    /*      Add current to key if current <> last key character           */
    SELECT @i = 2
    /* while not end of @instring */
    WHILE SUBSTRING( @instring, @i, 1 ) > ''
    BEGIN
        SELECT @cChars = SUBSTRING( @instring, @i, 3 )

        SELECT @cResult = CASE /*EV => AF */
                               WHEN LEFT( @cChars, 2 ) = 'EV' THEN 'AF'
                               /* A,E,I,O,U => A */
                               WHEN CHARINDEX( LEFT( @cChars, 1 ), @cVowels ) > 0 THEN 'A'
                               /* Q => G */
                               WHEN LEFT( @cChars, 1 ) = 'Q' THEN 'G'
                               /* Z => S */
                               WHEN LEFT( @cChars, 1 ) = 'Z' THEN 'S'
                               /* M => N */
                               WHEN LEFT( @cChars, 1 ) = 'M' THEN 'N'
                               /* KN => N */
                               WHEN LEFT( @cChars, 2 ) = 'KN' THEN 'N'
                               /* K => C */
                               WHEN LEFT( @cChars, 1 ) = 'K' THEN 'C'
                               /* SCH => SSS */
                               WHEN LEFT( @cChars, 3 ) = 'SCH' THEN 'SSS'
                               /* PH => FF */
                               WHEN LEFT( @cChars, 3 ) = 'PH' THEN 'FF'
                               /* H => If previous or next is nonvowel */
                                 WHEN LEFT( @cChars, 1 ) = 'H' AND ( CHARINDEX( SUBSTRING( @instring, @i - 1, 1 ), @cVowels ) = 0 OR ( @i < LEN( @instring ) AND CHARINDEX( SUBSTRING( @instring, @i + 1, 1 ), @cVowels ) = 0 ) ) THEN SUBSTRING( @instring, @i - 1, 1 )
                               /* W => If previous is vowel, previous */
                                 WHEN LEFT( @cChars, 1 ) = 'W' AND CHARINDEX( SUBSTRING( @instring, @i - 1, 1 ), @cVowels ) > 0 THEN SUBSTRING( @instring, @i - 1, 1 )
                               /* original char */
                               ELSE LEFT( @cChars, 1 )
                         END

        SELECT @instring = STUFF( @instring, @i, LEN( @cResult ), @cResult )

        /* Add current to key if current <> last key character */
        IF RIGHT( @cKey, 1 ) != LEFT( @cResult, 1 )
            SELECT @cKey = @cKey + @cResult

        SELECT @i = @i + 1

    END

    /* (5)  If last character is S, remove it                             */
    IF RIGHT( @cKey, 1 ) = 'S'
        SELECT @cKey = LEFT( @cKey, LEN( @cKey ) - 1 )

    /* (6)  If last characters are AY, replace with Y                     */
    IF RIGHT( @cKey, 2 ) = 'AY'
        SELECT @cKey = LEFT( @cKey, LEN( @cKey ) - 2 )
    
    /* (7)  If last character is A, remove it            */
    IF RIGHT( @cKey, 1 ) = 'A'
        SELECT @cKey = LEFT( @cKey, LEN( @cKey ) - 1 )


	RETURN @cKey
END

------------------------------------------------------

Try:
SELECT dbo.SoundexAplhaFunction( 'Robert' ),
dbo.SoundexAplhaFunction( 'Roaeouyiiooabaeriouyioet' )

---------------------------------------------

I can also program some other algorithm for example from link I posted here, just try this and post back results.


If you try this with 'Roger' and 'Rodger', you get different results, but some of codes from the link above returns the same results for this names, and also this algorithm:
I thing it is modified NYSSIS algorithm, but it also returns the same resulst for 'Roger' and 'Rodger'
I post it there too.

Zhavic


---------------------------------------------------------------
In the 1960s you needed the power of two Comodore64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.
 
There is algorithm from

( 1) remove all 'S' and 'Z' chars from the end of the surname
( 2) transcode initial strings
MAC => MC
PF => F
( 3) Transcode trailing strings as follows,
IX => IC
EX => EC
YE,EE,IE => Y
NT,ND => D
( 4) transcode 'EV' to 'EF' if not at start of name
( 5) use first character of name as first character of key
( 6) remove any 'W' that follows a vowel
( 7) replace all vowels with 'A'
( 8) transcode 'GHT' to 'GT'
( 9) transcode 'DG' to 'G'
(10) transcode 'PH' to 'F'
(11) if not first character, eliminate all 'H' preceded or followed by a vowel
(12) change 'KN' to 'N', else 'K' to 'C'
(13) if not first character, change 'M' to 'N'
(14) if not first character, change 'Q' to 'G'
(15) transcode 'SH' to 'S'
(16) transcode 'SCH' to 'S'
(17) transcode 'YW' to 'Y'
(18) if not first or last character, change 'Y' to 'A'
(19) transcode 'WR' to 'R'
(20) if not first character, change 'Z' to 'S'
(21) transcode terminal 'AY' to 'Y'
(22) remove traling vowels
(23) collapse all strings of repeated characters
(24) if first char of original surname was a vowel, append it to the code


Code:
CREATE FUNCTION dbo.SoundexAplhaFunction_1 ( @instring VARCHAR(50) )
RETURNS VARCHAR(50)
AS
BEGIN

	DECLARE @cKey VARCHAR(50),
			@cChar VARCHAR(3),
			@cVowels VARCHAR(10),
			@cFirst_char CHAR(1)

	DECLARE @i integer

	/* vowels */
	SELECT @cVowels = 'AEIOU'

	/* trim all spaces */
	SELECT @instring = REPLACE( @instring, ' ', '' )
	SELECT @instring = UPPER( @instring )

	/* save first char */
	SELECT @cFirst_char = LEFT( @instring, 1 )

	/* ( 1) remove all 'S' and 'Z' chars from the end of the surname */
	SELECT @i = LEN( @instring )
	WHILE SUBSTRING( @instring, @i, 1 ) IN ( 'S', 'Z' )
		SELECT @i = @i - 1
	SELECT @instring = LEFT( @instring, @i )

 	/* ( 2) transcode initial strings */
	/*      MAC => MC                 */
	/*      PF => F                     */
	SELECT @instring = CASE 
							WHEN LEFT( @instring, 3 ) = 'MAC' THEN 'MC' + SUBSTRING( @instring, 3, LEN( @instring ) )
							WHEN LEFT( @instring, 2 ) = 'PF' THEN 'F' + SUBSTRING( @instring, 3, LEN( @instring ) )
							ELSE @instring	-- do nothing
					   END

	/* ( 3) Transcode trailing strings as follows */
	/*      IX       => IC                        */
	/*      EX       => EC                        */
	/*      YE,EE,IE => Y                         */
	/*      NT,ND    => D                         */
	SELECT @instring = CASE 
							WHEN RIGHT( @instring, 2 ) = 'IX' THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'IC'
							WHEN RIGHT( @instring, 2 ) = 'EX' THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'EC'
							WHEN RIGHT( @instring, 2 ) IN ( 'YE', 'EE', 'IE' ) THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'Y'
							WHEN RIGHT( @instring, 2 ) IN ( 'NT', 'ND' ) THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'D'
							ELSE @instring	-- do nothing
					   END

	/* ( 4) transcode 'EV' to 'EF' if not at start of name */
	SELECT @instring = LEFT( @instring, 1 ) + REPLACE( SUBSTRING( @instring, 2, LEN( @instring ) ), 'EV', 'EF' )

	/* ( 5) use first character of name as first character of key */
	SELECT @cKey = LEFT( @instring, 1 )

	/* ( 6) remove any 'W' that follows a vowel */
	SELECT @instring = REPLACE( @instring, 'AW', 'A ' )
	SELECT @instring = REPLACE( @instring, 'EW', 'E ' )
	SELECT @instring = REPLACE( @instring, 'IW', 'I ' )
	SELECT @instring = REPLACE( @instring, 'OW', 'O ' )
	SELECT @instring = REPLACE( @instring, 'UW', 'U ' )
	SELECT @instring = REPLACE( @instring, ' ', '' )

	/* ( 7) replace all vowels with 'A' */
	SELECT @instring = REPLACE( @instring, 'E', ' ' )
	SELECT @instring = REPLACE( @instring, 'I', ' ' )
	SELECT @instring = REPLACE( @instring, 'O', ' ' )
	SELECT @instring = REPLACE( @instring, 'U', ' ' )
	SELECT @instring = REPLACE( @instring, ' ', 'A' )

	/* ( 8) transcode 'GHT' to 'GT' */
	SELECT @instring = REPLACE( @instring, 'GHT', 'GT' )

	/* ( 9) transcode 'DG' to 'G' */
	SELECT @instring = REPLACE( @instring, 'DG', 'G' )

	/* (10) transcode 'PH' to 'F' */
	SELECT @instring = REPLACE( @instring, 'PH', 'F' )

	/* (11) if not first character, eliminate all 'H' preceded or followed by a vowel */
	SELECT @i = 2
	WHILE @i <= LEN( @instring )
	BEGIN
		IF SUBSTRING( @instring, @i, 1 ) = 'H' AND 	( CHARINDEX( SUBSTRING( @instring, @i - 1, 1 ), @cVowels ) > 0 OR  ( @i < LEN( @instring ) AND CHARINDEX( SUBSTRING( @instring, @i + 1, 1 ), @cVowels ) > 0 ) )
			/* replace 'H' with ' ' */
			SELECT @instring = STUFF( @instring, @i, 1, ' ' )
		SELECT @i = @i + 1
	END
	SELECT @instring = REPLACE( @instring, ' ', '' )

	/* (12) change 'KN' to 'N', else 'K' to 'C' */
	SELECT @instring = REPLACE( @instring, 'KN', 'N' )
	SELECT @instring = REPLACE( @instring, 'K', 'C' )

	/* (13) if not first character, change 'M' to 'N' */
	SELECT @instring = LEFT( @instring, 1 ) + REPLACE( SUBSTRING( @instring, 2, LEN( @instring ) ), 'M', 'N' )

	/* (14) if not first character, change 'Q' to 'G' */
	SELECT @instring = LEFT( @instring, 1 ) + REPLACE( SUBSTRING( @instring, 2, LEN( @instring ) ), 'Q', 'G' )

	/* (15) transcode 'SH' to 'S' */
	SELECT @instring = REPLACE( @instring, 'SH', 'S' )

	/* (16) transcode 'SCH' to 'S' */
	SELECT @instring = REPLACE( @instring, 'SCH', 'S' )

	/* (17) transcode 'YW' to 'Y' */
	SELECT @instring = REPLACE( @instring, 'YW', 'Y' )

	/* (18) if not first or last character, change 'Y' to 'A' */
	IF LEN( @instring ) > 2
		SELECT @instring = LEFT( @instring, 1 ) + REPLACE( SUBSTRING( @instring, 2, LEN( @instring ) - 2 ), 'Y', 'A' ) + RIGHT( @instring, 1 )
	
	/* (19) transcode 'WR' to 'R' */
	SELECT @instring = REPLACE( @instring, 'WR', 'R' )

	/* (20) if not first character, change 'Z' to 'S' */
	SELECT @instring = LEFT( @instring, 1 ) + REPLACE( SUBSTRING( @instring, 2, LEN( @instring ) ), 'Z', 'S' )

	/* (21) transcode terminal 'AY' to 'Y' */
	IF RIGHT( @instring, 2 ) = 'AY'
		SELECT @instring = LEFT( @instring, LEN( @instring ) - 2 ) + 'Y'

	/* (22) remove traling vowels */
	/*      start vowels */
	SELECT @i = 1
	WHILE CHARINDEX( SUBSTRING( @instring, @i, 1 ), @cVowels ) > 0
		/* replace vowels with spaces */
		SELECT @instring = STUFF( @instring, @i, 1, ' ' ),
			   @i = @i + 1

	/*     end vowels */
	SELECT @i = LEN( @instring )
	WHILE CHARINDEX( SUBSTRING( @instring, @i, 1 ), @cVowels ) > 0
		/* replace vowels with spaces */
		SELECT @instring = STUFF( @instring, @i, 1, ' ' ),
			   @i = @i - 1
	/*     remove spaces */
	SELECT @instring = REPLACE( @instring, ' ', '' )

	/* add to key */
	SELECT @cKey = @cKey + @instring

	/* (23) collapse all strings of repeated characters */
	SELECT @cChar = LEFT( @cKey, 1 ),
		   @i = 2
	WHILE @i <= LEN( @cKey )
	BEGIN
		IF SUBSTRING( @cKey, @i, 1 ) = @cChar
			SELECT @cKey = STUFF( @cKey, @i, 1, ' ' )
		ELSE
			SELECT @cChar = SUBSTRING( @cKey, @i, 1 )

		SELECT @i = @i + 1
	END
	SELECT @cKey = REPLACE( @cKey, ' ', '' )

	/* (24) if first char of original surname was a vowel, append it to the code */
	IF CHARINDEX( @cFirst_char, @cVowels ) > 0
		SELECT @cKey = @cKey + @cFirst_char


	RETURN @cKey

END

---------------------------------------
Try
SELECT dbo.SoundexAplhaFunction_1( 'Rogers' ),
dbo.SoundexAplhaFunction_1( 'Rodgers' )


Zhavic

---------------------------------------------------------------
In the 1960s you needed the power of two Comodore64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.
 
interesting subject -> if you didn't know what soundex was (like me) try this link for a quick intro:
Let us know what algorithm you opt for in the end.

Posting code? Wrap it with code tags: [ignore]
Code:
[/ignore][code]CodeHere
[ignore][/code][/ignore].
 
Zhavic,

I have a complaint: Tek-Tips won't let me give you multiple stars on this thread - I just can't believe how incredibly helpful you have been. It is truly unbelievable!

I'll give you my own:
stars.gif
stars.gif


I've been testing out your two functions that you posted.
These are some of the cases I have tested - all of these are supposed to be matches (i.e. &quot;YES&quot;). In each of the examples, below, one function marked the match correctly, and the other marked it incorrectly. I have noted the incorrect functions below, in two colors for clarity:
rodger - roger SoundexAplhaFunction marked it &quot;NO&quot;
phone - fone SoundexAplhaFunction_1 marked it &quot;NO&quot;
fillup - phillup SoundexAplhaFunction_1 marked it &quot;NO&quot;
car - kar SoundexAplhaFunction_1 marked it &quot;NO&quot;
cohn - cohen SoundexAplhaFunction marked it &quot;NO&quot;
shira - sheeruh SoundexAplhaFunction marked it &quot;NO&quot;
steven - stephen SoundexAplhaFunction marked it &quot;NO&quot;

From these examples, I have concluded that [tt]SoundexAplhaFunction[/tt] is better at matching beginnings of words, whereas [tt]SoundexAplhaFunction_1[/tt] is better at matching things in the middle of words.

You have been so very kind already... Would you be able to combine your two procedures somehow? If so, I think the combination would be very, very accurate and powerful: the ability of [tt]SoundexAplhaFunction[/tt] to match the beginning of words, together with the quality of matches that [tt]SoundexAplhaFunction_1[/tt] does for things in the middle of words. It would be excellent.

By the way, in the application that I am using this for, it is important to avoid false positives - in other words, it's better to mistakenly categorize an actual match as a &quot;NO&quot;, than to falsely match two words with a &quot;YES&quot;.


Thank you very, very much!!!
:)
Lazer
Chicago
 
Zhavic,

I certainly hope I didn't ask too much of you when I asked you if you would be able to combine the two functions together...

If I offended you, I am sorry! [neutral] I just really think that you are on to something that will really work very well!

I appreciate it very much,
Lazer
 
At first,
Wooooow, thats are the biggest stars I have ever seen on this site !

Thanks for them, Lazer

At second,
that's OK, feel free to ask what you want,
The delay in my posts is often caused by my problems with english - it's funny - I want to post something, but don't now write it in english.

I have looked at some sites for this, but I don't know to combine this two algorithms.
This two algorithms creates it's own
result codes and they are as they was designed.
Any changes can violate they consistency.
You can try to find some informations about this and others alorithms on ointernet, and if you find some, just post there the algorithm and I can write it in T-SQL
( my english is no good and I have problems with reading info about this - I am using dictionary a lot :) )

As you sad:
it's better to mistakenly categorize an actual match as a &quot;NO&quot;, than to falsely match two words with a &quot;YES&quot;.

May be, you are using it this way, but if no, than use both of this algorithms to check if two names are tha same, and only if they both returns yes, that categorize an actual match as Yes, otherwise No.

Zhavic

---------------------------------------------------------------
In the 1960s you needed the power of two Comodore64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.
 
Uh-oh! Zhavic - I was hoping you could get me through this one... This isn't good!

Does anyone else know how to combine the two functions (as I described in my green and blue post above with the huge stars)?

Really [tt]SoundexAplhaFunction_1[/tt] (also above) works great - except for the very beginning of words, for which [tt]SoundexAplhaFunction[/tt] works much better...

Anyone know how to combine these? Please???

Thanks very much! This is so close to being a perfect solution...
Lazer
 
Lazer,

have some positive news for you. :)
I found mistake in 'SoundexAplhaFunction_1' in transcoding 'PH' to 'F' in step 10
/* (10) transcode 'PH' to 'F' */
SELECT @instring = REPLACE( @instring, 'PH', 'F' )

This solve problem with
phone - fone and fillup - phillup

and also in step 12 for 'K' -> 'C'
SELECT @instring = REPLACE( @instring, 'K', 'C' )

And this solve problem with
car - kar

I post you repaired 'SoundexAplhaFunction_1' tomorow ( or today ? - there is Wensday 01:45 AM ), now my head refuse to
work.

'See' you later.

Zhavic

---------------------------------------------------------------
In the 1960s you needed the power of two Comodore64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.
 
I can't sleep ( 03:32 AM :) ), this is a final beta version 1.0

CREATE FUNCTION dbo.SoundexAplhaFunction_1 ( @instring VARCHAR(50) )
RETURNS VARCHAR(50)
AS
BEGIN

DECLARE @cKey VARCHAR(50),
@cChar VARCHAR(3),
@cChars VARCHAR(3),
@cVowels VARCHAR(10),
@cFirst_char CHAR(1),
@cResult VARCHAR(10)

DECLARE @i integer

/* vowels */
SELECT @cVowels = 'AEIOU'

/* trim all spaces */
SELECT @instring = REPLACE( @instring, ' ', '' )
SELECT @instring = UPPER( @instring )

/* save first char */
SELECT @cFirst_char = LEFT( @instring, 1 )

/* ( 1) remove all 'S' and 'Z' chars from the end of the surname */
SELECT @i = LEN( @instring )
WHILE SUBSTRING( @instring, @i, 1 ) IN ( 'S', 'Z' )
SELECT @i = @i - 1
SELECT @instring = LEFT( @instring, @i )

/* ( 2) transcode initial strings */
/* MAC => MC */
/* PF => F */
SELECT @instring = CASE
WHEN LEFT( @instring, 3 ) = 'MAC' THEN 'MC' + SUBSTRING( @instring, 3, LEN( @instring ) )
WHEN LEFT( @instring, 2 ) = 'PF' THEN 'F' + SUBSTRING( @instring, 3, LEN( @instring ) )
ELSE @instring -- do nothing
END

/* ( 3) Transcode trailing strings as follows */
/* IX => IC */
/* EX => EC */
/* YE,EE,IE => Y */
/* NT,ND => D */
SELECT @instring = CASE
WHEN RIGHT( @instring, 2 ) = 'IX' THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'IC'
WHEN RIGHT( @instring, 2 ) = 'EX' THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'EC'
WHEN RIGHT( @instring, 2 ) IN ( 'YE', 'EE', 'IE' ) THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'Y'
WHEN RIGHT( @instring, 2 ) IN ( 'NT', 'ND' ) THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'D'
ELSE @instring -- do nothing
END

/* the step ( 4) I moved to begining of WHILE ... END below */

/* ( 5) use first character of name as first character of key */
/* SELECT @cKey = LEFT( @instring, 1 ) */
/* don't now, what they thing with this, but with @cKey = '' it seems to be working */
SELECT @cKey = ''


SELECT @i = 1
/* while not end of @instring */
WHILE SUBSTRING( @instring, @i, 1 ) > ''
BEGIN
SELECT @cChars = SUBSTRING( @instring, @i, 3 )

SELECT @cResult = CASE /* ( 4) transcode 'EV' to 'EF' if not at start of name */
WHEN @i > 1 AND LEFT( @cChars, 2 ) = 'EV' THEN 'AF'
/* ( 6) remove any 'W' that follows a vowel */
WHEN LEFT( @cChars, 1 ) = 'W' AND CHARINDEX( SUBSTRING( @instring, @i - 1, 1 ), @cVowels ) > 0 THEN SUBSTRING( @instring, @i - 1, 1 )
/* ( 7) replace all vowels with 'A' */
WHEN CHARINDEX( LEFT( @cChars, 1 ), @cVowels ) > 0 THEN 'A'
/* ( 8) transcode 'GHT' to 'GT' */
WHEN LEFT( @cChars, 2 ) = 'GHT' THEN 'GGG'
/* ( 9) transcode 'DG' to 'G' */
WHEN LEFT( @cChars, 2 ) = 'DG' THEN 'G'
/* (10) transcode 'PH' to 'F' */
WHEN LEFT( @cChars, 2 ) = 'PH' THEN 'F'
/* (11) if not first character, eliminate all 'H' preceded or followed by a vowel */
WHEN LEFT( @cChars, 1 ) = 'H' AND @i > 1 AND ( CHARINDEX( SUBSTRING( @instring, @i - 1, 1 ), @cVowels ) > 0 OR CHARINDEX( SUBSTRING( @instring, @i + 1, 1 ), @cVowels ) > 0 ) THEN SUBSTRING( @instring, @i - 1, 1 )
/* (12) change 'KN' to 'N', else 'K' to 'C' */
WHEN LEFT( @cChars, 2 ) = 'KN' THEN 'N'
WHEN LEFT( @cChars, 1 ) = 'K' THEN 'C'
/* (13) if not first character, change 'M' to 'N' */
WHEN @i > 1 AND LEFT( @cChars, 1 ) = 'M' THEN 'N'
/* (14) if not first character, change 'Q' to 'G' */
WHEN @i > 1 AND LEFT( @cChars, 1 ) = 'Q' THEN 'G'
/* (15) transcode 'SH' to 'S' */
WHEN LEFT( @cChars, 2 ) = 'SH' THEN 'S'
/* (16) transcode 'SCH' to 'S' */
WHEN @cChars = 'SCH' THEN 'SSS'
/* (17) transcode 'YW' to 'Y' */
WHEN LEFT( @cChars, 2 ) = 'YW' THEN 'Y'
/* (18) if not first or last character, change 'Y' to 'A' */
WHEN @i > 1 AND @i < LEN( @instring ) AND LEFT( @cChars, 1 ) = 'Y' THEN 'A'
/* (19) transcode 'WR' to 'R' */
WHEN LEFT( @cChars, 2 ) = 'WR' THEN 'R'
/* (20) if not first character, change 'Z' to 'S' */
WHEN @i > 1 AND LEFT( @cChars, 1 ) = 'Z' THEN 'S'
ELSE LEFT( @cChars, 1 )
END

SELECT @instring = STUFF( @instring, @i, LEN( @cResult ), @cResult )

/* Add current to key if current <> last key character */
IF RIGHT( @cKey, 1 ) != LEFT( @cResult, 1 )
SELECT @cKey = @cKey + @cResult


SELECT @i = @i + 1

END


/* (21) transcode terminal 'AY' to 'Y' */
IF RIGHT( @cKey, 2 ) = 'AY'
SELECT @cKey = LEFT( @cKey, LEN( @cKey ) - 2 ) + 'Y'

/* (22) remove traling vowels */
/* start vowels */
SELECT @i = 1
WHILE CHARINDEX( SUBSTRING( @cKey, @i, 1 ), @cVowels ) > 0
/* replace vowels with spaces */
SELECT @cKey = STUFF( @cKey, @i, 1, ' ' ),
@i = @i + 1

/* end vowels */
SELECT @i = LEN( @cKey )
WHILE CHARINDEX( SUBSTRING( @cKey, @i, 1 ), @cVowels ) > 0
/* replace vowels with spaces */
SELECT @cKey = STUFF( @cKey, @i, 1, ' ' ),
@i = @i - 1
/* remove spaces */
SELECT @cKey = REPLACE( @cKey, ' ', '' )

/* (23) collapse all strings of repeated characters */
/* not neede, see 'Add current to key if current <> last key character' before step (21) */

/* (24) if first char of original surname was a vowel, append it to the code */
IF CHARINDEX( @cFirst_char, @cVowels ) > 0
SELECT @cKey = @cKey + @cFirst_char

RETURN @cKey

END

-------------------------------------------------------

You can compare result's of this function with result's
on ( textboxes on the top of page )
and if find any diferencess, just post them there and I fix the problem.

I assume that the problems may be there:
/* (15) transcode 'SH' to 'S' */
WHEN LEFT( @cChars, 2 ) = 'SH' THEN 'S'

/* (17) transcode 'YW' to 'Y' */
WHEN LEFT( @cChars, 2 ) = 'YW' THEN 'Y'
/* (18) if not first or last character, change 'Y' to 'A' */
WHEN @i > 1 AND @i < LEN( @instring ) AND LEFT( @cChars, 1 ) = 'Y' THEN 'A'
/* (19) transcode 'WR' to 'R' */
WHEN LEFT( @cChars, 2 ) = 'WR' THEN 'R'


Have a best matches :)


Zhavic

---------------------------------------------------------------
In the 1960s you needed the power of two Comodore64s to get a rocket to the moon. Now you need a machine which is a vast number of times more powerful just to run the most popular GUI.
 
Zhavic,

I can't even express my gratitude to you! Your incredible efforts are ...well, incredible! My co-workers at my company are equally impressed and appreciative of your efforts - many said they are going to vote for you for Tip-Master of the week.

Very, very impressive and kind of you!

Many thanks!
Lazer of Chicago
 
Zhavic,

I forgot to say - it works great! 100%!!!

:)
Lazer
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top