Smart questions
Smart answers
Smart people
Join Tek-Tips Forums
INTELLIGENT WORK FORUMS
FOR COMPUTER PROFESSIONALS

Member Login




Remember Me
Forgot Password?
Join Us!

Come Join Us!

Are you a
Computer / IT professional?
Join Tek-Tips now!
  • Talk With Other Members
  • Be Notified Of Responses
    To Your Posts
  • Keyword Search
  • One-Click Access To Your
    Favorite Forums
  • Automated Signatures
    On Your Posts
  • Best Of All, It's Free!

Join Tek-Tips
*Tek-Tips's functionality depends on members receiving e-mail. By joining you are opting in to receive e-mail.

Posting Guidelines

Promoting, selling, recruiting, coursework and thesis posting is forbidden.
Jobs from Indeed

Link To This Forum!

Partner Button
Add Stickiness To Your Site By Linking To This Professionally Managed Technical Forum.
Just copy and paste the
code below into your site.

Microsoft SQL Server: Programming FAQ

SoundEx function

A better Soundex code
Posted: 3 Sep 03

Terms that are often misspelled can be a problem for database designers. Names, for example, are variable length, can have strange spellings, and they are not unique. American names have a diversity of ethnic origins, which give us names pronounced the same way but spelled differently and vice versa.

Words can be misspelled or have multiple spellings, especially across different cultures or national sources.

To solve this problem, we need phonetic algorithms which can find similar sounding terms and names. Just such a family of algorithms exist and are called SoundExes, after the first patented version.

A Soundex search algorithm takes a word, such as a person's name, as input and produces a character string which identifies a set of words that are (roughly) phonetically alike. It is very handy for searching large databases when the user has incomplete data.

Example of words, that has the same soundex code:

rodger = roger
phone = fone
fillup = phillup
car = kar        
cohn = cohen
shira = sheeruh
steven = stephen

There are more algorythms, some of thoese can be found there:
http://www.avotaynu.com/soundex.html
http://www.statcan.ca/english/research/85-602-XIE/85-602-XIE.pdf

This function is based on 'New York State Identification and Intelligence System (NYSIIS) Phonetic Encoder' algorythm from
http://www.dropby.com/NYSIIS.html

Algorythm:
( 1) remove all 'S' and 'Z' chars from the end of the surname
( 2) transcode initial strings
     MAC => MC
     PF  => F
( 3) Transcode trailing strings as follows,
     IX => IC
     EX => EC
     YE,EE,IE => Y
     NT,ND => D
( 4) transcode 'EV' to 'EF' if not at start of name
( 5) use first character of name as first character of key
( 6) remove any 'W' that follows a vowel
( 7) replace all vowels with 'A'
( 8) transcode 'GHT' to 'GT'
( 9) transcode 'DG' to 'G'
(10) transcode 'PH' to 'F'
(11) if not first character, eliminate all 'H' preceded or followed by a vowel
(12) change 'KN' to 'N', else 'K' to 'C'
(13) if not first character, change 'M' to 'N'
(14) if not first character, change 'Q' to 'G'
(15) transcode 'SH' to 'S'
(16) transcode 'SCH' to 'S'
(17) transcode 'YW' to 'Y'
(18) if not first or last character, change 'Y' to 'A'
(19) transcode 'WR' to 'R'
(20) if not first character, change 'Z' to 'S'
(21) transcode terminal 'AY' to 'Y'
(22) remove traling vowels
(23) collapse all strings of repeated characters
(24) if first char of original surname was a vowel, append it to the code

Code:

CREATE FUNCTION dbo.SoundexAplhaFunction( @instring VARCHAR(50) )
RETURNS VARCHAR(50)
AS
BEGIN

    DECLARE @cKey VARCHAR(50),
            @cChar VARCHAR(3),
            @cChars VARCHAR(3),
            @cVowels VARCHAR(10),
            @cFirst_char CHAR(1),
            @cResult VARCHAR(10)

    DECLARE @i integer

    /* vowels */
    SELECT @cVowels = 'AEIOU'

    /* trim all spaces */
    SELECT @instring = REPLACE( @instring, ' ', '' )
    SELECT @instring = UPPER( @instring )

    /* save first char */
    SELECT @cFirst_char = LEFT( @instring, 1 )

    /* ( 1) remove all 'S' and 'Z' chars from the end of the surname */
    SELECT @i = LEN( @instring )
    WHILE SUBSTRING( @instring, @i, 1 ) IN ( 'S', 'Z' )
        SELECT @i = @i - 1
    SELECT @instring = LEFT( @instring, @i )

     /* ( 2) transcode initial strings */
    /*      MAC => MC                 */
    /*      PF => F                     */
    SELECT @instring = CASE
                            WHEN LEFT( @instring, 3 ) = 'MAC' THEN 'MC' + SUBSTRING( @instring, 3, LEN( @instring ) )
                            WHEN LEFT( @instring, 2 ) = 'PF' THEN 'F' + SUBSTRING( @instring, 3, LEN( @instring ) )
                            ELSE @instring    -- do nothing
                       END

    /* ( 3) Transcode trailing strings as follows */
    /*      IX       => IC                        */
    /*      EX       => EC                        */
    /*      YE,EE,IE => Y                         */
    /*      NT,ND    => D                         */
    SELECT @instring = CASE
                            WHEN RIGHT( @instring, 2 ) = 'IX' THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'IC'
                            WHEN RIGHT( @instring, 2 ) = 'EX' THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'EC'
                            WHEN RIGHT( @instring, 2 ) IN ( 'YE', 'EE', 'IE' ) THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'Y'
                            WHEN RIGHT( @instring, 2 ) IN ( 'NT', 'ND' ) THEN LEFT( @instring, LEN( @instring ) - 2 ) + 'D'
                            ELSE @instring    -- do nothing
                       END

    /* the step ( 4) I moved to begining of WHILE ... END below */

    /* ( 5) use first character of name as first character of key */
    /* SELECT @cKey = LEFT( @instring, 1 ) */
    /* don't now, what they thing with this, but with @cKey = '' it seems to be working */
    SELECT @cKey = ''


    SELECT @i = 1
    /* while not end of @instring */
    WHILE SUBSTRING( @instring, @i, 1 ) > ''
    BEGIN
        SELECT @cChars = SUBSTRING( @instring, @i, 3 )

        SELECT @cResult = CASE /* ( 4) transcode 'EV' to 'EF' if not at start of name */
                               WHEN @i > 1 AND LEFT( @cChars, 2 ) = 'EV' THEN 'AF'
                               /* ( 6) remove any 'W' that follows a vowel */
                               WHEN LEFT( @cChars, 1 ) = 'W' AND CHARINDEX( SUBSTRING( @instring, @i - 1, 1 ), @cVowels ) > 0 THEN SUBSTRING( @instring, @i - 1, 1 )
                               /* ( 7) replace all vowels with 'A' */
                               WHEN CHARINDEX( LEFT( @cChars, 1 ), @cVowels ) > 0 THEN 'A'
                                /* ( 8) transcode 'GHT' to 'GT' */
                               WHEN LEFT( @cChars, 2 ) = 'GHT' THEN 'GGG'
                                /* ( 9) transcode 'DG' to 'G' */
                               WHEN LEFT( @cChars, 2 ) = 'DG' THEN 'G'
                               /* (10) transcode 'PH' to 'F' */
                               WHEN LEFT( @cChars, 2 ) = 'PH' THEN 'F'
                                /* (11) if not first character, eliminate all 'H' preceded or followed by a vowel */
                               WHEN LEFT( @cChars, 1 ) = 'H' AND @i > 1 AND ( CHARINDEX( SUBSTRING( @instring, @i - 1, 1 ), @cVowels ) > 0 OR CHARINDEX( SUBSTRING( @instring, @i + 1, 1 ), @cVowels ) > 0 ) THEN SUBSTRING( @instring, @i - 1, 1 )
                               /* (12) change 'KN' to 'N', else 'K' to 'C' */
                               WHEN LEFT( @cChars, 2 ) = 'KN' THEN 'N'
                               WHEN LEFT( @cChars, 1 ) = 'K' THEN 'C'
                               /* (13) if not first character, change 'M' to 'N' */
                               WHEN @i > 1 AND LEFT( @cChars, 1 ) = 'M' THEN 'N'
                               /* (14) if not first character, change 'Q' to 'G' */
                               WHEN @i > 1 AND LEFT( @cChars, 1 ) = 'Q' THEN 'G'
                               /* (15) transcode 'SH' to 'S' */
                               WHEN LEFT( @cChars, 2 ) = 'SH' THEN 'S'
                               /* (16) transcode 'SCH' to 'S' */
                               WHEN @cChars = 'SCH' THEN 'SSS'
                               /* (17) transcode 'YW' to 'Y' */
                               WHEN LEFT( @cChars, 2 ) = 'YW' THEN 'Y'
                               /* (18) if not first or last character, change 'Y' to 'A' */
                               WHEN @i > 1 AND @i < LEN( @instring ) AND LEFT( @cChars, 1 ) = 'Y' THEN 'A'     
                               /* (19) transcode 'WR' to 'R' */
                               WHEN LEFT( @cChars, 2 ) = 'WR' THEN 'R'
                               /* (20) if not first character, change 'Z' to 'S' */
                               WHEN @i > 1 AND LEFT( @cChars, 1 ) = 'Z' THEN 'S'
                               ELSE LEFT( @cChars, 1 )
                         END

        SELECT @instring = STUFF( @instring, @i, LEN( @cResult ), @cResult )

        /* Add current to key if current <> last key character */
        IF RIGHT( @cKey, 1 ) != LEFT( @cResult, 1 )
            SELECT @cKey = @cKey + @cResult


        SELECT @i = @i + 1

    END


    /* (21) transcode terminal 'AY' to 'Y' */
    IF RIGHT( @cKey, 2 ) = 'AY'
        SELECT @cKey = LEFT( @cKey, LEN( @cKey ) - 2 ) + 'Y'
  
    /* (22) remove traling vowels */
    /*      start vowels */
    SELECT @i = 1
    WHILE CHARINDEX( SUBSTRING( @cKey, @i, 1 ), @cVowels ) > 0
        /* replace vowels with spaces */
        SELECT @cKey = STUFF( @cKey, @i, 1, ' ' ),
               @i = @i + 1

    /*     end vowels */
    SELECT @i = LEN( @cKey )
    WHILE CHARINDEX( SUBSTRING( @cKey, @i, 1 ), @cVowels ) > 0
        /* replace vowels with spaces */
        SELECT @cKey = STUFF( @cKey, @i, 1, ' ' ),
               @i = @i - 1
    /*     remove spaces */
    SELECT @cKey = REPLACE( @cKey, ' ', '' )

    /* (23) collapse all strings of repeated characters */
    /* not neede, see 'Add current to key if current <> last key character' before step (21) */

/* (24) if first char of original surname was a vowel, append it to the start of code */
    IF CHARINDEX( @cFirst_char, @cVowels ) > 0
        SELECT @cKey = @cFirst_char + @cKey

    RETURN @cKey

END


Using:

SELECT dbo.SoundexAplhaFunction( 'phone' ),
       dbo.SoundexAplhaFunction( 'fone' )

Back to Microsoft SQL Server: Programming FAQ Index
Back to Microsoft SQL Server: Programming Forum

My Archive

Close Box

Join Tek-Tips® Today!

Join your peers on the Internet's largest technical computer professional community.
It's easy to join and it's free.

Here's Why Members Love Tek-Tips Forums:

Register now while it's still free!

Already a member? Close this window and log in.

Join Us             Close