Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations bkrike on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Need Assistance Optimizing Code

Status
Not open for further replies.

MrMeric

Technical User
Apr 1, 2004
48
US
Good day!

Background:
1. Using VB6 (VBA) within Microsoft Access
2. Utilizing 3 functions within the main SimilSearch() routine. These are:
a. Simil()
i. Simil is a "smart string comparison" algorithm. It takes two strings, compares them, then spits out a percentage of likeness.
b. SoundsLike()
ii. SoundsLike is a cousin of the Soundex algorithm. SoundsLike is a tad more strict on it's comparisons, thus making it more efficient.
c. AddSimilarEntry()
iii. Simple routine that adds the similar entry to a database table.

============================

Question:
SimilSearch() searches through customer names within my database for like names. If it finds any, it add a new entry into a seperate table.
The main routine SimilSearch() is nothing more than a progressive linear algorithm. It compares each item against the current key. This take a while.
How can I cut down on execution time? Is there a way I can make this "smart" and exclude comparing items that are obviously not even close?

============================

Code:
SimilSearch() *Main routine
Code:
Public Function SimilSearch()
Dim db As DAO.Database
Dim rsSearch, rsSearchClone, rsSimilarTable As DAO.Recordset
Dim searchString, comparisonString, searchSound, comparisonSound As String
Dim SimilThreshold, stringRelevance As Double
Dim NumberFound, i, j As Integer
Dim timeStart As Double
Dim timeStop As Double

    On Error GoTo errorhandler
    
    Set db = CurrentDb
    'Recodset we wish to search
    Set rsSearch = db.OpenRecordset("SELECT Customer.Name AS SearchField FROM Customer;", dbOpenSnapshot)
    Set rsSearchClone = rsSearch.Clone
    'Recordset we wish to place similar records into
    Set rsSimilarTable = db.OpenRecordset("SELECT * FROM SimilarStrings;", dbOpenDynaset)
    'Threshold for string comparison
    'It seems that .83 seems to be the magic number. Any higher or lower will
    'result in either too many or too few results, respectivley.
    SimilThreshold = 0.83
    
    With rsSearch
        .MoveLast
        .MoveFirst
        'Set the progress bar control max
        Me.progBar.max = .RecordCount - 1
        Do Until .EOF
            Me.progBar.value = Me.progBar.value + 1
            'Set the string we with to compare all others to for this iteration
            searchString = ![SearchField]
            'This is the loop where we compare 'searchString' to all other
            'entries within the database.
            'NOTE - This is progressive algorithm. If 'searchString' is
            'record number 15, then we start the comparison at
            'record 16.
            i = i + 1
            With rsSearchClone
                .MoveFirst
                .Move i
                Do Until .EOF
                    'stringRelevance holds the comparison percent relevancy
                    comparisonString = ![SearchField]
                    stringRelevance = Simil((searchString), (comparisonString))
                    If stringRelevance >= SimilThreshold Then
                        'For added strictness, we compare the two strings using
                        'the SoundsLike algorithm which is related to Soundex.
                        'Although the threshold was met, this dosen't mean that the
                        'two strings are completely relevant to one another.
                        searchSound = SoundsLike(searchString)
                        comparisonSound = SoundsLike(comparisonString)
                        If searchSound Like comparisonSound Then
                            If AddSimilarEntry(rsSimilarTable, searchString, comparisonString _
                                                , stringRelevance, searchSound) = False Then
                                MsgBox "Error adding new similar entry to database."
                            End If
                        End If
                    End If
                    'Move to the next comparison item
                    .MoveNext
                Loop
            End With
            'Move to the next key item
            .MoveNext
        Loop
    End With
    
errorhandler:
    Select Case Err.Number
        Case 0 To 3021
            Resume Next
        Case Else
            MsgBox "Error: " & Err.Number & vbCrLf & "Description: " & Err.Description
    End Select

End Function

AddSimilarEntry()
Code:
Public Function AddSimilarEntry(rsTemp As DAO.Recordset, ByVal searchString As String, ByVal comparisonString As String _
                                , ByVal stringRelevance As String, ByVal searchSound As String) As Boolean
    On Error GoTo errorhandler
    
    ' Adds a new record to a Recordset using the data passed
    ' by the calling procedure. The new record is then made
    ' the current record.
    With rsTemp
        .AddNew
        ![OriginalString] = searchString
        ![compareString] = comparisonString
        ![PercentRelevance] = stringRelevance
        ![SoundsLike] = searchSound
        .Update
        .Bookmark = .LastModified
    End With
    
    AddSimilarEntry = True
    
errorhandler:
    If Err.Number > 0 Then
        AddSimilarEntry = False
    End If
End Function


Simil()
Code:
'**************************************
' Name: _A Smart String Comparison
' Description:This takes 2 strings and r
'     eturns the percent alike that they are.
'     (i.e. "test string number 1" is 86.48% s
'     imilar to "teststring numb 2")
'   This Function is very useful! You can use it In databases
'   To match data that may have errors in it. Examples being people's names,
'   company names, addresses, or anything Else where you may encounter misspellings or
'   inconsistencies in the data. Your feedback and/or votes are greatly appreciated! --
'   NEW - updated to use Byte arrays instead of strings, 50-300% performance improvement!
'   An implementation of the , Ratcliff/Obershelp/Levenshtein method.
'
' Inputs:mainstring and checkstring, the
'     2 strings to compare
'
' Returns:how similar the 2 strings are
'     (percent, as in .8)
'
' Assumes:This code recursively loops th
'     rough the 2 strings, finding the largest
'     common substring, then checking the rema
'     inder of the string.

Private b1() As Byte
Private b2() As Byte
Public Function Simil(String1 As String, String2 As String) As Double
    Dim l1 As Long
    Dim l2 As Long
    Dim l As Long
    Dim r As Double
    If UCase(String1) = UCase(String2) Then
        r = 1
    Else
        l1 = Len(String1)
        l2 = Len(String2)
        If l1 = 0 Or l2 = 0 Then
            r = 0
        Else
            ReDim b1(1 To l1): ReDim b2(1 To l2)
            For l = 1 To l1
                b1(l) = Asc(UCase(Mid(String1, l, 1)))
            Next
            For l = 1 To l2
                b2(l) = Asc(UCase(Mid(String2, l, 1)))
            Next
            r = SubSim(1, l1, 1, l2) / (l1 + l2) * 2
        End If
    End If
    Simil = r
    Erase b1
    Erase b2
End Function

Private Function SubSim(st1 As Long, end1 As Long, st2 As Long, end2 As Long) As Long
    Dim c1 As Long
    Dim c2 As Long
    Dim ns1 As Long
    Dim ns2 As Long
    Dim i As Long
    Dim max As Long
    If st1 > end1 Or st2 > end2 Or st1 <= 0 Or st2 <= 0 Then Exit Function
    For c1 = st1 To end1
        For c2 = st2 To end2
            i = 0
            Do Until b1(c1 + i) <> b2(c2 + i)
                i = i + 1
                If i > max Then
                    ns1 = c1
                    ns2 = c2
                    max = i
                End If
                If c1 + i > end1 Or c2 + i > end2 Then Exit Do
            Loop
        Next
    Next
    max = max + SubSim(ns1 + max, end1, ns2 + max, end2)
    max = max + SubSim(st1, ns1 - 1, st2, ns2 - 1)
SubSim = max
End Function

SoundsLike()
Code:
Public Function SoundsLike(ByVal pWord As String, Optional pAccuracy As Byte) As String

   ' char importance "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
   Const SoundCodes = "01230120022455012623010202"
   Dim lPunctuation As String
   Dim lCurChar As String
   Dim lWordLen As Integer
   Dim lCurValue As Integer
   Dim lSoundex As String
   Dim x As Integer

   'characters not taken into account for Soundex
   lPunctuation = ".,/`';][-=<>?:~}{+_)(*&^%$#@!\|1234567890" & Chr(34) & Chr(32)
   'default accuracy is 4
   If pAccuracy = 0 Then pAccuracy = 4
   If pWord <> "" Then
      'remove the punctuation and numeric characters from the string
      pWord = RemoveChars(pWord, lPunctuation)
      'convert to uppercase
      pWord = UCase(pWord)
      lWordLen = Len(pWord)
      'in accordance with soundex rules - start at 2nd
      'character
      lSoundex = Left(pWord, 1)
      x = 2
      'perform conversion until desired level of accuracy or
      'length of word reached
      While Len(lSoundex) < pAccuracy And x <= lWordLen
         lCurChar = Mid(pWord, x, 1)
         'ignore double characters
         If lCurChar <> Mid(pWord, x + 1, 1) Then
            'get the Soundex value
            lCurValue = Val(Mid(SoundCodes, (Asc(lCurChar) - 64), 1))
            'ignore zero Soundex values
            If lCurValue > 0 Then lSoundex = lSoundex & lCurValue
         End If
         x = x + 1
      Wend
      'if the length is less than the desired accuracy, pad
      'with zeroes
      lSoundex = lSoundex & String(pAccuracy - Len(lSoundex), "0")
      SoundsLike = lSoundex
   End If
End Function

Public Function RemoveChars(pMessage As String, pRemovable As String) As String
   Dim lMessage As String
   Dim lCurChar As String
   Dim x As Integer

   'cycle through message string
   For x = 1 To Len(pMessage)
      'get the current character
      lCurChar = Mid(pMessage, x, 1)
      'if the current character is not in the removable list,
      'add it to the local message
      If InStr(pRemovable, lCurChar) = 0 Then lMessage = lMessage & lCurChar
   Next x
   RemoveChars = lMessage
End Function
============================
 
Well, it seems that the Simil() function is the culprit.
It runs very slow...

I found it off the 'net. It gives me the results I want, but it's not efficient. As you can see it makes two recursive calls!

I've given this function some better variable names. I'd appreciate it if someone could help me optimize this code:

Code:
Private LetterByteArrayOne() As Byte
Private LetterByteArrayTwo() As Byte
----------------------------
Public Function Simil(StringOne As String, StringTwo As String) As Double
    Dim StringOneLength As Long
    Dim StringTwoLength As Long
    Dim l As Long
    Dim r As Double
    If UCase(StringOne) = UCase(StringTwo) Then
        r = 1
    Else
        StringOneLength = Len(StringOne)
        StringTwoLength = Len(StringTwo)
        If (StringOneLength = 0) Or (StringTwoLength = 0) Then
            r = 0
        Else
            'Find the ASCII equivalent to the letter
            'and place it into the byte array
            ReDim LetterByteArrayOne(1 To StringOneLength)
            ReDim LetterByteArrayTwo(1 To StringTwoLength)
            For l = 1 To StringOneLength
                LetterByteArrayOne(l) = Asc(UCase(Mid(StringOne, l, 1)))
            Next
            For l = 1 To StringTwoLength
                LetterByteArrayTwo(l) = Asc(UCase(Mid(StringTwo, l, 1)))
            Next
            r = SubSim(1, StringOneLength, 1, StringTwoLength) / (StringOneLength + StringTwoLength) * 2
        End If
    End If
    Simil = r
    Erase LetterByteArrayOne
    Erase LetterByteArrayTwo
End Function

----------------------------

Private Function SubSim(StringOneStart As Long, StringOneEnd As Long, StringTwoStart As Long, StringTwoEnd As Long) As Long
Dim CurrentStringOneItem As Long
Dim CurrentStringTwoItem As Long
Dim ns1 As Long
Dim ns2 As Long
Dim i As Long
Dim max As Long

If StringOneStart > StringOneEnd Or StringTwoStart > StringTwoEnd Or StringOneStart <= 0 Or StringTwoStart <= 0 Then Exit Function

For CurrentStringOneItem = StringOneStart To StringOneEnd
    For CurrentStringTwoItem = StringTwoStart To StringTwoEnd
        i = 0
        Do Until LetterByteArrayOne(CurrentStringOneItem + i) <> LetterByteArrayTwo(CurrentStringTwoItem + i)
            i = i + 1
            If i > max Then
                ns1 = CurrentStringOneItem
                ns2 = CurrentStringTwoItem
                max = i
            End If
            If CurrentStringOneItem + i > StringOneEnd Or CurrentStringTwoItem + i > StringTwoEnd Then Exit Do
        Loop
    Next
Next
max = max + SubSim(ns1 + max, StringOneEnd, ns2 + max, StringTwoEnd)
max = max + SubSim(StringOneStart, ns1 - 1, StringTwoStart, ns2 - 1)
SubSim = max
End Function
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top