Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Rhinorhino on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

finding duplicates

Status
Not open for further replies.

scroce

MIS
Joined
Nov 30, 2000
Messages
780
Location
US
I wrote the following semi-successful code which returns duplicate records. One problem - it is SO S---L---O---W - I mean, really SLOW, it takes like a few hours to complete the check. There's got to be a better way. Can anyone offer any advice?

It uses nested loops to check each row against every other row in a very very large spreadsheet, no less than 15000 rows.

Code:
Dim DupCount As Integer
Dim RowCount, LastRow As Integer
Dim NextWholeRow As String, CurrentWholeRow As String
Dim CurrentCell As Range, NextCell As Range

Set CurrentCell = ActiveSheet.Range("c18")

RowCount = 1
Do Until CurrentCell.Row = 20000
  Do Until RowCount = 20000
    Set NextCell = CurrentCell.Offset(RowCount, 0)
    CurrentWholeRow = CurrentCell.Value & CurrentCell.Offset(0, 1) & CurrentCell.Offset(0, 3) & FixOverflow(CurrentCell.Offset(0, 7)) & CurrentCell.Offset(0, 11)
    NextWholeRow = NextCell.Value & NextCell.Offset(0, 1) & NextCell.Offset(0, 3) & FixOverflow(NextCell.Offset(0, 7)) & NextCell.Offset(0, 11)
    
  If CurrentWholeRow <> &quot;x&quot; Then
    If NextWholeRow = CurrentWholeRow Then
      MsgBox &quot;I found a duplicate&quot;
      DupCount = DupCount + 1
    End If
  End If
   RowCount = RowCount + 1
  Loop

Set NextCell = CurrentCell.Offset(1, 0)
Set CurrentCell = NextCell
RowCount = 1
' CurrentCell = CurrentCell.Offset(1, 0)
Loop

MsgBox DupCount & &quot; Duplicates found&quot;

Ah say, there's somethin' a little &quot;eeeeeeee&quot; 'bout a boy who don't like basbawl...
 
Don't know if you want to try it, but here is a fast way I check for dups:

Say I have data in Col A,B, and C

In D1, I would put:

=A1&B1&C1

in E1, I put:

=IF(ISNA(VLOOKUP(D1,$D2:$D$20000,1,0)),&quot;&quot;,&quot;Duplicate&quot;)

copy both columns down and you will have an indication in column E that will identify duplicates.

Just another way of doing things, and I bet it will be alot faster.



Blue [dragon]

If I wasn't Blue, I would just be a Dragon...
 
This ran in about 2 seconds.
Code:
Sub atest()
    Dim DupCount As Integer
    Dim RowCount, LastRow As Integer
    Dim PrevWholeRow As String, CurrentWholeRow As String
    Dim CurrentCell As Range
    
    PrevWholeRow = &quot;&quot;
    For Each CurrentCell In ActiveSheet.Range(Range(&quot;c18&quot;), Range(&quot;c20018&quot;))
        With CurrentCell
            CurrentWholeRow = .Value & .Offset(0, 1) & .Offset(0, 3) & FixOverflow(.Offset(0, 7)) & .Offset(0, 11)
        End With
        If CurrentWholeRow <> &quot;x&quot; Then
            If NextWholeRow = CurrentWholeRow Then
'                MsgBox &quot;I found a duplicate&quot;
                DupCount = DupCount + 1
            End If
        End If
        PrevWholeRow = CurrentWholeRow
    Next
    MsgBox DupCount & &quot; Duplicates found&quot;
End Sub
Function FixOverflow(s As String)
    FixOverflow = s
End Function


Skip,
 
Sorry -- missed the ver last compare...
Code:
Sub atest()
    Dim DupCount As Integer
    Dim RowCount, LastRow As Integer
    Dim PrevWholeRow As String, CurrentWholeRow As String
    Dim CurrentCell As Range
    
    PrevWholeRow = &quot;&quot;
    For Each CurrentCell In ActiveSheet.Range(Range(&quot;c18&quot;), Range(&quot;c20018&quot;))
        With CurrentCell
            CurrentWholeRow = .Value & .Offset(0, 1) & .Offset(0, 3) & FixOverflow(.Offset(0, 7)) & .Offset(0, 11)
        End With
        GoSub Compare
        PrevWholeRow = CurrentWholeRow
    Next
    GoSub Compare
    MsgBox DupCount & &quot; Duplicates found&quot;
    Exit Sub
Compare:
    If CurrentWholeRow <> &quot;x&quot; Then
        If NextWholeRow = CurrentWholeRow Then
'                MsgBox &quot;I found a duplicate&quot;
            DupCount = DupCount + 1
        End If
    End If
    Return
End Sub
:-)

Skip,
 
Thanks everyone - this gives me some new ideas to work with.
I'll post back with results


Ah say, there's somethin' a little &quot;eeeeeeee&quot; 'bout a boy who don't like basbawl...
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top