Finding integer sequence amongst another integer sequence. 1

VickyC · Jun 12, 2012

hello - I'm hoping there is a good solution to a problem I'm having. I can probably figure out the details if someone can point me in the right direction.

Consider two tables.

Code:

tblA                                 tblB
A_ID   A1  A2  A3  A4                 B_ID   B1  B2  B3  B4  B5  B6  B7  B8  B9  B10
 1     45  42  42  36                  100   75  45  45  43  42  41  36  23  11   9
 2     54  29  21   8                  101   65  50  45  44  44  42  42  39  36  33    
 3     35  35  13  10                  102   56  52  50  51  37  33  29  10   7   3
 4  etc...                             104   45  42  39  35  35  35  35  30  13  10
                                       105  etc...

In tblA, A1 >= A2 >= A3 >= A4.
In tblB, B1 >= B2 >= B3 >= B4 >= B5 >= B6 >= B7 >= B8 >= B9 >= B10

Here's my question. I need to DELETE all records in tblB that contain the values (A1, A2, A3, A4) as a subset.

In the example shown, record 101 gets deleted because it contains the from record 1:

Code:

 1     [b]45  42  42  36[/b]                  101   65  50  [b]45[/b]  44  44  [b]42  42[/b]  39  [b]36[/b]  33

Similarly, 104 gets deleted because it contains the values in record 3:

Code:

 3     [b]35  35  13  10[/b]                         104   45  42  39  [b]35  35[/b]  35  35  30  [b]13  10[/b]

Note - record 100 DOES NOT get deleted. It has all the values in row 1, but it only has one 42.

tblA has 4 columns of data. If tblB also had the same number of columns, then I can just use a Left Join ON (A1 = B1) AND (A2 = B2) AND... ..WHERE tblB.B_ID IS NULL.
When tblB has 5 data columns, I can do something similar, but the 'ON' is more complicated...

Code:

ON (((A1=B1) AND A2=B2) AND  A3=B3) AND  A4=B4))  OR  ((A1=B1) AND A2=B2) AND  A3=B3) AND  A4=[b]B5[/b]))  OR  ... etc.

My real problem is that tblB has so many data columns that the SQL has become completely lenghty and awkward. I'm hoping there is a better way.

Many thanks, Vicky

MajP · Jun 14, 2012

Your title is sequence, but the description says "contains the values". This may make it more difficult. In other words assume
in A
5 3 2 1
if B is the following
6 5 4 3 2 1
Is that a match? It does not have the same sequence, but contains all the values. B interrupts the sequence with the number 4. If it has to be a matching sequence I could use the instr function to make it easier. Then what I would build a UDF that returns a concatenated string. I would separate it with a delimiter. It would return
5:3:2:1
Same on b
9:7:6:5:3:2:1
Then I would build a second udf that I pass the concatenated strings and it would use an instr function to check if concatenated A is in concatenated B and return true.

If it only has to contain the values (and the same amount of each value) then that is a little more complicated. I would do it again with a UDF. Pass in all the A and B values and return true.

Something like
1) Take first number from A
2) Find first match in B if no match return false
3) loop remain letters in A an move through B from current location until finding match. If no match return false.
4) else return true

In that version you could use the function in a query.
Personally I would do this all in code. Basically the same logic, but instead of passing in field values I would pass in the query or table names. Then loop the records and loop the fields in A and B (excluding the first field). In that way if table A has 50 fields and B 200 fields this would be still very doable. I will give a try and pass the code if it works. Although that may sound complicated my guess the amount of code is very short since it is really just 3 loops.

VickyC · Jun 14, 2012

hi MajP - Yes it is true that the matches for A1, A2, A3, A4 can be spread out and do not need to be contiguous, still while still being in descending order. I too have realized that code might be better here that just straight SQL. I think this may be a bit slower, but I only need to do this process about 25 times - speed shouldn't be too big an issue.

Although you mentioned 3 loops, I'm starting to get this to work using 4. Basically, I loop through every record of tblA. For each of these, I loop through every record of tblB. For each of these, I loop through each data COLUMN of tblA. For each of these, I loop through every data column of tblB. Things seem to be working very well, and faster than I had thought.

This highlights a problem I seem to struggle with from time to time - I don't always have a good sense of when it is best to use straight VBA code, SQL, or some combo of the two.

Thanks for the pointers
Vicky C.

MajP · Jun 14, 2012

If you got this working then good on you. Because I think I was a little cavalier in how easy this would be. Here is my solution and not even sure it is correct. This was pretty hard. I am curious how you did it. The hard part was once you find a value in B the next value from A has to occur in a field in B after the field in which you found the value. That is the only way to ensure that if A has two 45s then B has at least two 45s. I loop all records in A and all records in B. I loaded each record into a collection and pass that to a function to check to see if the values in A are contained in B.

Code:

 Public Sub testAinB()
   FindAinB "TableA", "TableB"
 End Sub

Code:

Public Sub FindAinB(qryA As String, qryB As String)
  Dim rsA As DAO.Recordset
  Dim rsB As DAO.Recordset
  Dim fldA As DAO.Field
  Dim fldB As DAO.Field
  Dim Aid As Integer
  Dim Bid As Integer
  Dim Avalue As Integer
  Dim Bvalue As Integer
  Dim collA As Collection
  Dim collB As Collection
  
  Set rsA = CurrentDb.OpenRecordset(qryA, dbOpenDynaset)
  Set rsB = CurrentDb.OpenRecordset(qryB, dbOpenDynaset)
  
  Do While Not rsA.EOF
      Set collA = New Collection
      For Each fldA In rsA.Fields
        If fldA.Name = rsA.Fields(0).Name Then
          Aid = fldA
        Else
         collA.Add (fldA)
        End If
      Next fldA
    
      Do While Not rsB.EOF
         Set collB = New Collection
         For Each fldB In rsB.Fields
           If fldB.Name = rsB.Fields(0).Name Then
             Bid = rsB.Fields(0)
           Else
            collB.Add (fldB)
           End If
          Next fldB
           If AinB(collA, collB) Then
              'code here to do something once you find A in B such as delete
              Debug.Print "True" & " " & Aid & " " & Bid
           End If
       rsB.MoveNext
    Loop
    rsA.MoveNext
  Loop
End Sub


Public Function AinB(collA As Collection, collB As Collection) As Boolean
  Dim i As Integer
  Dim j As Integer
  Dim startLoc As Integer
  Dim numberFound As Integer
  Dim numberRequired As Integer
  numberRequired = collA.Count
  startLoc = 1
    For i = 1 To collA.Count
      blnFound = False
      For j = startLoc To collB.Count
        'Debug.Print "A " & collA(i) & "   B " & collB(j)
        If collA(i) = collB(j) Then
          numberFound = numberFound + 1
          startLoc = j + 1
          'Found all required
          If numberFound = numberRequired Then
            AinB = True
            Exit Function
          Else
            'Did not find all and at end
            If startLoc > collB.Count Then Exit Function
            'Check next A
            Exit For
          End If
        End If
      Next j
    Next i
End Function

MajP · Jun 15, 2012

I forgot to answer this question

I don't always have a good sense of when it is best to use straight VBA code, SQL, or some combo of the two.

SQL is designed to work well with normalized data, but does not really support non normal structures. Your data is nonnormal. Searching across rows I do in SQL searching across columns I write code.

But the better alternative is to normalize your data because 9 times out of ten SQL is more efficient. In some cases I write code when it is just easier for me then writing very detailed long SQL. With that said you probably should normalize the data and use SQL. Now that I look at it that solution should be far easier

1) Build normalizing union queries qryNormalA and qryNormalB. Your data would then look like

AID Fld Val
1 A1 45
1 A2 42
1 A3 41
1 A4 36
2 A1 54
...
3 A4 10

do the same with b

2) Now simlpy join the queries where Val A = Val B, sort by AID
3) If there was no duplicates this would be easy. Do a group by query and return the AID, BID where the amount of found records = 4
But assume you have
AID 1 = 4 4 2 1
BID 7 = 4 3 3 3 2 2 1

So you would return
AID BID
1 A1 4 7 B1 4
1 A2 4 7 B1 4

So you would have to have in your where statement to return the min AID where AID = BID and A val = B Val

You are also going to get 2 records for the value 2 (1 record in A matches two records in B) But you can have more so I do not think you need to worry about it. Just return the IDs where the count is >= 4

VickyC · Jun 15, 2012

hi MajP

Thanks for your code and for the suggestion re normalized tables. (I kind of knew that was going to come up sooner or later

).

For what it's worth, here's the VBA solution I'd worked on. In tblB I added a column RecordMatch (integer, default = 0). Instead of deleting the matches as I'd originally intended, I just not which A_ID in tblA made the match.

Code:

'Identify records in tblB whose integer sets include the full integer sets of a records in tblAA.
'   Repeats are allowed.  Both sets in DESC order.  tblA holds fewer integer values that tblB.
'
'eg:  tblA:   A_ID  A1  A2  A3  A4                            tblB:   B_ID  B1  B2  B3  B4  B5  B6  B7   RecordMatch
'             100  42  38  38  14     is fully contained INSIDE        234  44  42  41  38  38  38  14       100
'             101  42  38  38  14     is NOT INSIDE tblB record        555  42  41  38  14  14  12  11         0

Function IntSetA_Inside_IntSetB()
    
    Dim rsA As DAO.Recordset, rsB As DAO.Recordset
    Dim strTblA As String, strTblB As String
    Dim ColMatchCount As Integer, B_StartCol As Integer, ChkIf_A_ColMatches_B As Integer
    Dim A_FirstDataCol As String, A_LastDataCol As String, B_FirstDataCol As String, B_LastDataCol As String, B_NewStartCol As String
    
'@@@@@@@@@@@@@@@@@@  SET THESE BEFORE RUNNING FUNCTION IntSetA_Inside_IntSetB  @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    strTblA = "tblA_TEST": strTblB = "tblB_TEST"
    A_FirstDataCol = 1: A_LastDataCol = 4: B_FirstDataCol = 1: B_LastDataCol = 7    '(use 0 based index)
'@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    Set rsA = CurrentDb.OpenRecordset(strTblA, dbOpenDynaset)
    Set rsB = CurrentDb.OpenRecordset(strTblB, dbOpenDynaset)
    
    Do Until rsA.EOF
                                                            
        rsB.MoveFirst
        
        Do Until rsB.EOF
                                                            
            ColMatchCount = 0: B_NewStartCol = B_FirstDataCol
            
            For ACol = A_FirstDataCol To A_LastDataCol
                                                            
                'If an A column has no match, then bail and go to next tblA record
                ChkIf_A_ColMatches_B = 0
                
                For BCol = B_NewStartCol To B_LastDataCol
                
                    'I there is a match...
                    If rsA(ACol) = rsB(BCol) Then
                    
                        'when ColMatchFound = 4, we have a full match of A and B records
                        ColMatchCount = ColMatchCount + 1
                        
                        ChkIf_A_ColMatches_B = ChkIf_A_ColMatches_B + 1
                        
                        'if A1 finds its match in B3, say, then start looking A2's match at B4
                        B_NewStartCol = BCol + 1
                        
                        'found a column match, so go to next value of A
                        Exit For
                    End If
        
                Next BCol
                
                'if an A value has no B match, then move on to next tblB record
                If ChkIf_A_ColMatches_B = 0 Then Exit For
                
            Next ACol
            
            If ColMatchCount = A_LastDataCol - A_FirstDataCol + 1 Then      '=4, in this sample
                
                With rsB
                    .FindFirst "B_ID  = " & rsB!B_ID
                    If Not .NoMatch Then
                        .Edit
                        'Identify the match.  Or, get match count with...  !RecordMatch = !RecordMatch + 1
                        !RecordMatch = rsA!A_ID
                        .Update
                    End If
                End With
            End If
            
            rsB.MoveNext
            
        Loop
        
        rsA.MoveNext
    Loop
    
End Function

This actually works well, but I'd still like to explore doing this in SQL using normalized tables, as you mentioned.

Thanks for all your assistance
Vicky

PHV · Jun 15, 2012

Why use rsB.FindFirst to retrieve the actual row ?

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886

VickyC · Jun 15, 2012

hi PHV - I guess the rsB.Findfirst is not needed, but I just had recalled a caution from Allen Browne some time ago. If I remember, his point was that it is best to check for such a match because, otherwise, the code can appear to be working until we hit a case where the Find fails. I'm on thin ice here - does this make any sense? Vicky

PHV · Jun 15, 2012

But you already are on the row you want to update !

VickyC · Jun 15, 2012

um.... ya! Good point

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Finding integer sequence amongst another integer sequence. 1

VickyC

Technical User

MajP

Technical User

VickyC

Technical User

MajP

Technical User

MajP

Technical User

VickyC

Technical User

PHV

MIS

VickyC

Technical User

PHV

MIS

VickyC

Technical User

Similar threads

Part and Inventory Search

Sponsor