Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

a new twist on finding duplicates ? 1

Status
Not open for further replies.

groleau

Programmer
Apr 12, 2006
110
US
There are many informative threads on finding duplicates.
So many that I don't know whether my particular issue was
in there--I gave up looking after reading about ten threads.

What I have is a huge collection of accounting entries. Account number and Facility together are supposed to be unique, but they are not.

However, I don't want to just eliminate the duplicates--I need to move them to another table.

I find it easy to do except for one detail: if the table
contains

A A B C D D D E F G

then I need the results to be

Dupes: A D D
Good: A B C D E F G

instead of

Dupes: A A D D D
Good: B C E F G

The latter is easy. I could use DISTINCT to shrink Dupes, and then merge it back into Good, but then I would lose the extras that I want to keep in Dupes.

I am very much a newbie to SQL--and it doesn't help that Jet SQL is *not* the language called SQL in my reference books. :)
 
there are many of doing this...here is one way...may not be the efficient way but give it a shot...

Code:
ok lets say your table has the following fields

recid|name|recdate
__________________
  1  | A  | 12/04/2006
  1  | A  | 12/03/2006
...
...

then

Select A.recid, A.name, A.recdate from mytable A
where A.recdate in ( Select top 1 recdate from
mytable B where A.recid=B.recid order by recdate desc)

this will give you the most recent record of any ID/Name and you will get the following output 

Good:  A B C D E F G

to get the dupes do the following...

Select A.recid, A.name, A.recdate from mytable A
where A.recdate not in ( select top 1 recdate from
mytable B where A.recid=B.recid order by recdate desc)
group by A.recid having count(A.recid)>1

this will give you the following:

Dupes: A D D

-DNG
 
There is this, too:
Delete Duplicates (Save One From Each Group)
faq701-5721
 
Both versions ran over fifteen minutes before I got impatient and killed them.

The progress bar does move a hair every once in a while,
so I guess something is happening. Perhaps I should have added indexes.

Or maybe I screwed up the SQL trying to convert to my field names. Here is the query it was on when I stopped it the second time around:

Code:
    DoCmd.RunSQL _
       ("SELECT * INTO No_Dupes FROM Simplified A          " & _
        "WHERE  A.Who_Cares                                " & _
        "   IN  (SELECT Top 1 Who_Cares                    " & _
        "        FROM Simplified B                         " & _
        "        WHERE A.Must_Be_Unique = B.Must_Be_Unique " & _
        "        ORDER BY Who_Cares);                      ")

--
Wes Groleau
 
Perhaps something like this ?
Code:
DoCmd.RunSQL( _
 "SELECT A.* INTO No_Dupes                         " & _
 "FROM Simplified A INNER JOIN (                   " & _
 "SELECT Must_Be_Unique, Max(Who_Cares) AS Keep_It " & _
 "FROM Simplified GROUP BY Must_Be_Unique          " & _
 ") B ON A.Must_Be_Unique = B.Must_Be_Unique       " & _
 " AND A.Who_Cares = B.Keep_It;")

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Thanks, _very_ much to everyone!

The version by 'DotNetGnat' worked in the simplified version after I removed the last "ORDER BY...HAVING". Jet didn't like them, but I couldn't quite understand the error messages. I believe the other versions were good, too, but I didn't let them finish because without an index, they were too slow.

Unfortunately, adding an index with
Code:
DoCmd.RunSQL ("CREATE INDEX ....;")
doesn't work. So I had to put a Stop in the code, add the index manually with the GUI, and continue. That, plus the syntax errors I got when
I tried to convert back to the multiple field unsimplified version...

I ended up doing the whole job in straight VB without Access, using Scripting.Dictionary--in a tenth of the time I spent on the SQL version.

Dictionary version PDL for the curious:
Code:
for all lines in all datafiles:
  extract Key from line
  if Key in dictionary then
     write line to duplicates list
  else
     write line to good list
     add key to dictionary
  end if

Again, thanks for the help. I learned a fair bit more about SQL in the process.

--
Wes Groleau
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top