a new twist on finding duplicates ? 1

groleau · Apr 12, 2006

There are many informative threads on finding duplicates.
So many that I don't know whether my particular issue was
in there--I gave up looking after reading about ten threads.

What I have is a huge collection of accounting entries. Account number and Facility together are supposed to be unique, but they are not.

However, I don't want to just eliminate the duplicates--I need to move them to another table.

I find it easy to do except for one detail: if the table
contains

A A B C D D D E F G

then I need the results to be

Dupes: A D D
Good: A B C D E F G

instead of

Dupes: A A D D D
Good: B C E F G

The latter is easy. I could use DISTINCT to shrink Dupes, and then merge it back into Good, but then I would lose the extras that I want to keep in Dupes.

I am very much a newbie to SQL--and it doesn't help that Jet SQL is *not* the language called SQL in my reference books.

DotNetGnat · Apr 12, 2006

there are many of doing this...here is one way...may not be the efficient way but give it a shot...

Code:

ok lets say your table has the following fields

recid|name|recdate
__________________
  1  | A  | 12/04/2006
  1  | A  | 12/03/2006
...
...

then

Select A.recid, A.name, A.recdate from mytable A
where A.recdate in ( Select top 1 recdate from
mytable B where A.recid=B.recid order by recdate desc)

this will give you the most recent record of any ID/Name and you will get the following output 

Good:  A B C D E F G

to get the dupes do the following...

Select A.recid, A.name, A.recdate from mytable A
where A.recdate not in ( select top 1 recdate from
mytable B where A.recid=B.recid order by recdate desc)
group by A.recid having count(A.recid)>1

this will give you the following:

Dupes: A D D

-DNG

Remou · Apr 12, 2006

There is this, too:
Delete Duplicates (Save One From Each Group)
faq701-5721

groleau · Apr 12, 2006

THANKS, "DotNetGnat" !

"TOP" is a new one on me--never saw it before, but it makes sense and I'll give it a try.

--
Wes Groleau

http://www.Parkview.com/

groleau · Apr 12, 2006

Hmmm, how did I miss that in the FAQs ?

I'll try it both ways.

thanks

--
Wes Groleau

http://www.Parkview.com/

groleau · Apr 12, 2006

Both versions ran over fifteen minutes before I got impatient and killed them.

The progress bar does move a hair every once in a while,
so I guess something is happening. Perhaps I should have added indexes.

Or maybe I screwed up the SQL trying to convert to my field names. Here is the query it was on when I stopped it the second time around:

Code:

    DoCmd.RunSQL _
       ("SELECT * INTO No_Dupes FROM Simplified A          " & _
        "WHERE  A.Who_Cares                                " & _
        "   IN  (SELECT Top 1 Who_Cares                    " & _
        "        FROM Simplified B                         " & _
        "        WHERE A.Must_Be_Unique = B.Must_Be_Unique " & _
        "        ORDER BY Who_Cares);                      ")

--
Wes Groleau

http://www.Parkview.com/

PHV · Apr 12, 2006

Perhaps something like this ?

Code:

DoCmd.RunSQL( _
 "SELECT A.* INTO No_Dupes                         " & _
 "FROM Simplified A INNER JOIN (                   " & _
 "SELECT Must_Be_Unique, Max(Who_Cares) AS Keep_It " & _
 "FROM Simplified GROUP BY Must_Be_Unique          " & _
 ") B ON A.Must_Be_Unique = B.Must_Be_Unique       " & _
 " AND A.Who_Cares = B.Keep_It;")

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886

groleau · Apr 14, 2006

Thanks, _very_ much to everyone!

The version by 'DotNetGnat' worked in the simplified version after I removed the last "ORDER BY...HAVING". Jet didn't like them, but I couldn't quite understand the error messages. I believe the other versions were good, too, but I didn't let them finish because without an index, they were too slow.

Unfortunately, adding an index with

Code:

DoCmd.RunSQL ("CREATE INDEX ....;")

doesn't work. So I had to put a Stop in the code, add the index manually with the GUI, and continue. That, plus the syntax errors I got when
I tried to convert back to the multiple field unsimplified version...

I ended up doing the whole job in straight VB without Access, using Scripting.Dictionary--in a tenth of the time I spent on the SQL version.

Dictionary version PDL for the curious:

Code:

for all lines in all datafiles:
  extract Key from line
  if Key in dictionary then
     write line to duplicates list
  else
     write line to good list
     add key to dictionary
  end if

Again, thanks for the help. I learned a fair bit more about SQL in the process.

--
Wes Groleau

http://www.Parkview.com/

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

a new twist on finding duplicates ? 1

groleau

Programmer

DotNetGnat

Programmer

Remou

Technical User

groleau

Programmer

groleau

Programmer

groleau

Programmer

PHV

MIS

groleau

Programmer

Similar threads

Part and Inventory Search

Sponsor