Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Query hangs, subquery runs fine..

Status
Not open for further replies.

GoodFood

Technical User
Jan 30, 2007
4
US
Hello-

Using Access2000...
I have a table of transactions that are imported from an external source. Here's the table structure:

Transactions
-transaction_id (PK)
-cust_number
-group_id
-individual_id
-individual_name
-transaction_date

In certain situations, there will be duplicate transactions listed in the external file. These are imported into the Transactions table and are identical in every way except for the 'transaction_id' field. I'm trying to write a DELETE query to get rid of the duplicate lines. I wrote this:

DELETE *
FROM Transactions
WHERE transaction_id IN
(SELECT MAX(transaction_id) AS dup_id
FROM Transactions
GROUP BY cust_number, group_id, individual_id,
individual_name, transaction_date
HAVING COUNT(transaction_id) > 1);

The subquery that finds the dup_id runs just fine.. it returns 272 out of about 25,000 transaction_ids in about 3 seconds. But when I try to run the query as written, it hangs. I let it run for about an hour and then gave up.

If I use the subquery to load the 272 dup_ids into a temporary table, and run:
DELETE *
FROM Transactions
WHERE transaction_id IN (SELECT dup_id FROM tmpTable);

..it works just fine.


I was able to work around the problem fairly easily, but I was hoping someone could help me understand why this doesn't work. It seems to me like if the stand-alone subquery returns the list of transaction_ids with ease, then the outer DELETE query should have no problem.

Thanks for your time.
 
I am running A2003 (A2000 file format) and your sql ran without error for me. Of course I didn't try it with the number of transactions you are looking at but I used a table set up the way yours is and copied your SQL exactly.
Are you doing anything else at the time. Where are you running this from? An event procedure?
Try running it against a small number of sample records and see if it makes a difference.

Paul
 
Thanks for your responses.

I reduced my Transaction table to about 5,000 records, the subquery returns only 2 dup_ids, and the query took about 5 minutes to run when I was not doing anything else. That still seems like a really long time for the amount of data, but I guess it will eventually work if I give it enough time.

transaction_id is indexed as the PrimaryKey. I'm running this directly from the query window, although eventually it would probably be run as part of an event procedure.

Maybe I'm going about this all wrong.. is there a better way to do what I'm trying to do? I'm trying to keep it in SQL as much as possible.

Thanks again.
 
And what about this ?
DELETE * FROM Transactions
WHERE transaction_id NOT IN
(SELECT MIN(transaction_id) FROM Transactions
GROUP BY cust_number, group_id, individual_id,
individual_name, transaction_date)

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
Copy your Transactions table (structure) only.
In the copy create an unique index on (cust_number,group_id,individual_id,individual_name,transaction_date)
Run an append query from Transactions into the copy.

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ181-2886
 
That seems to work quite well right now. My only concern with that method is that the Transaction table is going to be growing pretty quickly.. the 25,000 records that I'm working with is one month's data. The transaction table in their current database (which I'm attempting to redesign) currently has about 700,000 records, so the append query to the new table (not to mention deleting the old table) will be no small task.

I was eventually planning to record the import_date in the table, and then add another criteria to my where clause so I could operate only on the data from the import that was just completed, rather than the entire Transaction table.

I'd be willing to bet your way still outperforms my original delete query, though, even at 700,000 records.

I appreciate all your help.
 
In this case it is better to stay away from the IN clause, if you want to use a subquery then use EXISTS instead. Even better is to create a temporary table (derived table) and join to it. The temp table could be done as a separate query ahead of time or inline as below. Of course, there should be indexes on key fields. You should do both the EXISTS and temp table to compare the times. Here is an untested example.

Delete
From transactions as A
Inner Join
(SELECT MAX(transaction_id) AS dup_id,
count(*) as cnt
FROM Transactions
GROUP BY cust_number, group_id, individual_id,
individual_name, transaction_date
) as dups
On dups.transaction_id = A.transaction_id
Where cnt > 1

 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top