Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Select and Delete return different number of records

Status
Not open for further replies.

scc

Programmer
Apr 30, 2001
218
US
Can someone please tell me why the Select Count(*) and the Delete transaction return a different number of records?

I would like to have some confidence in my numbers before I delete them.

Here's the statement I'm using:

Begin Tran
--Select Count (*) --returns 177,145 records
Delete c -- returns 137,992 records
From IncurredHCFAClaims c
LEFT JOIN IncurredHCFAClaims c2
ON c.PatID=c2.PatID
AND c.Patient=c2.Patient
AND c.Diag1=c2.Diag1
AND c.Tax_ID=c2.Tax_ID
AND c.ServiceDate=c2.ServiceDate
AND c.TCharg=c2.TCharg
AND c.HCPCS=c2.HCPCS
AND c.Units=c2.Units
AND c.POS=c2.POS--177145

WHERE c.Claim_Number <c2.Claim_Number

Rollback Tran

Rollback Tran

Thanks!
 
DELETE displays number of rows deleted (in root join table - IncurredHCFAClaims). SELECT COUNT(*) displays number of rows in query. Because of JOIN, it can be larger than number of DELETEd rows.
 
So how can I be confident that I'm deleting the correct number of records?
 
Try something like:
Code:
select count( distinct primary_key_from_table_C )
from ...
Btw. be very careful with DELETE.. FROM... LEFT JOIN. Minor changes in WHERE clause may cause all data to go down the toilet.
 
Unfortunately, because of the nature of this data, there is no primary key... which is what I'm trying to do, clean it up (remove dups) before creating a report using the data.
 
Try running DBCC UPDATEUSAGE. See the BOL for the syntax. Then run your query.

SELECT COUNT(*) is weird. It doesn't always return the correct number of rows due to caching or something like that.

-SQLBill
 
You have some entries with more than one duplicate.

Consider a sample set of data:
Claim 1, Visit 1
Claim 3, Visit 1
Claim 10, Visit 1

Selecting would return 3 rows that are identical visits and the left table's claim number is less than the right table's claim number:
1 - 3
1 - 10
3 - 30

Deleting will only return 2:
1
3

You could the following query to perform the same functionality if you want select and delete to match:

[tt]DELETE FROM IncurredHCFAClaims c
WHERE ClaimNumber NOT IN (
SELECT Max(ClaimNumber) FROM IncurredHCFAClaims
GROUP BY PatId, Patient, Diag1, Tax_ID, ServiceDate, HCPCS, Units, POS
)[/tt]

Make sure you include all fields that can distinguish one visit from another. What if a patient goes to a doctor twice in one day? ;-)

I came up with this query by formulating the problem in an English sentence:
"Delete every entry from the claims table that is not the highest claim number for that claim"

-E

P.S. I encourage you to avoid LEFT JOINS in delete clauses when possible.
 
Argh, I mistyped, it should have been:

Selecting would return 3 rows that are identical visits and the left table's claim number is less than the right table's claim number:
1 - 3
1 - 10
3 - 10
 
ESquared,

If the patient went to the same doctor on the same day for the same procedure, then we would have no way of distinguishing that and the record for that service would have to be removed. This is for data analysis, and not for claims payment, so we would be okay.

When you say avoid LEFT JOINs, what about INNER JOINS, or would this also be a dangerous situation. Is the way you listed the better way to go in general?

Also, please note, I am not wanting to delete at the claim level, but at the line (detail) level for each procedure.

Thanks

 
The thing about deleting with LEFT JOINS is just the nature of one-sided joins. You get ALL the records from the left side and only those that match from the right side. If your where clause has conditions on the right table, you're fine. But it can be slippery. You're doing a good thing by looking at the results of a select before you turn it into a delete.

Given that there can be more than one duplicate, I think the way I suggested is better.

There is actually another method using a LEFT JOIN and a derived table that might give better performance than my first idea, you'd have to try it to see. I just don't know how the optimizer handles subqueries

DELETE FROM IncurredHCFAClaims c LEFT JOIN (
SELECT Max(ClaimNumber) FROM IncurredHCFAClaims
GROUP BY PatId, Patient, Diag1, Tax_ID, ServiceDate, HCPCS, Units, POS) c2
ON c.ClaimNumber = c2.ClaimNumber
WHERE c2.ClaimNumber IS NULL

Here's another way but it is probably more expensive because I think the subquery has to be run once for each row in c:

DELETE FROM IncurredHCFAClaims c WHERE ClaimNumber NOT IN (
SELECT Max(ClaimNumber) FROM IncurredHCFAClaims c2 WHERE
c.PatID=c2.PatID
AND c.Patient=c2.Patient
AND c.Diag1=c2.Diag1
AND c.Tax_ID=c2.Tax_ID
AND c.ServiceDate=c2.ServiceDate
AND c.TCharg=c2.TCharg
AND c.HCPCS=c2.HCPCS
AND c.Units=c2.Units
AND c.POS=c2.POS)

In the first example I gave above, the query optimizer might run the subquery once and use its results for all rows, I don't know. In the derived table example it definitely only runs it once.
 
Just a general suggestion here but one thing you might want to do before doing any delete is to preceede the delete logic with a select query that mimics the delete logic. Store the @@rowcount from the select query in a local var. Then, wrap the delete logic around a transaction with begin tran. Now, do the delete with the same logic that was used for the select query, stuff the @@rowcount from the delete in another local var. Compare the two vars and if they don't hold the same value, rollback the delete transaction.

It might look something like this . . .

declare @SelectCount int,
@DeleteCount int

select *
from SomeTable (nolock)

select @SelectCount = @@rowcount

begin tran MyDeleteLogic

delete
from SomeTable

select @DeleteCount = @@RowCount

if @DeleteCount <> @SelectCount
begin
rollback tran MyDeleteLogic
end
else
begin
commit tran MyDeleteLogic
end


Piece 'o cake, Rocky!
 
It seems more important to me to examine the results of the SELECT statement with your eyes to be sure you've got it right... then change it to a DELETE. Every time you modify a delete statement, make it a select first and verify.

Simply comparing counts doesn't give you any protection on a mistake in your logic. The difference in counts is a function of multiple duplicate records and is not really a problem, as I showed above.
 
ESquared,

I would totally agree with your statement . . .

ESquared said:
It seems more important to me to examine the results of the SELECT statement with your eyes to be sure you've got it right

However, in the context of a stored procedure (which again, should have been eyeballed well before it was put into production) it is always better to include some level of error checking in your code.

The example I provided above seems to me to be the cleanest way of ensuring the same number of rows that are selected are in turn deleted -- through a stored procedure or otherwise.

Yes, the select query is going to be carefully scrutinized before even thinking about a delete. The example I provided above was just that -- an example.
 
I understand what you're saying and agree with you.

It's just that in this particular case the final query may return more rows on a select than a delete, and this is proper behavior...

You have a good point about adding some sort of error checking!
 
Thanks for all the help and discussion. This was very interesting.

Unfortunately this is way to huge to be able to visually check every record with a SELECT. But definitely spot checking!

And yes, with deletes one must alway be very careful. And as you mentioned, performance is important too.

I now have a much better understanding of what was happening with my numbers and other ways to go about this.



 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top