Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chriss Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

SQL Server2005 - Need Help for Data Clean Up

Status
Not open for further replies.

ReportDr00

IS-IT--Management
Mar 2, 2007
194
US
Hi
Firstly i would like to thank all who assisted with my previous query.
I have another scenario and i would like to share it and get some thoughts on it.
I have a service table and description table. description is a code table and service table has field that points to description table like below
service table

service id,
description id
service date
serviceupdatedate


description table
description id
description

In description table, i have same descriptions for different id's and these id's were referenced in service table.

Service id description id servicedate svcupdatedate
1 1 1/1/09 3/1/09
2 2 5/10/09 7/1/09
3 3 7/19/09 8/10/09

description id description
1 office visit
2 flu shot visit
3 office visit

I would like to have only one description per descirption id and update reference in service table as well

results should be

Service id description id servicedate svcupdatedate
1 1 1/1/09 3/1/09
2 2 5/10/09 7/1/09
3 1 7/19/09 8/10/09

description id description
1 office visit
2 flu shot visit


really appreciate if you share your thoughts on accomplishing this.....

regards

armani
 
You will need to approach this in steps. With SQL, you cannot write a single statement that updates the data in one table (your service table) and removes data from another table (your description table).

Now, before we begin, it's important that you have a good backup of this database. What follows will update data in one table and delete rows from another. If you don't have a good backup, and you make a mistake.... ugh. Better to be safe. Make a backup now. Go ahead.... These words will still be here when you return.

Sometimes, the thought process is more important than the actual implementation. I think this is one of those times. After reading your description of the problem, the first thing I thought about was.... we're gonna eventually remove data from the description table. How will we decide what to delete, and what to keep? It probably doesn't matter which one we keep, but we need to be consistent, and we need to make sure we got it right. The first idea that came to me was to keep the duplicate with the smallest id number. (We'll worry about deleting stuff later, we're not ready for it yet). So, let's build a query that returns the data we want to keep.

[tt][blue]
Select Min(DescriptionId) As MinDescriptionId,
Description
From Description
Group By Description
[/blue][/tt]

If you run the query above, you will get a recordset showing each distinct description and the minimum DescriptionId. This will represent the data we ultimately want to keep. But... take another look. The data above shows us something else too. For each description associated with a service table row, the data above shows us the ID number that we want to use. For example, ID 3 in the service table really means "office visit", but in the data above, "office visit" has id = 1.

So, how do we get from here to there? Let's think about this for a minute. We know we can join the service table to the description table to show us the text description associated with a service. But... if we then join to the query above (based on the description) we can also get the 'new' id number.

Code:
Select Service.ServiceId, 
       Service.DescriptionId, 
       Description.Description,
       NewDescription.MinDescriptionId
From   Service
       Inner Join Description
         On Service.DescriptionId = Description.DescriptionId
       Inner Join (
         Select Description, Min(DescriptionId) As MinDescriptionId
         From   Description 
         Group By Description
         ) As NewDescription
         On Description.Description = NewDescription.Description

If you run the above query, you will see the description text, but you will also see the old id and the new id, side by side, in the same row. Now... if we can update the service table, replacing the old description id with the new one, we will be "half way there". The difficult part is behind us. All we have to do now is the actual update. If the previous query returned the correct results, it's easy changing this to an update statement.

Code:
Update Service
Set    Service.DescriptionId = NewDescription.MinDescriptionId
From   Service
       Inner Join Description
         On Service.DescriptionId = Description.DescriptionId
       Inner Join (
         Select Description, Min(DescriptionId) As MinDescriptionId
         From   Description 
         Group By Description
         ) As NewDescription
         On Description.Description = NewDescription.Description
[!]Where  Service.DescriptionId <> NewDescription.MinDescriptionId[/!]

Notice I added a little bit to the query at the end. There's no sense updating the data to be the same thing it already was. I mean... there must be some rows that don't get changed, right?

The only thing left to do is to delete those rows from the description table. There are several ways we can do this. We could use the derived table technique to do it. Or... we could delete rows from the description table where there are no matching rows in the service table. If this were my task, I would take the second approach because there could possibly be data in there (besides the stuff we're currently working on) that could be deleted.

For this... a simple left join with an is null check in the where clause should work well.

Code:
Delete Description
From   Description
       Left Join Service
         On Description.DescriptionId = Service.DescriptionId
Where  Service.DescriptionId Is NULL

I hope this makes sense. More importantly, I hope the thought process behind it also makes sense. Knowing how to approach a problem logically and methodically will help to improve your coding skills.


-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Thanks gmmastros, i will give it a try and i agree with you absolutely about getting the thought process right and improving the coding skills. I had the steps written down on paper but just not able to implement with SQL. I always had trouble putting my algorithms into query. Do you suggest anything that would imporve my query writing skills.

really appreciate it.

Armani
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top