Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Wanet Telecoms Ltd on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Removing Duplicate Records 1

Status
Not open for further replies.

MrMystery

Technical User
Apr 24, 2002
39
GB
Hi.
I have a table containing project numbers which are numbered as follows
1000N
1000S
1000T
1000X
through to
9999N
9999S
9999T
9999X

The suffix letter indicates a department of the company.
We have changed our reporting requirements to dispense with individual departments, and instead to report under one heading 'H'
I have therefore created a whole new list of projects
1000H
through to
9999H
based on existing project numbers. Obviously this has left me with duplicated numbers e.g. 4 occurrences of 1000H etc.
Can anyone help with a SQL script which will delete the duplicates, leaving me with just 1 occurrence of each project?
Any help appreciated!
 
Are all the other columns for each duplicate project number the same? If so, use SELECT DISTINCT * to copy each unique project to a new table, truncate the original table and copy the data back in.

If not, how do you know which project to keep out of the four? --James
 
All of the columns except one are the same.
The one that may be different is the project manager assigned to the project.
However, we are taking the view that once we are left with only 1 occurrence of each project, these will be printed out for checking by each project manager, and any errors will be corrected then.
Therefore, I am not concerned with which record to keep of the 4, only with deleting 3 and keeping 1.
Hope this makes sense!
 
OK, if it doesn't matter which manager you keep, use this query for getting the distinct projects:

Code:
SELECT projectnumber, <other columns>, MAX(manager)
FROM table
GROUP BY projectnumber, <other columns>
--James
 
Here is a tip from a book I have.
The technique is to create a table with a similiar structure
to the result set, put an index on it with the ignoredups clause like this:

CREATE TABLE [dbo].[ProviderNoDups] (
[ID] [numeric](18, 0) IDENTITY (1, 1) NOT NULL ,
[PhyID] [char] (9) NULL ,
[PhySSNEIN] [char] (1) NULL ,
[PhyLast] [char] (30) NULL ,
[PhyFirst] [char] (20) NULL ,
[PhyMI] [char] (1) NULL ,
[PhyComp] [char] (35) NULL ,
[PhyAddress] [char] (40) NULL ,
[PhyCity] [char] (25) NULL ,
[PhyState] [char] (2) NULL ,
[PhyZip] [char] (5) NULL ,
[Phy9Zip] [char] (4) NULL ,
[PhyPhone] [char] (10) NULL ,
[PhySpec] [char] (4) NULL ,
[msrepl_synctran_ts] [timestamp] NOT NULL
) ON [PRIMARY]

-- make this table ignore duplicate value --
CREATE UNIQUE INDEX removedups ON ProviderNoDups
(PhyID,PhySSNEIN,PhyLast,PhyFirst,PhyMI,PhyComp,

PhyAddress,PhyCity,PhyState,PhyZip,Phy9Zip,PhyPhone,PhySpec)
WITH IGNORE_DUP_KEY

Then insert your values into the new table. Duplicates will be left out.

Sorry for the lengthy code: I am using real info.

 
Brilliant! This worked a treat!
Many thanks.

Mike
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top