Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

selecting records that are not quite distinct 1

Status
Not open for further replies.

swhitten

Technical User
Sep 3, 2002
191
US
I need to select all columns from a view, but I need to restrict the rows to distinct records. However, not every column is unique for the records that I consider to be duplicates. But for the records that I consider to be distinct, if I look at only six of the columns (not all), those records are distinct.

Example:
These six columns are identical for two of my records:
customer number
document number
document type
document date
original document amount
current document amount

However, these two columns are different for the two records:
due date
post date

Because the main six columns are duplicated, I want to eliminate one of the records and only keep one.

I know this is probably simple, but it is escaping me. Thanks in advance for help.
 
Code:
SELECT a.*
FROM SomeTable a
WHERE EXISTS 
	(SELECT 1 FROM SomeTable b
	WHERE a.CustomerNubmer = b.CustomerNumber
        AND a.DocumentNumber = b.DocumentNumber
        --etc., add rest of column matches
	AND a.DueDate <> b.DueDate
        AND a.PostDate <> b.PostDate)
 
I know this is probably simple

Not really.

Sometimes I wish the DISTINCT keyword didn't exist. Programmers are know to severely mis-use it.

"Got duplicates in your result set? No problem. Slap a DISTINCT on it."

It's exactly that kind of thinking that causes problems with data. In your situation, the solution does not involve DISTINCT at all. Instead, you need to determine which of the rows you want to return. There should be some sort of 'rule' for when this situation occurs. For example, you could decide to show the row with the oldest 'due date' or the newest 'post date'. Either way, you really should decide which one to show because the solution to this problem depends on it.

To get your results, it would be helpful to know what the primary key column(s) are for this view. I suspect it's a composite key on customer number and document number. I base this assumption on the column names, but I could be completely wrong, too.

Anyway, the key to solving this problem is to use a derived table/self join.

First, let's assume you want the row with the oldest due date. As such, let's write a query that returns the primary key and the corresponding oldest due date.

[tt][blue]
Select CustomerNumber,
DocumentNumber,
Min(DueDate) As MinDueDate
From YourView
Group By CustomerNumber,DocumentNumber
[/blue][/tt]

This will return a single row for each CustomerNumber/DocumentNumber combination. We can now make this become a derived table and join it back to the original table. This will effectively filter out all rows that do not match CustomerNumber, DocumentNumber, and MinDueDate. Meaning... if there are duplicate Customer Number/Document Number, ONLY the row that matches the MinDueDate will be returned.

Code:
Select YourView.*
From   YourView
       Inner Join (
         Select CustomerNumber,
                DocumentNumber,
                Min(DueDate) As MinDueDate
         From   YourView
         Group By CustomerNumber,DocumentNumber
         ) as MinDates
         On  YourView.CustomerNumber = MinDates.CustomerNumber
         And YourView.DocumentNumber = MinDates.DocumentNumber
         And YourView.DueDate = MinDates.MinDueDate

Notice how the original query is embedded in this one. When writing a query like this, the original query part is considered a derived table. Also notice how the derived table is inner joined to the original one based on all 3 columns. This is what effectively filters the 'other' rows that you do not want.

Make sense?

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top