Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Rhinorhino on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Problems with Duplicate Records

Status
Not open for further replies.

lrdave36

Technical User
Joined
Jan 6, 2010
Messages
77
Location
US
Hey guys,

I have the following query:

Code:
SELECT   DISTINCT A.recip_ssn_nbr, A.PAYEE_NM, B.MBR_ADDR_TX, C.CITY_ID_NM, C.STATE_ID_CD, 
         C.POST_ZIP_CD, A.RECIP_RETIR_DT, A.RECIP_RETIR_DT2, 
         D.MBR_BIRTH_DT, 
        
        CASE WHEN E.CMPNT_TYPE_CD IN ('33', '34') THEN 'YES'  ELSE 'NO' END  AS PARTICIPATING
FROM     DSNP.PR01_T_RECIP_SYS A, 
         DSNP.PR01_T_MBR_ADDR B, 
         DSNP.PR01_T_MBR_CITY C, 
         DSNP.PR01_T_MBR D, 
         DSNP.PR01_T_PYMT_CMPNT E 
WHERE    A.RECIP_SSN_NBR=B.MBR_SSN_NBR 
AND      A.RECIP_SSN_NBR=C.MBR_SSN_NBR 
AND      A.RECIP_SSN_NBR=D.MBR_SSN_NBR 
AND      A.RECIP_SSN_NBR=E.RECIP_SSN_NBR 
AND      A.BENEF_STAT_CD = ('AC') 
AND      A.RECIP_TYPE_CD = '10' 
AND      A.AGTY_SYS_CD = 'ASPRS'
AND      E.ANTY_PYMT_DT = '2010-05-01'


Query runs just fine, but I was alarmed to see two records for each SSN. The duplication is occuring because the members are meeting both conditions in my CASE statement.

For example, this person's record in the DSNP.PR01_T_PYMT_CMPNT looks like this:


SSN ANTY_PYMT_DT CMPNT_TYPE_CD


33344555 2010-05-01 20
3334455 2010-05-01 33


Multiple records in the table for the same SSN, and the only difference in the record is the type code. So this person is both YES and NO.

I need to find a way that will return only one record as YES for this person.
 
try this --
Code:
SELECT A.recip_ssn_nbr
     , A.PAYEE_NM
     , B.MBR_ADDR_TX
     , C.CITY_ID_NM
     , C.STATE_ID_CD
     , C.POST_ZIP_CD
     , A.RECIP_RETIR_DT
     , A.RECIP_RETIR_DT2
     , D.MBR_BIRTH_DT
     , COALESCE(E.PARTICIPATING,'NO') AS PARTICIPATING
  FROM DSNP.PR01_T_RECIP_SYS A
INNER
  JOIN DSNP.PR01_T_MBR_ADDR B
    ON B.MBR_SSN_NBR = A.RECIP_SSN_NBR
INNER
  JOIN DSNP.PR01_T_MBR_CITY C
    ON C.MBR_SSN_NBR = A.RECIP_SSN_NBR
INNER
  JOIN DSNP.PR01_T_MBR D
    ON D.MBR_SSN_NBR = A.RECIP_SSN_NBR
LEFT OUTER
  JOIN ( SELECT RECIP_SSN_NBR
              , MAX(
                  CASE WHEN CMPNT_TYPE_CD IN ('33','34') 
                       THEN 'YES'  
                       ELSE 'NO' END
                   ) AS PARTICIPATING
           FROM PR01_T_PYMT_CMPNT
          WHERE ANTY_PYMT_DT = '2010-05-01'
         GROUP
             BY RECIP_SSN_NBR ) AS E
    ON E.RECIP_SSN_NBR = A.RECIP_SSN_NBR
 WHERE A.BENEF_STAT_CD = 'AC' 
   AND A.RECIP_TYPE_CD = '10' 
   AND A.AGTY_SYS_CD = 'ASPRS'
:-)

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon
 
Try this:

Code:
SELECT DISTINCT A.recip_ssn_nbr, A.PAYEE_NM, B.MBR_ADDR_TX, C.CITY_ID_NM, C.STATE_ID_CD,
       C.POST_ZIP_CD, A.RECIP_RETIR_DT, A.RECIP_RETIR_DT2,
       D.MBR_BIRTH_DT,
       CASE WHEN E.Participating = 1 THEN 'YES'  ELSE 'NO' END AS PARTICIPATING
FROM   DSNP.PR01_T_RECIP_SYS A	
       Inner Join DSNP.PR01_T_MBR_ADDR B
         On A.RECIP_SSN_NBR=B.MBR_SSN_NBR
       Inner Join DSNP.PR01_T_MBR_CITY C
         On A.RECIP_SSN_NBR=C.MBR_SSN_NBR
       Inner Join DSNP.PR01_T_MBR D
         On A.RECIP_SSN_NBR=D.MBR_SSN_NBR
       Inner Join (
         Select SSN, 
                Max(Case When CMPNT_TYPE_CD In ('33','34') Then 1 Else 0 End) As Participating
         From   DSNP.PR01_T_PYMT_CMPNT
         Where  ANTY_PYMT_DT = '2010-05-01'
         Group By SSN
         ) As E
         On A.RECIP_SSN_NBR=E.RECIP_SSN_NBR
WHERE    A.BENEF_STAT_CD = ('AC')
AND      A.RECIP_TYPE_CD = '10'
AND      A.AGTY_SYS_CD = 'ASPRS'

Please note that I changed your "style" of joins because inner joins statements are easier for me to read and understand. I do strongly encourage you to abandon the old-fashioned "comma join" syntax. It works perfectly fine for inner join but falls all over itself for other join types (like left, right, and full).

Notice how I basically created a separate query for your E table that uses a case/when coupled with a max. Since I group on the SSN, you are guaranteed to get just one row for each SSN for this query. If we find a row with type = 33 or 34, then a 1 is returned, otherwise a 0. We then group on the SSN and return the max. If a ssn has a 33 or 34, 1 will be returned. if all of the rows are not 33 or 34, then a 0 is returned.

I then take this query and make it a derived table. The derived table cannot return more than one row per ssn because of the group by. This should allow your query to work as you want it to.

Since I could not run the query, there may be minor syntax flaws, but give it a shot. If you have any questions, just ask.


-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
The duplication isn't because of the case statement but because there are multiple matching rows in PR01_T_PYMT_CMPNT. Firstly I'd suggest making the query ANSI compliant, (use JOINS), and make the relationship to PR01_T_PYMT_CMPNT a LEFT OUTER JOIN;
Code:
A.RECIP_SSN_NBR = E.RECIP_SSN_NBR AND E.CMPNT_TYPE_CD IN ('33', '34')

Then change the case statement to
Code:
CASE ISNULL(E.RECIP_SSN_NBR, 'XXX')
    WHEN 'XXX' THEN 'NO'
    ELSE 'YES'
END
...where 'XXX' is a known non existent value.

Rhys

"The trouble with having an open mind, of course, is that people will insist on coming along and trying to put things in it"
Terry Pratchett
 
Hey George,

I tested out your query, and it works great. Thanks for taking time to explain the concepts. This really helps a lot. I'm starting to realize that derived tables are going to be a big tool in my query writing.

I'll try to post any future code questions with the inner join format. I just rarely have to use other types of joins so the old fashion way works for me.
 
You're welcome.

I'll try to post any future code questions with the inner join format. I just rarely have to use other types of joins so the old fashion way works for me.

I appreciate that. But you should realize that it's not just for my benefit, it's for yours too. In my opinion, the join syntax is easier to use, easier to read, and easier to understand.

I suppose you could say that it's a "mind set" for approaching queries. I see queries in multiple parts. There's the select clause which identifies the columns you want returned. There's the from clause which identifies the tables AND their relationships with other tables. And finally there is a where clause which allows you to filter out certain rows from the result.

With the comma syntax, you are basically mixing the table relationships with the filter criteria.

By putting the table relationships in the from clause, it makes the query easier to read and understand by separating the different parts of the query. Looking at your original code and then mine, the filter criteria in the where clause applies just to the data, making it easier to see what is filtered. In your version, you have to ignore the table relationship part.

Also....

When you use the comma syntax, you are more likely to miss table relationships. What I mean is... if you accidentally forget to put one of your table relationship where clause criteria in to the query, it will still run, but you will end up with tons of rows that you don't want. When you use the inner/left/right/full join syntax, and you accidentally forget the on clause (which identifies the table relationships), you will get a syntax error. I prefer syntax errors over "possibly returning the wrong data".

Finally, please understand that I am not saying your way is wrong. All I'm trying to do is to explain WHY I prefer the Join syntax. I'm saying all of this because I truly believe it will help you write better queries.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Good points, George. Actually I wasn't exactly sure how to perform multiple inner joins, but your query demonstrates this well.
 
and i think my LEFT OUTER JOIN might be a good idea as well, particularly since the derived table has a WHERE clause which could easily result in no returns

r937.com | rudy.ca
Buy my new book Simply SQL from Amazon
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top