Left outer Join query returns duplicates in SQL Server

14,327

I'd suggest reading on cartesian product.

If you have 50 rows in the first table and 70 in the second that makes 3500 rows. The join condition tbl1.SSN = tbl2.SSN will filter out rows but you may well end up with more than 50 rows.

Back to your problem you can see what is happening by trying the following :

SELECT 
  tbl1.*,
  (SELECT COUNT(*) FROM tbl2 WHERE tbl1.SSN = tbl2.SSN) AS NbResultTbl2
FROM 
  tbl1

This will tell which rows of tbl1 has multiple match in tbl2. If you have a number higher than 1 in the NbResultTbl2 column then you are going to end up with duplicates.

To eliminate those duplicates you can try this :

SELECT 
  tbl1.*,
  (SELECT TOP 1 StateCode FROM tbl2 WHERE tbl1.SSN = tbl2.SSN) 
FROM 
  tbl1  

This will get the first StateCode found for a matching SNN in tbl2.

Share:
14,327
sandr
Author by

sandr

Updated on June 04, 2022

Comments

  • sandr
    sandr almost 2 years

    I have a table 1 (MID, SSN, ...) MID is primary key and table 2 (ID, SSN, StateCode..) where ID and SSN make up the primary key. I'm trying to display all columns from table 1 along with StateCode from table 2 matching it against SSN. Tbl 1 has 50 rows and some have same SSN values.

    If no SSN match is found from table 2, displaying a NULL in StateCode is acceptable, so I chose left join. Here is my query

    Select 
        tbl1.*, tbl2.StateCode
    from 
        tbl1
    left outer join 
        tbl2 on tbl1.SSN = tbl2.SSN
    

    I'm looking to retrieve 50 records, but I get 70, rows that contain the same ssn value in tbl1 ends up duplicated in the final output. What is going wrong?

  • sandr
    sandr almost 10 years
    I tried this, and I still get 70 rows on doing the left outer join in this way, all the rows with same ssn gets duplicated more. I'm not sure why this is
  • sandr
    sandr almost 10 years
    When i do distinct im actually losing data its returning fewer rows.
  • sandr
    sandr almost 10 years
    query didn't work, grouping wasn't possible as the other columns didn't comply under this condition, i'm trying to select all of table 1 and one column from table 2
  • sandr
    sandr almost 10 years
    i want all the columns from table 1 along with statecode from table 2, when i do distinct i'm losing data, fewer than 50 rows are being returne
  • sandr
    sandr almost 10 years
    i tried reading through although i couldn't infer why a left join would retrieve more rows than what exists in tbl1, since ssn in table 2 is distinct
  • user3841709
    user3841709 almost 10 years
    I'm a little confused from your original question where you say displaying a null is acceptable. have you tried adding a Where condition such as WHERE ColumnName IS NOT NULL
  • sandr
    sandr almost 10 years
    NULL is acceptable as a value in StateCode if an ssn from table 1 is not found in table 2. I'm trying to show all 50 rows along with StateCode that match the SSN for that row
  • user3841709
    user3841709 almost 10 years
    Also if you are looking to retrieve 50 records, do your tables each have more than 50 records or does one have exactly 50 records? If you have exactly 50 in one table then it should be simple just figure out the correct joining
  • sandr
    sandr almost 10 years
    Yes table 1 has exactly 50 rows, and table 2 contains distinct SSN and the Statecode value. I'm not sure why my left join is retrieving duplicates and final result more than what is present in table 1
  • Gordon Linoff
    Gordon Linoff almost 10 years
    @sandr . . . Maybe select distinct will remove the rows that you think are duplicates.
  • Chris Latta
    Chris Latta almost 10 years
    Okay, try a nested select that returns only one state code for each SSN
  • ForguesR
    ForguesR almost 10 years
    See updated answer to check if ssn in table is really distinct.
  • sandr
    sandr almost 10 years
    Thank you, it was not distinct, I came up with another query but yours is short and clean! Very helpful