SQL - Only select row that is not duplicated


Solution 1

First, you need to define what makes a row "first". I'll make up an arbitrary definition and you can adjust the SQL to fit your own. For this example, I assume "first" means the lowest value for MyField4 and, if those are equal, the lowest value for MyField5. The DISTINCT also covers the possibility of all 5 columns being identical.

SELECT DISTINCT
     T1.MyField1,
     T1.MyField2,
     T1.MyField3,
     T1.MyField4,
     T1.MyField5
FROM
     MyTable T1
LEFT OUTER JOIN MyTable T2 ON
     T2.MyField1 = T1.MyField1 AND
     T2.MyField2 = T1.MyField2 AND
     T2.MyField3 = T1.MyField3 AND
     (
          T2.MyField4 < T1.MyField4 OR
          (
               T2.MyField4 = T1.MyField4 AND
               T2.MyField5 < T1.MyField5
          )
     )
WHERE
     T2.MyField1 IS NULL

This only removes duplicates within the source table. If a PK might already exist in your destination table, you'll need to exclude those rows as well.
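A minimal sketch of that extra check, assuming a destination table called MyDestTable (a hypothetical name) with the same five columns and a PK on MyField1, MyField2, MyField3. The column types are guesses too. It keeps the "lowest MyField4, then lowest MyField5" rule and adds a NOT EXISTS test against the destination:

```sql
-- Assumed schema: column names follow the question, types are guesses.
CREATE TABLE MyTable (
     MyField1 INT, MyField2 VARCHAR(20), MyField3 VARCHAR(20),
     MyField4 VARCHAR(20), MyField5 VARCHAR(20)
);
-- MyDestTable is a hypothetical name for the destination table.
CREATE TABLE MyDestTable (
     MyField1 INT NOT NULL,
     MyField2 VARCHAR(20) NOT NULL,
     MyField3 VARCHAR(20) NOT NULL,
     MyField4 VARCHAR(20), MyField5 VARCHAR(20),
     PRIMARY KEY (MyField1, MyField2, MyField3)
);

INSERT INTO MyTable VALUES (1, 'Test',  'A1', 'Data1', 'Data1');
INSERT INTO MyTable VALUES (2, 'Test1', 'A2', 'Data2', 'Data2');
INSERT INTO MyTable VALUES (2, 'Test1', 'A2', 'Data3', 'Data3');
INSERT INTO MyTable VALUES (4, 'Test2', 'A3', 'Data4', 'Data4');
-- A key that is already present in the destination.
INSERT INTO MyDestTable VALUES (4, 'Test2', 'A3', 'Old', 'Old');

INSERT INTO MyDestTable (MyField1, MyField2, MyField3, MyField4, MyField5)
SELECT DISTINCT
     T1.MyField1, T1.MyField2, T1.MyField3, T1.MyField4, T1.MyField5
FROM
     MyTable T1
LEFT OUTER JOIN MyTable T2 ON
     T2.MyField1 = T1.MyField1 AND
     T2.MyField2 = T1.MyField2 AND
     T2.MyField3 = T1.MyField3 AND
     (
          -- T2 matches only if it sorts before T1 within the same key.
          T2.MyField4 < T1.MyField4 OR
          (T2.MyField4 = T1.MyField4 AND T2.MyField5 < T1.MyField5)
     )
WHERE
     T2.MyField1 IS NULL          -- no earlier row exists, so T1 is "first"
     AND NOT EXISTS (             -- skip keys the destination already has
          SELECT 1
          FROM MyDestTable D
          WHERE D.MyField1 = T1.MyField1
            AND D.MyField2 = T1.MyField2
            AND D.MyField3 = T1.MyField3
     );
```

With these sample rows, key 1 and the 'Data2' row of key 2 get inserted, the 'Data3' row is dropped as a duplicate, and key 4 is skipped because the destination already holds it.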

Solution 2

Not sure how you know which of row 2 and row 3 you want in the new table, but in MySQL you can simply do:

insert ignore into new_table (select * from old_table);

The PK on new_table rejects the duplicate entries, and IGNORE skips them instead of raising an error.
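If you do care which of the duplicate rows survives, one option is to add an ORDER BY to the SELECT. This is only a sketch for MySQL: the column types are assumptions, and it relies on MySQL inserting the rows in the order the SELECT returns them, so check it against your version before trusting it.

```sql
-- MySQL. Table names follow the answer above; column types are guesses.
CREATE TABLE old_table (
     MyField1 INT, MyField2 VARCHAR(20), MyField3 VARCHAR(20),
     MyField4 VARCHAR(20), MyField5 VARCHAR(20)
);
CREATE TABLE new_table (
     MyField1 INT NOT NULL,
     MyField2 VARCHAR(20) NOT NULL,
     MyField3 VARCHAR(20) NOT NULL,
     MyField4 VARCHAR(20), MyField5 VARCHAR(20),
     PRIMARY KEY (MyField1, MyField2, MyField3)
);

INSERT INTO old_table VALUES (1, 'Test',  'A1', 'Data1', 'Data1');
INSERT INTO old_table VALUES (2, 'Test1', 'A2', 'Data2', 'Data2');
INSERT INTO old_table VALUES (2, 'Test1', 'A2', 'Data3', 'Data3');
INSERT INTO old_table VALUES (4, 'Test2', 'A3', 'Data4', 'Data4');

-- Within each key, the row with the lowest MyField4/MyField5 is read first.
-- Later rows with the same key violate the PK and are skipped by IGNORE.
INSERT IGNORE INTO new_table
SELECT * FROM old_table
ORDER BY MyField1, MyField2, MyField3, MyField4, MyField5;
```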

Solution 3

What is your database? In Oracle you could say:

SELECT * FROM your_table
WHERE rowid in
(SELECT MIN(rowid)
 FROM your_table
 GROUP BY MyField1, MyField2, MyField3);

Note that it is somewhat uncertain which of the rows with the same PK will be considered "first". If you need to impose a specific order, you need to additionally sort on the other columns.
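One way to impose that order is ROW_NUMBER() instead of MIN(rowid). This is a different technique from the query above, shown here as a sketch. It assumes "first" means the lowest MyField4, then the lowest MyField5, and the column types are guesses:

```sql
-- Assumed schema: column names follow the question, types are guesses.
CREATE TABLE your_table (
     MyField1 INT, MyField2 VARCHAR(20), MyField3 VARCHAR(20),
     MyField4 VARCHAR(20), MyField5 VARCHAR(20)
);

INSERT INTO your_table VALUES (1, 'Test',  'A1', 'Data1', 'Data1');
INSERT INTO your_table VALUES (2, 'Test1', 'A2', 'Data2', 'Data2');
INSERT INTO your_table VALUES (2, 'Test1', 'A2', 'Data3', 'Data3');
INSERT INTO your_table VALUES (4, 'Test2', 'A3', 'Data4', 'Data4');

SELECT MyField1, MyField2, MyField3, MyField4, MyField5
FROM (
     SELECT t.*,
            -- Number the rows inside each PK group in the chosen order.
            ROW_NUMBER() OVER (
                 PARTITION BY MyField1, MyField2, MyField3
                 ORDER BY MyField4, MyField5
            ) AS rn
     FROM your_table t
) ranked
WHERE rn = 1          -- keep only the first row of each group
ORDER BY MyField1;
```

On the sample data this returns the rows for keys 1, 2 (the 'Data2' row) and 4. If two rows are identical in all five columns, ROW_NUMBER() still keeps exactly one of them.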

Solution 4

It depends on what you're looking for.

There's a big difference between using JOIN + WHERE NULL, NOT IN, and NOT EXISTS, including performance, which is more important with larger data sets.

(See NOT IN vs. NOT EXISTS vs. LEFT JOIN / IS NULL.)

The three methods shown in the linked article are pretty straightforward.
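For comparison, here is a sketch of a "keep the first row per key" query written in the NOT EXISTS form. MyTable and the column names come from the question. The column types and the rule that "first" means the lowest MyField4, then the lowest MyField5, are assumptions:

```sql
-- Assumed schema: column names follow the question, types are guesses.
CREATE TABLE MyTable (
     MyField1 INT, MyField2 VARCHAR(20), MyField3 VARCHAR(20),
     MyField4 VARCHAR(20), MyField5 VARCHAR(20)
);

INSERT INTO MyTable VALUES (1, 'Test',  'A1', 'Data1', 'Data1');
INSERT INTO MyTable VALUES (2, 'Test1', 'A2', 'Data2', 'Data2');
INSERT INTO MyTable VALUES (2, 'Test1', 'A2', 'Data3', 'Data3');
INSERT INTO MyTable VALUES (4, 'Test2', 'A3', 'Data4', 'Data4');

SELECT DISTINCT
     T1.MyField1, T1.MyField2, T1.MyField3, T1.MyField4, T1.MyField5
FROM
     MyTable T1
WHERE NOT EXISTS (
     -- Any row with the same key that sorts before T1 disqualifies T1.
     SELECT 1
     FROM MyTable T2
     WHERE T2.MyField1 = T1.MyField1
       AND T2.MyField2 = T1.MyField2
       AND T2.MyField3 = T1.MyField3
       AND (
            T2.MyField4 < T1.MyField4 OR
            (T2.MyField4 = T1.MyField4 AND T2.MyField5 < T1.MyField5)
       )
);
```

One difference worth knowing before you pick a form: NOT IN returns no rows at all if its subquery produces a NULL, while NOT EXISTS and LEFT JOIN / IS NULL do not have that problem.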

Author by

Renan Rodrigues

Updated on June 04, 2022

Comments

  • Renan Rodrigues
    Renan Rodrigues almost 2 years

    I need to transfer data from one table to another. The second table has a primary key constraint (the first one has no constraints). They have the same structure. What I want is to select all rows from table A and insert them into table B without the duplicate rows (if a row is duplicated, I only want to take the first one I find).

    Example :

    MyField1 (PK)   |   MyField2 (PK)   |   MyField3 (PK)   |   MyField4   |   MyField5
    ----------------|-------------------|-------------------|--------------|-----------
    1               |   'Test'          |   'A1'            |   'Data1'    |   'Data1'
    2               |   'Test1'         |   'A2'            |   'Data2'    |   'Data2'
    2               |   'Test1'         |   'A2'            |   'Data3'    |   'Data3'
    4               |   'Test2'         |   'A3'            |   'Data4'    |   'Data4'
    

    As you can see, the second and third rows have the same PK, but different data in MyField4 and MyField5. So, in this example, I would like to have the first, second, and fourth rows. Not the third one, because it's a duplicate of the second (even though MyField4 and MyField5 contain different data).

    How can I do that with a single SELECT?

    thx