Avoid duplicates in INSERT INTO SELECT query in SQL Server


Solution 1

Using NOT EXISTS:

INSERT INTO TABLE_2
  (id, name)
SELECT t1.id,
       t1.name
  FROM TABLE_1 t1
 WHERE NOT EXISTS(SELECT id
                    FROM TABLE_2 t2
                   WHERE t2.id = t1.id)

Using NOT IN:

INSERT INTO TABLE_2
  (id, name)
SELECT t1.id,
       t1.name
  FROM TABLE_1 t1
 WHERE t1.id NOT IN (SELECT id
                       FROM TABLE_2)

Using LEFT JOIN/IS NULL:

INSERT INTO TABLE_2
  (id, name)
   SELECT t1.id,
          t1.name
     FROM TABLE_1 t1
LEFT JOIN TABLE_2 t2 ON t2.id = t1.id
    WHERE t2.id IS NULL

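Of the three options, LEFT JOIN/IS NULL is the least efficient in SQL Server. See this link for more details. Also note that NOT IN returns no rows at all if the subquery returns a NULL id, so it is only safe when TABLE_2.id is non-nullable.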
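To try the NOT EXISTS version end to end, here is a self-contained script that loads the sample data from the question (Table1 holds IDs 1 to 3, Table2 already holds ID 1). It is a sketch that assumes temporary tables named #Table1 and #Table2 and an INT/NVARCHAR schema; neither the names nor the column types come from the original question.

```sql
-- Hypothetical scratch tables mirroring the question's sample data
CREATE TABLE #Table1 (Id INT NOT NULL, Name NVARCHAR(50) NOT NULL);
CREATE TABLE #Table2 (Id INT NOT NULL PRIMARY KEY, Name NVARCHAR(50) NOT NULL);

INSERT INTO #Table1 (Id, Name) VALUES (1, N'A'), (2, N'B'), (3, N'C');
INSERT INTO #Table2 (Id, Name) VALUES (1, N'Z');

-- Copy only the rows whose Id is not already present in #Table2
INSERT INTO #Table2 (Id, Name)
SELECT t1.Id, t1.Name
  FROM #Table1 t1
 WHERE NOT EXISTS (SELECT 1
                     FROM #Table2 t2
                    WHERE t2.Id = t1.Id);

-- Expected contents: (1, Z), (2, B), (3, C); the existing row for Id 1 is untouched
SELECT Id, Name FROM #Table2 ORDER BY Id;
```

Because the existence check is part of the same statement, no IF/ELSE and no second INSERT is needed.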

Solution 2

In MySQL you can do this:

INSERT IGNORE INTO Table2(Id, Name) SELECT Id, Name FROM Table1

Does SQL Server have anything similar?
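The closest SQL Server counterpart is the IGNORE_DUP_KEY option on a unique index (the "ignore duplicates" setting mentioned in the comments): rows with a duplicate key are discarded with a "Duplicate key was ignored." message instead of failing the whole INSERT. A minimal sketch, assuming hypothetical #Table1/#Table2 scratch tables and an index name chosen only for illustration:

```sql
-- Hypothetical scratch tables mirroring the question's sample data
CREATE TABLE #Table1 (Id INT NOT NULL, Name NVARCHAR(50) NOT NULL);
CREATE TABLE #Table2 (Id INT NOT NULL, Name NVARCHAR(50) NOT NULL);

-- Unique index that skips, rather than rejects, rows whose Id already exists
CREATE UNIQUE INDEX UX_Table2_Id ON #Table2 (Id) WITH (IGNORE_DUP_KEY = ON);

INSERT INTO #Table1 (Id, Name) VALUES (1, N'A'), (2, N'B'), (3, N'C');
INSERT INTO #Table2 (Id, Name) VALUES (1, N'Z');

-- Plain INSERT ... SELECT, similar in effect to MySQL's INSERT IGNORE for this key:
-- Id 1 is skipped with a warning, Ids 2 and 3 are inserted
INSERT INTO #Table2 (Id, Name) SELECT Id, Name FROM #Table1;

-- Expected contents: (1, Z), (2, B), (3, C)
SELECT Id, Name FROM #Table2 ORDER BY Id;
```

The trade-off is that the option applies to every future insert into the table, so genuine mistakes are silently skipped too; the NOT EXISTS approach keeps the rule inside the one query that needs it.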

Solution 3

I just had a similar problem, and the DISTINCT keyword worked like magic:

INSERT INTO Table2(Id, Name) SELECT DISTINCT Id, Name FROM Table1
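Note that DISTINCT only removes duplicate rows within Table1. It does not look at what is already in Table2, and it treats (1, A) and (1, B) as different rows. When both kinds of duplicates matter, DISTINCT can be combined with a NOT EXISTS check. A sketch, again assuming hypothetical #Table1/#Table2 scratch tables:

```sql
-- Hypothetical scratch tables; the source now repeats (2, B)
CREATE TABLE #Table1 (Id INT NOT NULL, Name NVARCHAR(50) NOT NULL);
CREATE TABLE #Table2 (Id INT NOT NULL PRIMARY KEY, Name NVARCHAR(50) NOT NULL);

INSERT INTO #Table1 (Id, Name) VALUES (1, N'A'), (2, N'B'), (2, N'B'), (3, N'C');
INSERT INTO #Table2 (Id, Name) VALUES (1, N'Z');

-- DISTINCT collapses the repeated (2, B) in the source;
-- NOT EXISTS skips Id 1, which is already in #Table2
INSERT INTO #Table2 (Id, Name)
SELECT DISTINCT t1.Id, t1.Name
  FROM #Table1 t1
 WHERE NOT EXISTS (SELECT 1 FROM #Table2 t2 WHERE t2.Id = t1.Id);

SELECT Id, Name FROM #Table2 ORDER BY Id;
```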

Solution 4

I was facing the same problem recently.
Here's what worked for me in MS SQL Server 2017.
The primary key should be set on ID in Table_2, and the columns and column properties should, of course, be the same in both tables.
The script below works the first time you run it: the duplicate ID from Table_1 (ID 1, which already exists in Table_2) is filtered out and will not be inserted.

If you run it a second time, you will get a "Violation of PRIMARY KEY constraint" error, because the rows with ID > 1 were already inserted by the first run.

This is the code:

INSERT INTO Table_2
SELECT DISTINCT *
  FROM Table_1
 WHERE Table_1.ID > 1  -- ID 1 already exists in Table_2, so it is skipped

Solution 5

In SQL Server you can add a unique key (index) on the column(s) that need to be unique.

In SQL Server Management Studio, right-click the table, choose Design, then select Indexes/Keys.

Select the column(s) that must not contain duplicates, then set Type to Unique Key.
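The same unique key can also be scripted instead of created through the table designer. This is a sketch assuming a hypothetical #Table2 scratch table with an Id column; the constraint is left unnamed because named constraints on temp tables can collide between sessions (on a permanent table you would normally name it). On its own, the unique key makes a duplicate insert fail rather than skip it, so it is usually paired with a NOT EXISTS-style insert.

```sql
-- Hypothetical scratch table
CREATE TABLE #Table2 (Id INT NOT NULL, Name NVARCHAR(50) NOT NULL);

-- Script equivalent of Indexes/Keys with Type = Unique Key
ALTER TABLE #Table2 ADD UNIQUE (Id);

INSERT INTO #Table2 (Id, Name) VALUES (1, N'Z');

-- A second row with Id 1 is rejected with error 2627
-- (violation of UNIQUE KEY constraint) instead of being stored
BEGIN TRY
    INSERT INTO #Table2 (Id, Name) VALUES (1, N'Y');
END TRY
BEGIN CATCH
    SELECT ERROR_NUMBER() AS ErrorNumber, ERROR_MESSAGE() AS ErrorMessage;
END CATCH;
```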

Author: Ashish Gupta

Cloud Security Engineering and Operations guy at LPL Financial. Blog: http://guptaashish.com LinkedIn Profile: www.linkedin.com/in/ashishrgupta

Updated on April 13, 2021

Comments

  • Ashish Gupta
    Ashish Gupta about 3 years

    I have the following two tables:

    Table1
    ----------
    ID   Name
    1    A
    2    B
    3    C
    
    Table2
    ----------
    ID   Name
    1    Z
    

    I need to insert data from Table1 to Table2. I can use the following syntax:

    INSERT INTO Table2(Id, Name) SELECT Id, Name FROM Table1
    

    However, in my case, duplicate IDs might exist in Table2 (in my case, it's just "1") and I don't want to copy that again as that would throw an error.

    I can write something like this:

    IF NOT EXISTS(SELECT 1 FROM Table2 WHERE Id=1)
    INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1 
    ELSE
    INSERT INTO Table2 (Id, name) SELECT Id, name FROM Table1 WHERE Table1.Id<>1
    

    Is there a better way to do this without using IF - ELSE? I want to avoid two INSERT INTO-SELECT statements based on some condition.

  • IDisposable
    IDisposable over 14 years
    Just a clarification on the NOT EXISTS version: you'll need a WITH(HOLDLOCK) hint, or no locks will be taken (because there are no rows to lock!), so another thread could insert the row under you.
  • Duncan
    Duncan over 14 years
    Interesting, because I have always believed joining to be faster than sub-selects. Perhaps that is for straight joins only, and not applicable to left joins.
  • Ashish Gupta
    Ashish Gupta over 14 years
    +1 for educating me on this. Very nice syntax. Definitely shorter and better than the one I used. Unfortunately, SQL Server does not have this.
  • dburges
    dburges over 14 years
    Duncan, joining is often faster than subselects when those are correlated subqueries. If you have the subquery up in the select list, a join will often be faster.
  • IamIC
    IamIC over 13 years
    Not totally true. When you create a unique index, you can set it to "ignore duplicates", in which case SQL Server will ignore any attempts to add a duplicate.
  • Kip
    Kip over 13 years
    Thanks! Option 2 seems like it would be really inefficient, unless the database is smart enough not to fetch the entire result of the subquery?
  • OMG Ponies
    OMG Ponies over 13 years
    @Kip: If you read the link I provided that compares the three options, you'd know that your perception is not correct on SQL Server. Could be different on other databases, but the columns compared being nullable or not makes a difference too.
  • tomash
    tomash over 12 years
    NOT EXISTS is especially useful with a composite primary key; NOT IN won't work then.
  • Drew Chapin
    Drew Chapin over 10 years
    Any ideas why I would still get "Cannot insert duplicate key..." using any of the above methods?
  • bvj
    bvj almost 10 years
    @druciferre Possibly a duplicate within the source being inserted.
  • Smack Jack
    Smack Jack almost 8 years
    And SQL Server still can't... pathetic.
  • FreeMan
    FreeMan over 7 years
    Unless I totally misunderstand you, this will work if you have duplicates in the set you're inserting from. It won't, however, help if the set you're inserting from might be duplicates of data already in the insert into table.
  • Andir
    Andir over 5 years
    Please don't do this. You're basically saying "whatever data I had is worthless, let's just insert this new data!"
  • Sacro
    Sacro over 5 years
    @Andir If for some reason "Table2" shouldn't get dropped after the "INSERT", then use the other methods, but this is a perfectly valid way to achieve what the OP asked.
  • Ingus
    Ingus over 4 years
    So SQL Server still can't?
  • MC9000
    MC9000 about 4 years
    Valid, but certainly slower and potentially corrupting without a transaction. If you go this route, wrap it in a transaction.
  • Cheung
    Cheung over 3 years
    It doesn't answer the question of an alternative to INSERT IGNORE INTO.
  • JoeJam
    JoeJam over 3 years
    And still can't
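On the WITH(HOLDLOCK) point raised about the NOT EXISTS version: under concurrent load, another session can insert the same Id between the existence check and the insert, which is one way to still hit "Cannot insert duplicate key". A commonly used pattern adds UPDLOCK and HOLDLOCK hints to the existence check so the checked key range stays locked until the transaction ends. This is a sketch assuming hypothetical dbo.Table1/dbo.Table2 tables created in a scratch database, with Id as the primary key of dbo.Table2.

```sql
-- Hypothetical schema matching the question's tables
CREATE TABLE dbo.Table1 (Id INT NOT NULL, Name NVARCHAR(50) NOT NULL);
CREATE TABLE dbo.Table2 (Id INT NOT NULL PRIMARY KEY, Name NVARCHAR(50) NOT NULL);

INSERT INTO dbo.Table1 (Id, Name) VALUES (1, N'A'), (2, N'B'), (3, N'C');
INSERT INTO dbo.Table2 (Id, Name) VALUES (1, N'Z');

BEGIN TRANSACTION;

-- UPDLOCK + HOLDLOCK keep the key ranges read by the NOT EXISTS check locked
-- until COMMIT, so a concurrent session cannot insert the same Id in between
INSERT INTO dbo.Table2 (Id, Name)
SELECT t1.Id, t1.Name
  FROM dbo.Table1 t1
 WHERE NOT EXISTS (SELECT 1
                     FROM dbo.Table2 t2 WITH (UPDLOCK, HOLDLOCK)
                    WHERE t2.Id = t1.Id);

COMMIT TRANSACTION;
```

HOLDLOCK gives the check serializable semantics, which blocks competing inserts of the same key; the cost is reduced concurrency while the transaction is open.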