Only inserting a row if it's not already there


Solution 1

What about the "JFDI" pattern?

BEGIN TRY
   INSERT etc
END TRY
BEGIN CATCH
    IF ERROR_NUMBER() <> 2627
      RAISERROR etc
END CATCH

Seriously, this is the quickest and most concurrent approach, with no lock hints involved, especially at high volumes. What if the UPDLOCK is escalated and the whole table is locked?

Read lesson 4:

Lesson 4: When developing the upsert proc prior to tuning the indexes, I first trusted that the If Exists(Select…) line would fire for any item and would prohibit duplicates. Nada. In a short time there were thousands of duplicates because the same item would hit the upsert at the same millisecond and both transactions would see a not exists and perform the insert. After much testing the solution was to use the unique index, catch the error, and retry, allowing the transaction to see the row and perform an update instead of an insert.
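
The retry described in lesson 4 could look roughly like the sketch below. It is an illustration only, not the quoted author's procedure: the temp table #TheTable, its columns, and the variable values are hypothetical, and THROW assumes SQL Server 2012 or later.

```sql
-- Hypothetical table; the primary key doubles as the unique index.
CREATE TABLE #TheTable
(
    PrimaryKey INT NOT NULL PRIMARY KEY,
    Value1     INT NOT NULL,
    Value2     INT NOT NULL
);

-- A row that is already there, so the INSERT below hits a duplicate key.
INSERT INTO #TheTable (PrimaryKey, Value1, Value2) VALUES (1, 10, 20);

DECLARE @primaryKey INT = 1, @value1 INT = 11, @value2 INT = 21;

BEGIN TRY
    -- Just try the insert; the unique index arbitrates between racing sessions.
    INSERT INTO #TheTable (PrimaryKey, Value1, Value2)
    VALUES (@primaryKey, @value1, @value2);
END TRY
BEGIN CATCH
    -- 2627 = PRIMARY KEY/UNIQUE constraint violation, 2601 = unique index violation.
    IF ERROR_NUMBER() IN (2627, 2601)
    BEGIN
        -- The row exists by now, so fall back to an update.
        UPDATE #TheTable
        SET Value1 = @value1, Value2 = @value2
        WHERE PrimaryKey = @primaryKey;
    END
    ELSE
    BEGIN
        -- Anything else is a real error: re-raise it unchanged.
        THROW;
    END
END CATCH;
```

If the row only needs to exist and must not be overwritten, the UPDATE branch can simply be left empty of work, which brings this back to the plain JFDI pattern.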

Solution 2

I added HOLDLOCK which wasn't present originally. Please disregard the version without this hint.

As far as I'm concerned, this should be enough:

INSERT INTO TheTable 
SELECT 
    @primaryKey, 
    @value1, 
    @value2 
WHERE 
    NOT EXISTS 
    (SELECT 0
     FROM TheTable WITH (UPDLOCK, HOLDLOCK)
     WHERE PrimaryKey = @primaryKey) 

Also, if you actually want to update a row if it exists and insert if it doesn't, you might find this question useful.
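
To check what the UPDLOCK, HOLDLOCK pair actually locks when the key is not there yet, one option is to keep the transaction open and look at sys.dm_tran_locks. A rough sketch, assuming a hypothetical demo table dbo.LockDemo, permission to query the DMV (VIEW SERVER STATE), and a key value 4 that is deliberately absent:

```sql
-- Hypothetical demo table with a primary key to range-lock on.
CREATE TABLE dbo.LockDemo
(
    PrimaryKey INT NOT NULL PRIMARY KEY,
    Value1     INT NOT NULL
);
INSERT INTO dbo.LockDemo (PrimaryKey, Value1) VALUES (1, 10), (9, 90);

BEGIN TRANSACTION;

    -- Same probe as the NOT EXISTS subquery; key 4 does not exist.
    SELECT 0
    FROM dbo.LockDemo WITH (UPDLOCK, HOLDLOCK)
    WHERE PrimaryKey = 4;

    -- List the locks this session holds while the transaction is still open.
    SELECT resource_type, request_mode, request_status
    FROM sys.dm_tran_locks
    WHERE request_session_id = @@SPID;

ROLLBACK TRANSACTION;

-- Clean up the demo table.
DROP TABLE dbo.LockDemo;
```

With HOLDLOCK you would typically expect a key-range mode (for example RangeS-U) on a KEY resource in the request_mode column. Removing HOLDLOCK and rerunning gives a comparison with the UPDLOCK-only version.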

Solution 3

You could use MERGE:

MERGE INTO Target
USING (VALUES (@primaryKey, @value1, @value2)) Source ([key], value1, value2)
ON Target.[key] = Source.[key]
WHEN MATCHED THEN
    UPDATE SET value1 = Source.value1, value2 = Source.value2
WHEN NOT MATCHED BY TARGET THEN
    INSERT ([key], value1, value2) VALUES (Source.[key], Source.value1, Source.value2);
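
A variant that picks up two points raised in the comments: the WHEN MATCHED branch can go if the row only needs inserting when missing, and MERGE needs a HOLDLOCK hint to avoid the race the question is about. A minimal sketch, with a hypothetical temp table #MergeDemo and made-up values:

```sql
-- Hypothetical table and values, for illustration only.
CREATE TABLE #MergeDemo
(
    PrimaryKey INT NOT NULL PRIMARY KEY,
    Value1     INT NOT NULL,
    Value2     INT NOT NULL
);
DECLARE @primaryKey INT = 1, @value1 INT = 10, @value2 INT = 20;

-- HOLDLOCK makes the existence check and the insert run under serializable semantics.
MERGE INTO #MergeDemo WITH (HOLDLOCK) AS Target
USING (VALUES (@primaryKey, @value1, @value2)) AS Source (PrimaryKey, Value1, Value2)
    ON Target.PrimaryKey = Source.PrimaryKey
-- No WHEN MATCHED branch: an existing row is left untouched.
WHEN NOT MATCHED BY TARGET THEN
    INSERT (PrimaryKey, Value1, Value2)
    VALUES (Source.PrimaryKey, Source.Value1, Source.Value2);
```

Running the MERGE a second time with the same key should insert nothing, since the row now matches and there is no branch for that case.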

Solution 4

Firstly, huge shout out to our man @gbn for his contributions to the community. Can't even begin to explain how often I find myself following his advice.

Anyway, enough fanboy-ing.

To add slightly to his answer, or perhaps "enhance" it: for those who, like me, were left feeling unsettled about what to do in the <> 2627 scenario (and no, an empty CATCH is not an option), I found this little nugget on TechNet.

    BEGIN TRY
        INSERT etc
    END TRY
    BEGIN CATCH
        IF ERROR_NUMBER() <> 2627
        BEGIN
            DECLARE @ErrorMessage NVARCHAR(4000);
            DECLARE @ErrorSeverity INT;
            DECLARE @ErrorState INT;

            SELECT @ErrorMessage = ERROR_MESSAGE(),
                   @ErrorSeverity = ERROR_SEVERITY(),
                   @ErrorState = ERROR_STATE();

            RAISERROR (
                @ErrorMessage,
                @ErrorSeverity,
                @ErrorState
            );
        END
    END CATCH
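
On SQL Server 2012 and later, a bare THROW is a shorter way to re-raise, and unlike RAISERROR with a message string (which reports error number 50000) it keeps the original error number. The sketch below is an assumption-laden example rather than the TechNet snippet: the temp table #Demo is hypothetical, and the check also covers 2601, which a comment on this thread notes is the number for a unique index violation.

```sql
-- Hypothetical table with one existing row, to force a duplicate.
CREATE TABLE #Demo (Id INT NOT NULL PRIMARY KEY);
INSERT INTO #Demo (Id) VALUES (1);

BEGIN TRY
    -- Duplicate key: raises error 2627 against the primary key.
    INSERT INTO #Demo (Id) VALUES (1);
END TRY
BEGIN CATCH
    -- Swallow duplicates from a constraint (2627) or a unique index (2601) only.
    IF ERROR_NUMBER() NOT IN (2627, 2601)
    BEGIN
        -- Re-raises the original error with its number, severity, and state.
        THROW;
    END
END CATCH;
```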

Solution 5

I don't know if this is the "official" way, but you could try the INSERT, and fall back to UPDATE if it fails.

Author: Adam

Updated on August 24, 2022

Comments

  • Adam
    Adam over 1 year

    I had always used something similar to the following to achieve it:

    INSERT INTO TheTable
    SELECT
        @primaryKey,
        @value1,
        @value2
    WHERE
        NOT EXISTS
        (SELECT
            NULL
        FROM
            TheTable
        WHERE
            PrimaryKey = @primaryKey)
    

    ...but once under load, a primary key violation occurred. This is the only statement which inserts into this table at all. So does this mean that the above statement is not atomic?

    The problem is that this is almost impossible to recreate at will.

    Perhaps I could change it to something like the following:

    INSERT INTO TheTable
    WITH
        (HOLDLOCK,
        UPDLOCK,
        ROWLOCK)
    SELECT
        @primaryKey,
        @value1,
        @value2
    WHERE
        NOT EXISTS
        (SELECT
            NULL
        FROM
            TheTable
        WITH
            (HOLDLOCK,
            UPDLOCK,
            ROWLOCK)
        WHERE
            PrimaryKey = @primaryKey)
    

    Although, maybe I'm using the wrong locks or using too much locking or something.

    I have seen other questions on stackoverflow.com where answers are suggesting a "IF (SELECT COUNT(*) ... INSERT" etc., but I was always under the (perhaps incorrect) assumption that a single SQL statement would be atomic.

    Does anyone have any ideas?

  • Martin Smith
    Martin Smith almost 14 years
    What are you locking when the row doesn't exist?
  • aelveborn
    aelveborn almost 14 years
    A relevant range in the index (the primary key in this case).
  • DaveWilliamson
    DaveWilliamson almost 14 years
    @GSerg Agreed. The pessimistic/optimistic locking of the select statement needs a directive.
  • Adam
    Adam almost 14 years
    Thanks. This makes sense to me too. Although when I originally wrote the statement I naively assumed that something magical would happen inside the server!
  • Vidar Nordnes
    Vidar Nordnes almost 14 years
    Why not just do: IF NOT EXISTS (SELECT * FROM Table WHERE param1Field = @param1 AND param2Field = @param2) BEGIN INSERT INTO Table(param1Field, param2Field) VALUES(@param1, @param2) END
  • Adam
    Adam almost 14 years
    Yeah, but that looks like it's open to concurrency issues (i.e. what if something happens on another connection between your select and your insert?)
  • Adam
    Adam almost 14 years
    Thanks - okay, I agree that this is probably what I will end up using, and is the answer to the actual question.
  • Adam
    Adam almost 14 years
    I know it's bad to rely on errors like this, but I wonder if doing this with just a straight INSERT (without the EXISTS) would perform better (i.e. try insert no matter what and just ignore error 2627).
  • aelveborn
    aelveborn almost 14 years
    Two IS's cause no conflict. Can you replace the IDs with object names?
  • aelveborn
    aelveborn almost 14 years
    The point is, two U's are not compatible. In your screenshot there are no U's, which is probably because your table doesn't have an index on the id column. The updlock places a KEY U-lock on that index.
  • aelveborn
    aelveborn almost 14 years
    I'm testing both cases, and there's always a U lock. Can you do the same little test on your side (see my edited answer)?
  • aelveborn
    aelveborn almost 14 years
    That depends on whether you mostly insert values that don't exist or mostly values that do exist. In the latter case, I'd argue the performance will be poorer due to tons of exceptions being raised and ignored.
  • Martin Smith
    Martin Smith almost 14 years
    The issue with your test (I think) is that it is not the same as the OP's original situation. The OP only has one statement, and the pattern of lock release seems to be the same regardless of hint for that. Without the hint the locks are released at the end of the statement and before the delay; with the hint they are released at the end of the whole transaction. I definitely only see U locks when I am running it and the where id=4 bit matches a record. What are they on? A range?
  • aelveborn
    aelveborn almost 14 years
    I believe the trick is that IU is not compatible with U. The first connection will place an IU on the index page because the row does not exist yet, the second will have to place a U on the row, which will result in waiting.
  • Martin Smith
    Martin Smith almost 14 years
    I don't think there's anything stopping 2 concurrent transactions doing SELECT 0 FROM Table1 with (updlock) WHERE PrimaryKey = x as long as x doesn't exist (I just tested that in isolation and the 2 transactions didn't block each other) but I've bugged you enough about this now. Sorry!
  • aelveborn
    aelveborn almost 14 years
    No, keep bugging, I sometimes find myself feeling what is correct instead of knowing it. This is the case. Side note: SQL Server has an interesting effect where two transactions will not block each other even if each of them tries to get an exclusive lock on the same object, provided that no actual update occurs (a trick to improve concurrency). We should do some proper stress testing tomorrow.
  • ZygD
    ZygD almost 14 years
    @Gserg: correct. But then the OP would have posted an INSERT/UPDATE question, not a test-for-insert one, arguably. We use this to filter out a few thousand duplicates in a dozen million new rows per day
  • aelveborn
    aelveborn almost 14 years
    The testing shows that two select 'Done.' where exists(select 0 from foo_testing with(updlock) where id = 4);, provided id=4 does not exist, don't conflict with each other, which means my original answer was actually wrong. The solution is to add the HOLDLOCK hint. See the edited answer. Thanks for keeping me bugged :)
  • Iain
    Iain over 13 years
    There's a great explanation of why this locking is required in Daniel's answer to my (very similar) question: stackoverflow.com/questions/3789287/…
  • ErikE
    ErikE over 13 years
    @Adam Marc's code above isn't any better for avoiding locking issues. The only two ways to handle concurrency issues are to lock using WITH (UPDLOCK, HOLDLOCK) or to handle the insert error and convert it to an update.
  • Iain
    Iain about 13 years
    In this case you can remove the 'WHEN MATCHED THEN' as Adam only needs to insert if missing, not upsert.
  • EBarr
    EBarr almost 13 years
    Sorry, but without adding hold lock hints to your merge statement, you will have the exact problem that the OP is concerned about.
  • ZygD
    ZygD almost 13 years
    late comment sorry. I forgot about using locks when I started with TRY/CATCH. Not sure what's best: decreasing concurrency or dealing with an exception. Anyway, +1 for thoroughness. Enjoy the badge ;-)
  • Martin Smith
    Martin Smith over 12 years
    See this article for more on @EBarr's point
  • EBarr
    EBarr over 12 years
    @MartinSmith - that is the exact article I read when I ran across the issue! Thanks for the reference.
  • gyozo kudor
    gyozo kudor about 8 years
    I don't understand this answer
  • Zameer Ansari
    Zameer Ansari about 8 years
    @gbn - Sorry for being too noob. What is the meaning of 35k tps? Also, how does it make sure that only distinct records are added to the table? How does the try-catch handle this?
  • Jim
    Jim over 7 years
    @student 35k tps = "35000 transactions per second". The TRY CATCH prevents a duplicate entry from being inserted by catching the unique constraint violation error (error number 2627) and ignoring it. The CATCH will only rethrow the error if it is not 2627. There is an issue with this snippet because a unique index violation is error 2601. So you have to check for both of those codes. This solution also only works for single row INSERTs. If you try to INSERT from one table into another table, you need a different strategy.
  • Germán Martínez
    Germán Martínez over 7 years
    The MSDN documentation states (in performance tip) that you should use insert where not exists instead of merge unless complexity is required... msdn.microsoft.com/en-us/library/…
  • T0t3sMcG0t3s
    T0t3sMcG0t3s over 5 years
    This is exactly the bit that I was left directionless in the previous answer. +1 to the both of you!