IF EXISTS before INSERT, UPDATE, DELETE for optimization
Solution 1
I'm not completely sure, but I get the impression that this question is really about upsert, which is the following atomic operation:
- If the row exists in both the source and target,
UPDATE
the target; - If the row only exists in the source,
INSERT
the row into the target; - (Optionally) If the row exists in the target but not the source,
DELETE
the row from the target.
Developers-turned-DBAs often naïvely write it row-by-row, like this:
-- For each row in source
IF EXISTS(<target_expression>)
IF @delete_flag = 1
DELETE <target_expression>
ELSE
UPDATE target
SET <target_columns> = <source_values>
WHERE <target_expression>
ELSE
INSERT target (<target_columns>)
VALUES (<source_values>)
This is just about the worst thing you can do, for several reasons:
It has a race condition. The row can disappear between
IF EXISTS
and the subsequentDELETE
orUPDATE
.It's wasteful. For every transaction you have an extra operation being performed; maybe it's trivial, but that depends entirely on how well you've indexed.
Worst of all - it's following an iterative model, thinking about these problems at the level of a single row. This will have the largest (worst) impact of all on overall performance.
One very minor (and I emphasize minor) optimization is to just attempt the UPDATE
anyway; if the row doesn't exist, @@ROWCOUNT
will be 0 and you can then "safely" insert:
-- For each row in source
BEGIN TRAN
UPDATE target
SET <target_columns> = <source_values>
WHERE <target_expression>
IF (@@ROWCOUNT = 0)
INSERT target (<target_columns>)
VALUES (<source_values>)
COMMIT
Worst-case, this will still perform two operations for every transaction, but at least there's a chance of only performing one, and it also eliminates the race condition (kind of).
But the real issue is that this is still being done for each row in the source.
Before SQL Server 2008, you had to use an awkward 3-stage model to deal with this at the set level (still better than row-by-row):
BEGIN TRAN
INSERT target (<target_columns>)
SELECT <source_columns> FROM source s
WHERE s.id NOT IN (SELECT id FROM target)
UPDATE t SET <target_columns> = <source_columns>
FROM target t
INNER JOIN source s ON t.d = s.id
DELETE t
FROM target t
WHERE t.id NOT IN (SELECT id FROM source)
COMMIT
As I said, performance was pretty lousy on this, but still a lot better than the one-row-at-a-time approach. SQL Server 2008, however, finally introduced MERGE syntax, so now all you have to do is this:
MERGE target
USING source ON target.id = source.id
WHEN MATCHED THEN UPDATE <target_columns> = <source_columns>
WHEN NOT MATCHED THEN INSERT (<target_columns>) VALUES (<source_columns>)
WHEN NOT MATCHED BY SOURCE THEN DELETE;
That's it. One statement. If you're using SQL Server 2008 and need to perform any sequence of INSERT
, UPDATE
and DELETE
depending on whether or not the row already exists - even if it's just one row - there is no excuse not to be using MERGE
.
You can even OUTPUT
the rows affected by a MERGE
into a table variable if you need to find out afterward what was done. Simple, fast, and risk-free. Do it.
Solution 2
That is not useful for just one update/delete/insert.
Possibly adds performance if several operators after if condition.
In last case better write
update a set .. where ..
if @@rowcount > 0
begin
..
end
Solution 3
IF EXISTS
will basically do a SELECT - the same one that UPDATE would.
As such, it will decrease performance - if there's nothing to update, you did the same amount of work (UPDATE would have queried same lack of rows as your select) and if there's something to update, you juet did an un-needed select.
Solution 4
You should not do it for UPDATE
and DELETE
, as if there is impact on performance, it is not a positive one.
For INSERT
there might be situations where your INSERT
will raise an exception (UNIQUE CONSTRAINT
violation etc), in which case you might want to prevent it with the IF EXISTS
and handle it more gracefully.
Solution 5
Neither
UPDATE … IF (@@ROWCOUNT = 0) INSERT
nor
IF EXISTS(...) UPDATE ELSE INSERT
patterns work as expected under high concurrency. Both may fail. Both may fail very frequently. MERGE is the king - it holds up much better.Let us do some stress testing and see for ourselves.
Here is the table we shall be using:
CREATE TABLE dbo.TwoINTs
(
ID INT NOT NULL PRIMARY KEY,
i1 INT NOT NULL ,
i2 INT NOT NULL ,
version ROWVERSION
) ;
GO
INSERT INTO dbo.TwoINTs
( ID, i1, i2 )
VALUES ( 1, 0, 0 ) ;
IF EXISTS(…) THEN pattern frequently fails under high concurrency.
Let us insert or update rows in a loop using the following simple logic: if a row with given ID exists, update it, and otherwise insert a new one. The following loop implements this logic. Cut and paste it into two tabs, switch into text mode in both tabs, and run them simultaneously.
-- hit Ctrl+T to execute in text mode
SET NOCOUNT ON ;
DECLARE @ID INT ;
SET @ID = 0 ;
WHILE @ID > -100000
BEGIN ;
SET @ID = ( SELECT MIN(ID)
FROM dbo.TwoINTs
) - 1 ;
BEGIN TRY ;
BEGIN TRANSACTION ;
IF EXISTS ( SELECT *
FROM dbo.TwoINTs
WHERE ID = @ID )
BEGIN ;
UPDATE dbo.TwoINTs
SET i1 = 1
WHERE ID = @ID ;
END ;
ELSE
BEGIN ;
INSERT INTO dbo.TwoINTs
( ID, i1, i2 )
VALUES ( @ID, 0, 0 ) ;
END ;
COMMIT ;
END TRY
BEGIN CATCH ;
ROLLBACK ;
SELECT error_message() ;
END CATCH ;
END ;
When we run this script simultaneously in two tabs, we shall immediately get a huge amount of primary key violations in both tabs. This demonstrates how unreliable the IF EXISTS pattern is when it executes under high concurrency.
Note: this example also demonstrates that it is not safe to use SELECT MAX(ID)+1 or SELECT MIN(ID)-1 as the next available unique value if we do it under concurrency.
Ed Gomoliako
Updated on August 31, 2020Comments
-
Ed Gomoliako over 3 years
There is quite often situation when you need to execute INSERT, UPDATE or DELETE statement based on some condition. And my question is whether the affect on the performance of the query add IF EXISTS before the command.
Example
IF EXISTS(SELECT 1 FROM Contacs WHERE [Type] = 1) UPDATE Contacs SET [Deleted] = 1 WHERE [Type] = 1
What about INSERTs or DELETEs?