Very slow DELETE query

Solution 1

Add a primary key to your table variables and watch them scream.

DECLARE @IdList1 TABLE(Id INT PRIMARY KEY NOT NULL)
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY NOT NULL)

Because there's no index on these table variables, any joins or subqueries must examine on the order of 10,000 times 10,000 = 100,000,000 pairs of values.
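
For illustration, here is a minimal, self-contained sketch of the whole pattern with the keyed table variables (the INSERTs from sys.all_objects are only placeholder data so the example runs on its own; in practice the lists are filled from your own tables):

DECLARE @IdList1 TABLE(Id INT PRIMARY KEY NOT NULL)
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY NOT NULL)

-- Placeholder data: object_id is unique, so the primary keys are satisfied
INSERT INTO @IdList1 (Id) SELECT object_id FROM sys.all_objects
INSERT INTO @IdList2 (Id) SELECT object_id FROM sys.all_objects

-- The same delete as in the question, now able to use the clustered indexes
DELETE list1
FROM @IdList1 list1
INNER JOIN @IdList2 list2 ON list1.Id = list2.Id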

Solution 2

SQL Server compiles the plan when the table variable is empty and does not recompile it when rows are added. Try

DELETE FROM @IdList1
WHERE Id IN (SELECT Id FROM @IdList2)
OPTION (RECOMPILE)

This will take account of the actual number of rows contained in the table variables and get rid of the nested loops plan.

Of course creating an index on Id via a constraint may well be beneficial for other queries using the table variable too.
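
For example, a UNIQUE constraint is one way to get such an index (just a sketch, and it assumes the Ids really are unique and non-null):

DECLARE @IdList1 TABLE(Id INT NOT NULL UNIQUE)
DECLARE @IdList2 TABLE(Id INT NOT NULL UNIQUE)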

Solution 3

Table variables can have primary keys, so if your data supports uniqueness for these Ids, you may be able to improve performance by declaring them like this:

DECLARE @IdList1 TABLE(Id INT PRIMARY KEY)
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY)

Solution 4

Possible solutions:

1) Try to create indexes on the table variables:

1.1) If the List{1|2}.Id column has unique values, then you could define a unique clustered index using a PK constraint, like this:

DECLARE @IdList1 TABLE(Id INT PRIMARY KEY);
DECLARE @IdList2 TABLE(Id INT PRIMARY KEY);

1.2) If the List{1|2}.Id column may have duplicate values, you can still get a unique clustered index from a PK constraint by adding a dummy IDENTITY column, like this:

DECLARE @IdList1 TABLE(Id INT, DummyID INT IDENTITY, PRIMARY KEY (Id, DummyID));
DECLARE @IdList2 TABLE(Id INT, DummyID INT IDENTITY, PRIMARY KEY (Id, DummyID));

2) Try adding a HASH JOIN query hint, like this:

DELETE list1
FROM @IdList1 list1
INNER JOIN @IdList2 list2 ON list1.Id = list2.Id
OPTION (HASH JOIN);

Solution 5

You are using table variables; either add a primary key to them or change them to temporary tables and add an index. This will result in much better performance. As a rule of thumb: if the table is small, use a table variable; if it grows and contains a lot of data, use a temp table.
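
A rough sketch of the temp-table variant (the index names are placeholders, and the INSERTs from sys.all_objects only stand in for however the lists are really populated):

CREATE TABLE #IdList1 (Id INT NOT NULL);
CREATE TABLE #IdList2 (Id INT NOT NULL);

-- Clustered indexes so the join can seek rather than scan
CREATE CLUSTERED INDEX IX_IdList1_Id ON #IdList1 (Id);
CREATE CLUSTERED INDEX IX_IdList2_Id ON #IdList2 (Id);

-- Placeholder data
INSERT INTO #IdList1 (Id) SELECT object_id FROM sys.all_objects;
INSERT INTO #IdList2 (Id) SELECT object_id FROM sys.all_objects;

DELETE list1
FROM #IdList1 list1
INNER JOIN #IdList2 list2 ON list1.Id = list2.Id;

DROP TABLE #IdList1;
DROP TABLE #IdList2;

Unlike table variables, temp tables also get column statistics, so the optimizer can estimate the row counts involved instead of assuming the tables are tiny.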

Comments

  • hwcverwe
    hwcverwe almost 2 years

    I have problems with SQL performance. For some reason the following queries are suddenly very slow:

    I have two lists which contain Ids of a certain table. I need to delete all records from the first list if the Ids already exist in the second list:

    DECLARE @IdList1 TABLE(Id INT)
    DECLARE @IdList2 TABLE(Id INT)
    
    -- Approach 1
    DELETE list1
    FROM @IdList1 list1
    INNER JOIN @IdList2 list2 ON list1.Id = list2.Id
    
    -- Approach 2
    DELETE FROM @IdList1
    WHERE Id IN (SELECT Id FROM @IdList2)
    

    It is possible the two lists contain more than 10,000 records. In that case both queries each take more than 20 seconds to execute.

    The execution plan also showed something I don't understand. Maybe that explains why it is so slow: Query plan of both queries

    I filled both lists with 10,000 sequential integers, so both lists contained the values 1-10,000 as a starting point.

    As you can see, both queries show an Actual Number of Rows of 50,005,000 for @IdList2!! @IdList1 is correct (Actual Number of Rows is 10,000).

    I know there are other ways to solve this, like filling a third list instead of removing from the first list. But my question is:

    Why are these delete queries so slow and why do I see these strange query plans?

  • Jodrell
    Jodrell almost 11 years
    Will it help having an index on @IdList1?
  • Charles Bretana
    Charles Bretana almost 11 years
    This is new to me. Can you clarify - the cacheplan initial compile would happen when the Delete statement is encountered, correct? Not when the table variables are declared? I mean, the plan being compiled is for the Delete, not for the table variable declaration... If so, then at that point wouldn't the table variables be populated? Also, if you don't mind, could you provide a reference? I'd like to read up on this.
  • Martin Smith
    Martin Smith almost 11 years
    "Any joins or subqueries must examine on the order of 10,000 times 10,000 = 100,000,000 pairs of values." this is only true for nested loops. A hash or merge join would process each input once (though a merge join would also need a sort)
  • hwcverwe
    hwcverwe almost 11 years
    Unfortunately this is slow too. Same result and exact same query plan.
  • granadaCoder
    granadaCoder almost 11 years
    Are you forced to use @variable-tables, or can you try #temp tables?
  • Charles Bretana
    Charles Bretana almost 11 years
    @martin, I have not read that stuff for a while, so I've forgotten the rules, but is it not choosing the nested loops because there's no index? To do the other looping algorithms, doesn't it need an index to sort the values? Also, without an index, it still has to examine every pair of values - no matter what looping algorithm it uses to create them - the exception being, as you note, a merge join, but there it has to presort them.
  • Martin Smith
    Martin Smith almost 11 years
    @CharlesBretana - No it can use hash or merge join as long as there is an equi join. Merge join will require sorting both inputs (as will creating an index) but once an index is created obviously it is potentially more useful as it will benefit other queries (so +1)
  • Martin Smith
    Martin Smith almost 11 years
    @CharlesBretana - There are some links and example code in my answer here
  • Charles Bretana
    Charles Bretana almost 11 years
    Again, however, since the cacheplan is created for each statement, not for the entire batch or for a stored proc, does it create cache plans for every statement in a batch or in a procedure before it starts executing?
  • Martin Smith
    Martin Smith almost 11 years
    @CharlesBretana - It compiles all statements in a batch before executing it except if the statement references a non existent object and is marked for deferred compile. So in this case the DELETE statement is compiled when the table variables are empty. Then (due to OPTION (RECOMPILE)) it gets recompiled at the point of the DELETE and can take account of the actual number of rows after the table variables are populated.
  • hwcverwe
    hwcverwe almost 11 years
    Your answer and comments together with @MartinSmith were a huge improvement. Thanks!
  • hwcverwe
    hwcverwe almost 11 years
    Your answer and comments together with @CharlesBretana were a huge improvement. I decided to accept Charles' answer because I cannot accept two answers ;). Thanks!
  • granadaCoder
    granadaCoder almost 11 years
    If you can use #temp tables, try the example in my response.
  • oleksii
    oleksii almost 11 years
    But what if the OP needs to delete, like he/she said: I need to delete all records from the first list if the Ids already exist in the second list
  • Jodrell
    Jodrell almost 11 years
    @oleksii true, the OP indicates it's a contrived example concerned with those two table variables and specifically deletion. However, this may still be useful for another reader.