How to speed up simple UPDATE query with millions of rows?

sql sql-server tsql sql-server-2012

12,825

Solution 1

In my experience, looping your update so that it acts on small a numbers of rows each iteration is a good way to go. The ideal number of rows to update each iteration is largely dependent on your environment and the tables you're working with. I usually stick around 1,000 - 10,000 rows per iteration.

Example

SET ROWCOUNT 1000 -- Set the batch size (number of rows to affect each time through the loop).
WHILE (1=1) BEGIN

    UPDATE PF
    SET NewSourceId = 1
    FROM AA..Pub_ArticleFaculty PF WITH (NOLOCK)
            INNER JOIN AA2..ArticleAuthorOldNew AAON WITH (NOLOCK)  
                       ON AAON.OldFullSourceId = PF.SourceId
    WHERE NewSourceId IS NULL -- Only update rows that haven't yet been updated.

    -- When no rows are affected, we're done!
    IF @@ROWCOUNT = 0
        BREAK
END
SET ROWCOUNT 0 -- Reset the batch size to the default (i.e. all rows).
GO

Solution 2

If you are resetting all or almost all of the values, then the update will be quite expensive. This is due to logging and the overhead for the updates.

One approach you can take instead is insert into a temporary table, then truncate, then re-insert:

select pf.col1, pf.col2,  . . . ,
       coalesce(aaon.NewSourceId, pf.sourceid) as SourceId
into temp_pf
from AA..Pub_ArticleFaculty PF LEFT JOIN
     AA2..ArticleAuthorOldNew AAON 
     on AAON.OldFullSourceId = PF.SourceId;

truncate table AA..Pub_ArticleFaculty;

insert into AA..Pub_ArticleFaculty
    select * from temp_pf;

Note: You should either be sure that the columns in the original table match the temporary table or, better yet, list the columns explicitly in the insert.

I should also note that the major benefit is when your recovery mode is simple or bulk-logged. The reason is that logging for the truncate, select into, and insert . . . select is minimal (see here). This savings on the logging can be very significant.

Solution 3

I would

Disable the index on PF.SourceId
Run the update
Then rebuild the index

I don't get the NOLOCK on the table you are updating

UPDATE PF
SET PF.SourceId = AAON.NewSourceId
FROM AA..Pub_ArticleFaculty PF 
INNER JOIN AA2..ArticleAuthorOldNew AAON WITH (NOLOCK)  
        ON AAON.OldFullSourceId = PF.SourceId 
       AND PF.SourceId <> AAON.NewSourceId

12,825

Author by

Andy

Updated on June 07, 2022

Comments

Andy almost 2 years
How can I speed up this rather simple UPDATE query? It's been running for over 5 hours!

I'm basically replacing SourceID in a table by joining on a new table that houses the Old and New IDs. All these fields are VARCHAR(72) and must stay that way.

Pub_ArticleFaculty table has 8,354,474 rows (8.3 million). ArticleAuthorOldNew has 99,326,472 rows (99.3 million) and only the 2 fields you see below.

There are separate non-clustered indexes on all these fields. Is there a better way to write this query to make it run faster?
```
UPDATE PF
        SET PF.SourceId = AAON.NewSourceId
    FROM AA..Pub_ArticleFaculty PF WITH (NOLOCK)
        INNER JOIN AA2..ArticleAuthorOldNew AAON WITH (NOLOCK)  
                   ON AAON.OldFullSourceId = PF.SourceId
```