SQL Server - Merging large tables without locking the data


Solution 1

Change your front end to use NOLOCK or READ UNCOMMITTED when doing its SELECT queries.

You can't use NOLOCK on MERGE, INSERT, or UPDATE, because the records must be locked in order to perform the modification. However, you can NOLOCK the SELECTs.

Note that you should use this with caution. If dirty reads are okay, then go ahead. However, if the reads require the updated data, then you need to go down a different path and figure out exactly why merging 3 million records is causing an issue.
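
For illustration, here is a minimal sketch of both options. The table, column, and variable names are hypothetical, not from the original question:

    -- Per-query hint: this read does not wait on locks held by the MERGE,
    -- at the cost of possibly seeing uncommitted ("dirty") rows.
    DECLARE @CustomerId int = 42;

    SELECT CustomerId, Name, UpdatedOn
    FROM dbo.Customers WITH (NOLOCK)
    WHERE CustomerId = @CustomerId;

    -- Equivalent session-level setting, applied to every subsequent read:
    SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

    SELECT CustomerId, Name, UpdatedOn
    FROM dbo.Customers
    WHERE CustomerId = @CustomerId;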

I'd be willing to bet that most of the time is spent reading data from disk during the MERGE command and/or working around low-memory conditions. You might be better off simply putting more RAM into your database server.

Ideally, you'd have enough RAM to pull the whole database into memory as needed. For example, if you have a 4 GB database, make sure you have 8 GB of RAM, in an x64 server of course.

Solution 2

I'm afraid I've had quite the opposite experience. We were performing updates and inserts where the source table had only a fraction of the number of rows of the target table, which was in the millions.

When we combined the source table records across the entire operational window and then performed the MERGE just once, we saw a 500% increase in performance. My explanation for this is that you pay for the up-front analysis of the MERGE command just once, instead of over and over again in a tight loop.

Furthermore, I am certain that merging 1.6 million rows (source) into 7 million rows (target), as opposed to 400 rows into 7 million rows over 4000 distinct operations (in our case), leverages the capabilities of the SQL Server engine much better. Again, a fair amount of the work is in the analysis of the two data sets, and this is done only once.
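
As a rough sketch of that single-pass approach, assuming hypothetical table names (dbo.Target, dbo.StagedSource) and a single Id join key:

    -- One MERGE over the full batch of staged rows, instead of many
    -- small MERGEs inside a loop: the engine analyzes the two data
    -- sets once and picks one plan for the whole operation.
    MERGE dbo.Target AS t
    USING dbo.StagedSource AS s
        ON t.Id = s.Id
    WHEN MATCHED THEN
        UPDATE SET t.Value = s.Value,
                   t.UpdatedOn = s.UpdatedOn
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (Id, Value, UpdatedOn)
        VALUES (s.Id, s.Value, s.UpdatedOn);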

Another question I have to ask as well is whether you are aware that the MERGE command performs much better with indexes on both the source and target tables. I would like to refer you to the following link:

http://msdn.microsoft.com/en-us/library/cc879317(v=SQL.100).aspx
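
In practice that advice amounts to something like the following sketch, again with hypothetical names and assuming the MERGE joins on a single Id column:

    -- Index the ON-clause columns on both sides of the MERGE so the
    -- engine can match rows without scanning either table.
    -- (Skip the clustered index if the target already has one, e.g. a
    -- clustered primary key on Id.)
    CREATE UNIQUE CLUSTERED INDEX IX_Target_Id ON dbo.Target (Id);
    CREATE INDEX IX_StagedSource_Id ON dbo.StagedSource (Id);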


Comments

  • Josh
    Josh over 3 years

    I have a very large set of data (~3 million records) which needs to be merged with updates and new records on a daily schedule. I have a stored procedure that actually breaks the record set into 1000-record chunks and uses the MERGE command with temp tables in an attempt to avoid locking the live table while the data is updating. The problem is that it doesn't exactly help. The table still "locks up", and our website that uses the data receives timeouts when attempting to access it. I even tried splitting it into 100-record chunks, and even tried a WAITFOR DELAY '00:00:05' to see if it would help to pause between merging the chunks. It's still rather sluggish.

    I'm looking for any suggestions, best practices, or examples on how to merge large sets of data without locking the tables.

    Thanks

  • Josh
    Josh almost 14 years
    The NOLOCK is not an option with the selects. The code uses NHibernate, and even after setting NHibernate to use NOLOCK it still times out. We have added more RAM to the server, and we ran SQL Profiler and made some enhancements that will supposedly increase efficiency by 41%. Thanks for your suggestions, Chris.
  • NotMe
    NotMe almost 14 years
    @Josh: The only reason for timeouts while using NOLOCK is if the server is simply hammered for resources and is doing a lot of disk access. In other words, it's I/O bound due to lack of memory. The three things you can do are: profile and fix the queries, add RAM (you've done those), and potentially change the drives to something much faster. However, the biggest impact is going to come from the index updates and RAM.
  • Josh
    Josh almost 14 years
    I ended up dumping the MERGE statement in favor of explicit updates and inserts. When combining that with splitting the work into 1000 records at a time, it's both faster and doesn't lock the pages the way MERGE does (see the sketch after these comments). Thanks again for your suggestions. They did help us!
  • NotMe
    NotMe almost 14 years
    @Josh: Glad it helped lead you to a solution. I'm going to have to check our own merge usage to see if we have any similar issues on the mass updates we do. I haven't seen any issues yet, but one of our products is about to be scaled up by a factor of 10 so it might be good practice to test.
  • variable
    variable about 3 years
    Please can you confirm that, in a SQL MERGE statement, we can use NOLOCK on the source table so that the source table is not locked?
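
For reference, here is a minimal sketch of the batched update-then-insert pattern Josh describes above. All names (#Staging, dbo.LiveTable, Id, Value) are hypothetical, and the batch size and delay are illustrative only:

    -- Assumes #Staging(Id, Value) holds the day's changes and
    -- dbo.LiveTable(Id, Value) is the table the website reads from.
    -- Working in small batches keeps each statement's locks short-lived.
    DECLARE @BatchSize int = 1000;
    DECLARE @Rows int = 1;

    WHILE @Rows > 0
    BEGIN
        -- Update live rows that have a staged counterpart in this batch.
        UPDATE t
        SET    t.Value = s.Value
        FROM   dbo.LiveTable AS t
        JOIN   (SELECT TOP (@BatchSize) Id, Value
                FROM #Staging ORDER BY Id) AS s
          ON   t.Id = s.Id;

        -- Insert staged rows that are new to the live table.
        INSERT INTO dbo.LiveTable (Id, Value)
        SELECT s.Id, s.Value
        FROM   (SELECT TOP (@BatchSize) Id, Value
                FROM #Staging ORDER BY Id) AS s
        WHERE  NOT EXISTS
               (SELECT 1 FROM dbo.LiveTable AS t WHERE t.Id = s.Id);

        -- Remove the processed batch; nothing has touched #Staging yet,
        -- so the same TOP/ORDER BY picks the same rows as above.
        WITH batch AS
            (SELECT TOP (@BatchSize) Id FROM #Staging ORDER BY Id)
        DELETE FROM batch;
        SET @Rows = @@ROWCOUNT;

        -- Brief pause so readers can acquire locks between batches.
        IF @Rows > 0 WAITFOR DELAY '00:00:05';
    END;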