T-SQL Optimize DELETE of many records

Solution 1

The log problem is probably due to the number of records deleted in the transaction; to make things worse, the engine may be requesting a lock per record (or per page, which is not so bad).

The one big thing here is how you determine the records to be deleted. I'm assuming you use a datetime field; if so, make sure you have an index on that column, otherwise it's a sequential scan of the table that will really penalize your process.
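For example, a minimal sketch of such an index (the table and column names here are placeholders, not taken from the question):

CREATE NONCLUSTERED INDEX IX_YourTable_YourDateColumn
    ON dbo.YourTable (YourDateColumn);

With an index like this, the delete's WHERE clause can seek on the date range instead of scanning the whole table.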

There are two things you may do, depending on the concurrency of users and the time of the delete:

  1. If you can guarantee that no one is going to read or write while you delete, you can lock the table in exclusive mode, delete (this takes only one lock from the engine), and release the lock (see the sketch after this list).
  2. You can use batch deletes: write a script with a cursor that provides the rows you want to delete, and begin a transaction and commit every X records (ideally 5000), so you keep the transactions short and don't take that many locks.
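A minimal sketch of option 1 (the table and column names are placeholders, and it assumes you really can afford to block all other access for the duration):

BEGIN TRANSACTION;

-- TABLOCKX requests an exclusive table lock up front, so the engine
-- holds one table-level lock instead of escalating row/page locks.
DELETE FROM dbo.YourTable WITH (TABLOCKX)
WHERE YourDateColumn < DATEADD(MINUTE, -20, GETDATE());

COMMIT TRANSACTION;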

Take a look at the query plan for the delete process and see what it shows; a sequential scan of a big table is never good.

Solution 2

To expand on the batch delete suggestion, I'd suggest you do this far more regularly (every 20 seconds perhaps) - batch deletions are easy:

WHILE 1 = 1
    BEGIN
        -- Delete in small batches so each transaction stays short
        -- and locks are released between batches.
        DELETE TOP ( 4000 )
        FROM    YOURTABLE
        WHERE   YourIndexedDateColumn < DATEADD(MINUTE, -20, GETDATE())

        -- Stop once a batch finds nothing left to delete.
        IF @@ROWCOUNT = 0
            BREAK
    END

Your inserts may lag slightly whilst they wait for the locks to release but they should insert rather than error.

In regards to your table though, with this much traffic I'd expect to see it on a very fast RAID 10 array, perhaps even partitioned - are your disks up to it? Are your transaction logs on different disks to your data files? They should be.
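If you want to check where the data and log files actually live, a quick sketch (the database name is a placeholder):

SELECT name, type_desc, physical_name
FROM   sys.master_files
WHERE  database_id = DB_ID('YourDatabase');

If the ROWS and LOG entries point at the same physical drive, that's worth fixing.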

EDIT 1 - Response to your comment

To put a database into SIMPLE recovery:

ALTER DATABASE [Database Name] SET RECOVERY SIMPLE;

This basically turns off transaction logging on the given database, meaning that in the event of data loss you would lose all data since your last full backup. If you're OK with that, this should save a lot of time when running large transactions. (Note that while a transaction is running, logging still takes place in SIMPLE - to enable rolling back the transaction.)

If there are tables within your database where you can't afford to lose data, you'll need to leave your database in FULL recovery mode (i.e. every transaction gets logged, and hopefully flushed to *.trn files by your server's maintenance plans). As I stated though, there is nothing stopping you from having two databases, one in FULL and one in SIMPLE. The FULL database would be for tables where you can't afford to lose any data (i.e. you could apply the transaction logs to restore data to a specific time), and the SIMPLE database would be for these massive high-traffic tables where you can allow data loss in the event of a failure.
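As a sketch of that split (the database names below are just placeholders):

ALTER DATABASE [CriticalData]    SET RECOVERY FULL;    -- tables you can't afford to lose
ALTER DATABASE [HighTrafficData] SET RECOVERY SIMPLE;  -- the massive, loss-tolerant tables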

All of this is relevant assuming you're creating full (*.bak) backups every night and flushing your log files to *.trn files every half hour or so.

In regards to your index question, it's imperative your date column is indexed. If you check your execution plan and see a "TABLE SCAN", that is an indicator of a missing index.

Your date column, I presume, is DATETIME with a constraint setting the DEFAULT to getdate()?

You may find that you get better performance by replacing that with a BIGINT YYYYMMDDHHMMSS and then applying a CLUSTERED index to that column. Note, however, that you can only have one clustered index per table, so if that table already has one you'll need to use a non-clustered index. (In case you didn't know, a clustered index basically tells SQL to store the information in that order, meaning that when you delete rows older than 20 minutes, SQL can literally delete stuff sequentially rather than hopping from page to page.)
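A rough sketch of that index, with placeholder names and assuming the table does not already have a clustered index:

-- CreatedKey is the hypothetical BIGINT YYYYMMDDHHMMSS column described above.
CREATE CLUSTERED INDEX CIX_YourTable_CreatedKey
    ON dbo.YourTable (CreatedKey);

-- If the table already has a clustered index, fall back to:
-- CREATE NONCLUSTERED INDEX IX_YourTable_CreatedKey ON dbo.YourTable (CreatedKey);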

Solution 3

Unfortunately for the purpose of this question, and fortunately for the sake of consistency and recoverability of databases in SQL Server, putting a database into SIMPLE recovery mode DOES NOT disable logging. Every transaction still gets logged before it is committed to the data file(s); the only difference is that in SIMPLE recovery the space in the log is released (in most cases) right after the transaction is either rolled back or committed. This is not going to affect the performance of the DELETE statement one way or another.
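You can watch this behaviour yourself; a sketch using standard catalog views (the database name is a placeholder):

SELECT name,
       recovery_model_desc,   -- FULL or SIMPLE
       log_reuse_wait_desc    -- what, if anything, is preventing log space reuse
FROM   sys.databases
WHERE  name = 'YourDatabase';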

Comments

  • gotqn
    gotqn almost 2 years

    I have a table that can grow to millions of records (50 million, for example). Every 20 minutes, records that are older than 20 minutes are deleted.

    The problem is that if the table has so many records, such a deletion can take a lot of time and I want to make it faster.

    I cannot do "truncate table" because I want to remove only records that are older than 20 minutes. I suppose that when doing the "delete" and filtering the information that needs to be deleted, the server is creating a log file or something and this takes much time?

    Am I right? Is there a flag or option I can turn off to optimize the delete, and then turn back on afterwards?

  • gotqn
    gotqn over 11 years
    I have no access to the real server. I am testing this on a local machine, then it is deployed. I will test the offered solution with index creation; maybe you could advise the best index that I can make? And how can I set this recovery mode to SIMPLE for my table? Also, what will I lose if I do this - I am not concerned if some information from this table cannot be recovered on any kind of error.
  • Carlos Grappa
    Carlos Grappa over 11 years
    CLUSTERED INDEX is definitely the way to go here.
  • Lester S
    Lester S over 11 years
    Will it increase the speed if the DATEADD(MINUTE, -20, GETDATE()) value is stored in a variable? Thanks.
  • Carlos Grappa
    Carlos Grappa over 11 years
    The increase is marginal compared to the clustered index, transaction log and locking issues
  • NealWalters
    NealWalters almost 10 years
    We do a similar process for deletes, and I'm looking at improving it if possible. Would there be any advantage to doing a "BEGIN TRANSACTION" and "COMMIT" inside your sample loop? (The main database is in Full recovery mode.)