There must be a way to delete data in SQL Server w/o overloading the log

Solution 1

Do the delete in chunks, but rather than trying to shrink the log between chunks, take log backups between them (that is, if you're in full recovery).
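
A minimal sketch of that full-recovery pattern, assuming a database named MyDb, the same placeholder table/column names as the code below, and a backup folder D:\LogBackups (all hypothetical):

DECLARE @BatchNo INT
SET @BatchNo = 1
DECLARE @File NVARCHAR(260)

WHILE 1 = 1
  BEGIN
    DELETE TOP (20000) -- tune the batch size to what the log can absorb
      FROM SomeTable WHERE SomeColumn = SomeValue

    IF @@ROWCOUNT = 0
      BREAK -- nothing left to delete

    -- Back up the log between chunks so the space each batch used becomes reusable
    SET @File = N'D:\LogBackups\MyDb_delete_' + CAST(@BatchNo AS NVARCHAR(10)) + N'.trn'
    BACKUP LOG MyDb TO DISK = @File

    SET @BatchNo = @BatchNo + 1
  END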

The problem is that the log is full and hence has to grow. If it's full, trying to shrink it is useless; there's no free space in the log to release to the OS. What you need to do instead is make the space inside the file available for reuse.

Since the DB is in simple recovery, run the delete in chunks with a CHECKPOINT command between chunks. You can't take log backups in simple recovery.

Here's sample code that does the deletes without filling the log (in simple recovery). DO NOT wrap this in an explicit transaction; that completely defeats the point of deleting in batches, as the log can't be cleared until the entire transaction commits.

(SQL 2005 and above. For SQL 2000, remove TOP and use SET ROWCOUNT; a sketch of that variant follows the code below.)

DECLARE @Done BIT
SET @Done = 0
WHILE @Done = 0
  BEGIN
    DELETE TOP (20000) -- reduce if log still growing
      FROM SomeTable WHERE SomeColumn = SomeValue 
    IF @@ROWCOUNT = 0 
      SET @Done = 1
    CHECKPOINT -- marks log space reusable in simple recovery
  END
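
For SQL 2000, the variant mentioned above might look like this (same placeholder names; SET ROWCOUNT caps each delete instead of TOP):

DECLARE @Done BIT
SET @Done = 0
SET ROWCOUNT 20000 -- cap each delete at 20,000 rows
WHILE @Done = 0
  BEGIN
    DELETE FROM SomeTable WHERE SomeColumn = SomeValue
    IF @@ROWCOUNT = 0
      SET @Done = 1
    CHECKPOINT -- marks log space reusable in simple recovery
  END
SET ROWCOUNT 0 -- reset so later statements aren't limited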

To understand log management, take a look at this article - http://www.sqlservercentral.com/articles/64582/

Solution 2

One trick I've used, depending on how much data I'm keeping versus how much I'm deleting, is to:

  1. select all the "data to keep" into another table (just for temporary storage)

  2. truncate the original table

  3. insert all the data from the temp storage table back into the original

It works well if the amount you are keeping is smaller than what you are deleting.
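
A rough T-SQL sketch of those three steps, using hypothetical names (BigTable, a staging table BigTable_keep, and a KeepFlag column that marks the rows to keep); note that TRUNCATE TABLE won't work on a table referenced by foreign keys:

-- 1. Copy the rows to keep into a staging table
SELECT *
  INTO dbo.BigTable_keep
  FROM dbo.BigTable
 WHERE KeepFlag = 1

-- 2. Empty the original table (deallocation is only minimally logged)
TRUNCATE TABLE dbo.BigTable

-- 3. Put the kept rows back
INSERT INTO dbo.BigTable
SELECT * FROM dbo.BigTable_keep

-- 4. Drop the staging table once the data is verified
DROP TABLE dbo.BigTable_keep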

A similar option, if all the database files (data and logs) are on the same disk and the data to be deleted is about half the total, is to export the "data to keep" to a file on a separate drive using the bcp command-line utility, then truncate the table and bulk-load the data file back in with bcp.
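
As a sketch of that bcp route, run from a command prompt (server, database, table, filter, and file paths are all hypothetical):

rem 1. Export the rows to keep to a file on a separate drive (native format, trusted connection)
bcp "SELECT * FROM MyDb.dbo.BigTable WHERE KeepFlag = 1" queryout E:\keep\BigTable.dat -n -S MyServer -T

rem 2. Truncate the table from a query window: TRUNCATE TABLE MyDb.dbo.BigTable

rem 3. Reload the kept rows from the file
bcp MyDb.dbo.BigTable in E:\keep\BigTable.dat -n -S MyServer -T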

I've seen DBAs take the database offline, back up the logs, turn off logging and do it that way, but that seems like a lot of hassle. :-)


Comments

  • dudeNumber4
    dudeNumber4 about 2 years

    I need to delete a bunch of data, and don't have the disk space for the log to continue growing. Upon looking into the matter further, it looks like there isn't any way around this, but I thought I'd ask for sure; it's difficult for me to believe that something so simple is impossible.

    I tried looping, deleting in chunks, and calling shrinkfile on the log after each iteration. SQL Server just seems to ignore the shrinkfile command. Did the same with backup log (then deleting the backup file afterwards). Same thing - log just keeps on growing. The recovery model on the database I'm trying this on is simple - I thought that would make it easier, but it doesn't.

  • dudeNumber4
    dudeNumber4 almost 15 years
    Maybe I misunderstand, but even if the log is near empty at the start of the process, it gets way too big during the deletion. The problem is that the servers (both QA and production) are getting low on space. We've decided on a new strategy that will greatly reduce the amount of data we need to warehouse, but we need to get rid of the historical data first.
  • dudeNumber4
    dudeNumber4 almost 15 years
Well, I doubt I have enough room for that either, since that would be about the same as what the log is growing to (need to delete about half the data). As I understand it, with Oracle you can just say "with nolog" and delete some data. I can't believe there's no way to do something so simple in SQL Server.
  • dudeNumber4
    dudeNumber4 almost 15 years
When I read the docs on CHECKPOINT, it says this happens on a log backup with TRUNCATE_ONLY. BACKUP LOG 'db_name' WITH TRUNCATE_ONLY has the same effect - squat. So everything (including the backup log statements) is rolled up into one gigantic transaction, which is useless in my case. I need the effect of executing the 'go' command without losing the context of my loop (can't have 'go' in the middle of a loop).
  • GilaMonster
    GilaMonster almost 15 years
You don't need GO in the middle of the loop. GO has nothing to do with transactions; it's a batch terminator. Just don't declare an explicit transaction and SQL will do each delete in its own transaction and allow the log to be truncated afterwards by a checkpoint.
  • GilaMonster
    GilaMonster almost 15 years
Can you post the code you're using for 'chunking' the deletes, as well as the results of the following query for the DB in question: SELECT name, log_reuse_wait_desc FROM sys.databases. I've done this kind of batched deleting hundreds of times, so it can work.
  • Jen A
    Jen A almost 15 years
    Is there any reason why you can't set the recovery model for the database to BULK LOGGED or SIMPLE while you are deleting, then reset it to the normal state afterward?
  • dudeNumber4
    dudeNumber4 almost 15 years
    The query returns NOTHING every time I've run it. Following is pseudocode:

        declare cursor
        open cursor
        WHILE fetch
        BEGIN
          declare memory table of ids
          populate this table with ids of related rows to delete
          delete rows from 8 related tables where id in (select id from memory table)
          -- Neither below has any effect. All iterations of the loop run in one big transaction
          run shrinkfile on log / backup with TRUNCATE_ONLY
          delete from @ids
          fetch next from cursor
        END
  • GilaMonster
    GilaMonster almost 15 years
Ok, the fact that you have the thing in one big transaction is the problem. In an explicit transaction the log cannot be truncated until the entire transaction commits, completely negating the point of deleting in chunks. I'm going to edit my original answer to include code that does work to do large deletes without killing the log.
  • dudeNumber4
    dudeNumber4 almost 15 years
    I didn't mean that I wrapped my code in a transaction; I meant that it seemed as if SQL Server was implicitly wrapping my code in a transaction since it seemed as if it was never getting to the backup log statement (which, as you say, wouldn't have worked or would be ignored on a database with simple recovery). Tried your checkpoint approach and it worked; thanks! Will switch to backup with truncate_only in production. Seems even more problematic for SQL Server that truncate_only is deprecated and will be removed in future versions. Then what will be the solution for such a problem if any?
  • GilaMonster
    GilaMonster almost 15 years
DO NOT use backup with truncate only in production. It breaks the recovery chain of the database and can leave you unable to restore to a point in time if necessary. If the DB is in full recovery, it is in full recovery for a reason, and that reason is point-in-time recovery. In production, use log backups to disk, or switch the DB to simple temporarily, switch back to full afterwards, and immediately take a full backup. Speak with your DBA on this and make sure they're happy (data recoverability is their job).
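
For reference, the temporary recovery-model switch described in that last comment might look roughly like this (database name and backup path are hypothetical):

-- Temporarily drop to simple recovery so CHECKPOINT can clear the log between delete batches
ALTER DATABASE MyDb SET RECOVERY SIMPLE

-- ... run the chunked delete from Solution 1 here ...

-- Switch back to full recovery and immediately take a full backup
-- to restart the log backup chain (point-in-time restore isn't possible until this completes)
ALTER DATABASE MyDb SET RECOVERY FULL
BACKUP DATABASE MyDb TO DISK = N'D:\Backups\MyDb_after_purge.bak'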