What is the best way to delete millions of records in T-SQL?

15,423

Solution 1

Do it in batches of 5,000 or 10,000 if you need to delete less than about 40% of the data. If you need to delete more than that, dump the rows you want to keep into another table (or bcp them out), truncate this table, and then insert the saved rows back (or bcp them in).

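-- Seed @@ROWCOUNT so the loop body runs at least once
SELECT 1

WHILE @@ROWCOUNT > 0
BEGIN
    -- DELETE with joins must name the target alias (A) before FROM
    DELETE TOP (5000) A
    FROM Table1 A
    LEFT JOIN Table2 B ON B.sId = A.sId
    LEFT JOIN Table3 C ON C.sId = A.sId
    -- Filter assumed from the question's description: without a WHERE
    -- clause the left joins would match (and delete) every row of Table1
    WHERE A.Name = 'XYZ'
      AND B.sId IS NULL   -- no matching row in Table2
      AND C.sId IS NULL   -- no matching row in Table3
END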

Here is a small example you can run to see what happens:

CREATE TABLE #test(id INT)

INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)
INSERT #test VALUES(1)

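-- @@ROWCOUNT is 1 after the last INSERT, so the loop starts;
-- it stops once a DELETE affects 0 rows
WHILE @@ROWCOUNT > 0
BEGIN
    DELETE TOP (2) FROM #test
END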
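For the other case described above (deleting most of the table), the keep, truncate, and reload steps might look roughly like the sketch below. Table1_keep is a hypothetical staging table name, and the filter is an assumption taken from the question: keep every row that is not named 'XYZ' or that has a match in Table2 or Table3. It also assumes Table1 has no identity column and is not referenced by foreign keys, since TRUNCATE TABLE fails on a table that is.

```sql
-- Sketch only: Table1_keep is a hypothetical staging table.
-- 1. Copy the rows that should survive.
SELECT A.*
INTO Table1_keep
FROM Table1 A
WHERE A.Name <> 'XYZ'
   OR A.Name IS NULL
   OR EXISTS (SELECT 1 FROM Table2 B WHERE B.sId = A.sId)
   OR EXISTS (SELECT 1 FROM Table3 C WHERE C.sId = A.sId);

-- 2. Empty the original table; this deallocates pages instead of
--    logging every deleted row.
TRUNCATE TABLE Table1;

-- 3. Put the saved rows back (assumes no identity column on Table1).
INSERT INTO Table1
SELECT * FROM Table1_keep;

-- 4. Remove the staging table once the reload is verified.
DROP TABLE Table1_keep;
```

Check the row count in Table1_keep before running the TRUNCATE step, because there is no way to get the rows back afterwards without a backup.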

Solution 2

One way to remove millions of records is to select the rows you want to keep into new tables, then drop the old tables and rename the new ones. Which variant is best depends on the foreign keys: you can either drop and recreate the foreign keys, or truncate the old tables and copy the selected data back.

If you only need to delete a few records, disregard this answer. It applies when you actually want to delete millions of records.

Solution 3

One other method is to insert the data that you want to keep into another table, say Table1_good. Once that is completed and verified, drop Table1 and then rename Table1_good to Table1.

It's a dirty way to do it, but it works.
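The swap itself could be sketched as follows, assuming Table1_good has already been filled with the rows to keep. The index name IX_Table1_good_sId is hypothetical. Since SELECT ... INTO does not copy indexes, constraints, or triggers, anything the old table had needs to be recreated on the new one.

```sql
-- Assumes Table1_good already holds the rows to keep.
-- Verify first: compare this against the number of rows you expect to keep.
SELECT COUNT(*) AS kept_rows FROM Table1_good;

-- Recreate what the old table had (hypothetical index name).
CREATE INDEX IX_Table1_good_sId ON Table1_good (sId);

-- Swap the tables inside one transaction so a failure rolls back both steps.
BEGIN TRANSACTION;
    DROP TABLE Table1;                      -- fails if foreign keys reference it
    EXEC sp_rename 'Table1_good', 'Table1';
COMMIT TRANSACTION;
```

The index keeps its original name after the rename, so rename it separately if your naming convention matters.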

Solution 4

Using the TOP clause is more about improving concurrency than speed, and it may actually make the code run slower.

One suggestion is to delete the data from a derived table: http://sqlblogcasts.com/blogs/simons/archive/2009/05/22/DELETE-TOP-x-rows-avoiding-a-table-scan.aspx
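A minimal sketch of deleting through a derived table is shown below. It is not taken from the linked post; the batch size, the ORDER BY column, and the Name = 'XYZ' filter are assumptions for illustration. The idea is that the inner SELECT TOP ... ORDER BY on an indexed column gives the optimizer a chance to pick each batch from the index rather than scanning the whole table.

```sql
-- Seed @@ROWCOUNT so the loop body runs at least once
SELECT 1;

WHILE @@ROWCOUNT > 0
BEGIN
    -- Delete the rows selected by the derived table; ordering by the
    -- indexed sId column is an assumption, as is the filter.
    DELETE batch
    FROM (
        SELECT TOP (5000) *
        FROM Table1
        WHERE Name = 'XYZ'
        ORDER BY sId
    ) AS batch;
END
```

Compare the execution plan of this form with the plain DELETE TOP version on your own data before settling on either one.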

Author by

David

Updated on June 19, 2022

Comments

  • David, about 2 years ago

    I have the following table structure:

    Table1       Table2        Table3
    --------------------------------
     sId          sId           sId
     name          x              y
      x1          x2             x3
    

    I want to remove all records from Table1 that do not have a matching record in Table3 based on sId, and if the sId is present in Table2, the record must not be deleted from Table1. There are about 20, 15, and 10 million records in Table1, Table2, and Table3, respectively. I have done something like this:

    Delete Top (3000000)
            From Table1 A
            Left Join Table2 B
            on A.Name ='XYZ' and
               B.sId = A.sId
            Left Join Table3 C
            on A.Name = 'XYZ' and
               C.sId = A.sId
    

    (I have added an index on sId, but not on Name.) This takes a long time to remove the records. Is there a better way to delete millions of records? Thanks in advance.