How can I improve DELETE FROM performance on large InnoDB tables?


Solution 1

This solution can provide better performance once completed, but the process may take some time to implement.

A new BIT column can be added and defaulted to TRUE for "active" and FALSE for "inactive". If that's not enough states, you could use TINYINT with 256 possible values.

Adding this new column will probably take a long time, but once it's done, your updates should be much faster, as long as you filter on the PRIMARY key (as you do with your deletes) and don't index this new column.
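
As a rough sketch (the active column name and BIT values are assumptions, applied to the question's problematic_table), the flag column and the "soft delete" update might look like this:

-- hypothetical flag column: b'1' = active, b'0' = inactive
ALTER TABLE problematic_table
    ADD COLUMN active BIT(1) NOT NULL DEFAULT b'1';

-- "delete" a row by flipping the flag, filtering on the PRIMARY KEY
UPDATE problematic_table SET active = b'0' WHERE id = 1234;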

The reason InnoDB takes so long to DELETE on such a massive table as yours is the clustered index. InnoDB physically orders your table based on your PRIMARY KEY, the first UNIQUE key it finds, or whatever it can determine as an adequate substitute if it can't find a PRIMARY or UNIQUE key. So when one row is deleted, it then reorders your entire table physically on the disk for speed and defragmentation. It's not the DELETE that's taking so long; it's the physical reordering after that row is removed.

When you create a fixed-width column and update that instead of deleting, there's no need for physical reordering across your huge table, because the space consumed by a row and by the table itself stays constant.

During off hours, a single DELETE can be used to remove the unnecessary rows. This operation will still be slow but collectively much faster than deleting individual rows.
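
The off-hours cleanup could then be a single statement along these lines (again using the assumed active flag):

-- run during off hours: remove everything previously marked inactive
DELETE FROM problematic_table WHERE active = b'0';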

Solution 2

I had a similar scenario: a delete statement on a table with 2 million rows that should delete around 100 thousand rows took around 10 minutes to run.

After I checked the configuration, I found that MySQL Server was running with default innodb_buffer_pool_size = 8 MB (!).

After a restart with innodb_buffer_pool_size = 1.5 GB, the same scenario took 10 seconds.

So it looks like performance depends on whether the "reordering of the table" can fit in the buffer pool or not.
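
If you are able to change the server configuration, a sketch of that change (the 1.5 GB figure is just the value from this answer; pick a size that fits your RAM):

-- in my.cnf / my.ini under [mysqld], followed by a server restart:
--   innodb_buffer_pool_size = 1536M
-- on MySQL 5.7.5+ the variable can also be resized at runtime:
SET GLOBAL innodb_buffer_pool_size = 1610612736;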

Solution 3

I have an InnoDB table with around 200 million rows and I did experience the same issue. Deleting rows took forever.

There are a primary key, a unique key and multiple compound indexes on the table.

Deleting in smaller chunks went pretty fast, so I decided to make a stored procedure that simply deletes the rows in multiple iterations with a LIMIT. Kind of like Jan Larsen's answer, but with no need for a separate table.

That made it possible to delete large chunks of data (around 500K rows) within a few minutes.

It seems like the transaction that InnoDB has to build to be able to roll back the changes on error is too big and therefore cannot fit into memory, which causes the delete to perform very badly.
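
One rough way to watch this while the big DELETE runs (not part of the original approach, just a diagnostic) is to check how many rows the open transaction has modified:

-- rows modified and locked by currently running InnoDB transactions
SELECT trx_id, trx_started, trx_rows_modified, trx_rows_locked
FROM information_schema.INNODB_TRX;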

The procedure:

DELIMITER $$

CREATE DEFINER=`root`@`%` PROCEDURE `delete_rows`()
BEGIN
    -- upper bound on the number of chunks and the loop counter
    declare v_max int unsigned default 100;
    declare v_counter int unsigned default 1;

    -- delete in chunks of 10000 rows so each statement's transaction stays small
    while v_counter < v_max do
        DELETE FROM items WHERE a = 'A' AND b = 'B' AND c = 'C' LIMIT 10000;
        set v_counter = v_counter + 1;
    end while;
END$$

DELIMITER ;

Then call it by:

CALL delete_rows();

The WHERE clause matches a compound index whose leading columns are a, b, c, which I think is important so that MySQL does not have to do a full table scan to match the rows.
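
For reference, such an index might be created like this (the items table and the a, b, c columns are the placeholder names used in the procedure above):

-- compound index whose leading columns match the WHERE clause of the chunked DELETE,
-- so each iteration can find its rows without a full table scan
ALTER TABLE items ADD INDEX idx_a_b_c (a, b, c);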

Solution 4

I solved a similar problem by using a stored procedure, thereby improving performance by a factor of several thousand.

My table had 33M rows and several indexes and I wanted to delete 10K rows. My DB was in Azure with no control over innodb_buffer_pool_size.

For simplicity I created a table tmp_id with only a primary id field:

CREATE TABLE `tmp_id` (
    `id` bigint(20) NOT NULL DEFAULT '0',
    PRIMARY KEY (`id`)
)

I selected the set of ids I wanted to delete into tmp_id and ran delete from my_table where id in (select id from tmp_id); This did not complete in 12 hours, so I tried with only a single id in tmp_id and it took 25 minutes. Doing delete from my_table where id = 1234 completed in a few milliseconds, so I decided to try doing that in a procedure instead:

DELIMITER $$

CREATE PROCEDURE `delete_ids_in_tmp`()
BEGIN
    declare finished integer default 0;
    declare v_id bigint(20);
    -- cursor over the ids staged for deletion in tmp_id
    declare cur1 cursor for select id from tmp_id;
    -- flag the end of the cursor instead of raising an error
    declare continue handler for not found set finished = 1;
    open cur1;
    igmLoop: loop
        fetch cur1 into v_id;
        if finished = 1 then leave igmLoop; end if;
        -- delete one row at a time by primary key
        delete from problematic_table where id = v_id;
    end loop igmLoop;
    close cur1;
END$$

DELIMITER ;

Calling delete_ids_in_tmp() then deleted all 10K rows in less than a minute.
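
Putting it together, a usage sketch might look like this (the taxid filter is only an example of how tmp_id could be populated, borrowed from the question):

-- stage the ids to delete, run the cursor-based procedure, then clean up
INSERT INTO tmp_id (id)
    SELECT id FROM problematic_table WHERE taxid = 57;

CALL delete_ids_in_tmp();

TRUNCATE TABLE tmp_id;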

Author: mpe (Evolutionary biologist, genomicist, data scientist, Linux power user)

Updated on August 16, 2022

Comments

  • mpe almost 2 years

    I have a fairly large InnoDB table which contains about 10 million rows (and counting, it is expected to become 20 times that size). Each row is not that large (131 B on average), but from time to time I have to delete a chunk of them, and that is taking ages. This is the table structure:

     CREATE TABLE `problematic_table` (
        `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
        `taxid` int(10) unsigned NOT NULL,
        `blastdb_path` varchar(255) NOT NULL,
        `query` char(32) NOT NULL,
        `target` int(10) unsigned NOT NULL,
        `score` double NOT NULL,
        `evalue` varchar(100) NOT NULL,
        `log_evalue` double NOT NULL DEFAULT '-999',
        `start` int(10) unsigned DEFAULT NULL,
        `end` int(10) unsigned DEFAULT NULL,
        PRIMARY KEY (`id`),
        KEY `taxid` (`taxid`),
        KEY `query` (`query`),
        KEY `target` (`target`),
        KEY `log_evalue` (`log_evalue`)
    ) ENGINE=InnoDB AUTO_INCREMENT=7888676 DEFAULT CHARSET=latin1;
    

    Queries that delete large chunks from the table are simply like this:

    DELETE FROM problematic_table WHERE problematic_table.taxid = '57';
    

    A query like this just took almost an hour to finish. I can imagine that the index rewriting overhead makes these queries very slow.

    I am developing an application that will run on pre-existing databases. I most likely have no control over server variables unless I make changes to them mandatory (which I would prefer not to), so I'm afraid suggestions that change those are of little value.

    I have tried to INSERT ... SELECT those rows that I don't want to delete into a temporary table and just dropping the rest, but as the ratio of to-delete vs. to-keep shifts towards to-keep, this is no longer a useful solution.

    This is a table that may see frequent INSERTs and SELECTs in the future, but no UPDATEs. Basically, it's a logging and reference table that needs to drop parts of its content from time to time.

    Could I improve my indexes on this table by limiting their length? Would switching to MyISAM help, which supports DISABLE KEYS during transactions? What else could I try to improve DELETE performance?

    Edit: One such deletion would be on the order of about one million rows.

    • G-Nugget over 11 years
      It might not help as much as you need, but the data types could be improved. Does the id column really need to be a BIGINT? INT is half the size and goes up to 4 billion, well above your projected max. The query column is probably taking up most of the space. If it isn't always 32 characters, make it a VARCHAR. The index on query could also be limited if you don't really need it all.
    • mpe over 11 years
      @G-Nugget: Thanks for your suggestions. I'm going to look into converting id to INT. Yes, the query column takes 32 chars, since these are SHA-256 hashes. I should probably limit its index, though. Most of the time, this column is used to identify rows; would it help if I used that as PRIMARY KEY (the id is not really used to look up records)?
    • jeremycole over 11 years
      @G-Nugget Really, changing the id column type is the least of the concerns for this table schema. The table is wide enough and the indexed columns wide enough that the 4 byte difference will not substantially change performance.
    • jeremycole over 11 years
      @mpe Could you clarify how many rows any given delete may be deleting? Thousands? Millions?
    • mpe over 11 years
      @jeremycole: The deletions are on the order of about one million rows. Updated the question.
    • inhan over 11 years
      I'm wondering, did converting BIGINT into INT make any difference?
    • mpe over 11 years
      @inhan: I'm going to test that on Monday, don't have access to the machine now.
    • Olaf Dietsche over 11 years
      Maybe this is a similar problem, see stackoverflow.com/a/13742306/1741542
    • mpe over 11 years
      @inhan: Converting BIGINT to INT did not make any noticeable difference.
    • mpe over 11 years
      @OlafDietsche: Thanks for the link!
    • Rick James over 2 years
      My treatise on "big deletes": mysql.rjweb.org/doc.php/deletebig
    • mpe over 2 years
      @RickJames thanks for adding that!
  • mpe over 11 years
    Thanks for your answer and the explanation of the reason for this immense slowdown. Your idea with the "state" column is not bad, but would lead to a lot of old data that isn't used anymore. Space or downtime are not an issue, but why would I leave that junk in my database? :) Would it help InnoDB if there were no variable-length columns in that database, i.e., if I changed the varchar columns to char?
  • mpe about 10 years
    Accepted late, but better than never ;)
  • brooNo almost 10 years
    INT column length has nothing to do with storage. INT(1) is still 4 bytes, and you can store a maximum value of 2147483647 for SIGNED and 4294967295 for UNSIGNED.
  • user207421 almost 8 years
    Can you provide a citation for this claim? It seems most improbable. If DELETE really did all this, nothing else would ever happen.
  • Rick James over 7 years
    "reorders your entire table physically on the disk for speed and defragmentation" -- That is false. An InnoDB row delete first goes into the "Change buffer"; this actually delays the work to do the delete. When the delete happens, it removes the one row from the one block it is in, then considers merging the block with its neighbor. There is no massive reorg.
  • julio over 7 years
    Is it good practice to increase the buffer pool size that much, though?
  • Anse over 7 years
    Definitely. We have a large Magento database running, and innodb_buffer_pool_size set to 5G, a quarter of the whole RAM size on that server. Most other MySQL variables don't affect performance as much as this one, so throw every byte of RAM you have there.
  • borjab over 4 years
    Does this assume autocommit is set to true? Or a commit at the end? P.S.: in some IDEs you may need to use the delimiter keyword. dev.mysql.com/doc/connector-net/en/…
  • Circle Hsiao over 4 years