How can I improve DELETE FROM performance on large InnoDB tables?
Solution 1
This solution can provide better performance once completed, but the process may take some time to implement.
A new BIT column can be added and defaulted to TRUE for "active" and FALSE for "inactive". If two states aren't enough, you could use TINYINT with 256 possible values.
Adding this new column will probably take a long time, but once it's done, your updates should be much faster, as long as you key them on the PRIMARY KEY as you do with your deletes and don't index this new column.
The reason InnoDB takes so long to DELETE on such a massive table as yours is the clustered index. It physically orders your table based on your PRIMARY KEY, the first UNIQUE key it finds, or whatever it can determine as an adequate substitute if it can't find either, so when one row is deleted, it reorders your entire table physically on disk for speed and defragmentation. So it's not the DELETE that's taking so long; it's the physical reordering after that row is removed.
When you create a fixed-width column and update it instead of deleting, there's no need for physical reordering across your huge table, because the space consumed by each row and by the table itself is constant.
During off hours, a single DELETE can be used to remove the unnecessary rows. This operation will still be slow, but collectively much faster than deleting individual rows.
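The soft-delete approach described above can be sketched like this (a sketch only; the active column name and the b'1'/b'0' values are my own choices, not from the answer):

```sql
-- One-time schema change; the ALTER itself can take a long time on a big table.
ALTER TABLE problematic_table
    ADD COLUMN active BIT(1) NOT NULL DEFAULT b'1';

-- Instead of DELETE, flip the flag: a fixed-width, in-place update.
UPDATE problematic_table SET active = b'0' WHERE taxid = 57;

-- Readers then have to filter out the soft-deleted rows.
SELECT * FROM problematic_table WHERE taxid = 57 AND active = b'1';

-- During off hours, purge all flagged rows in one slow pass.
DELETE FROM problematic_table WHERE active = b'0';
```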
Solution 2
I had a similar scenario: a table with 2 million rows and a delete statement that was supposed to remove around 100 thousand rows took around 10 minutes.
After I checked the configuration, I found that the MySQL server was running with the default innodb_buffer_pool_size = 8 MB (!).
After a restart with innodb_buffer_pool_size = 1.5 GB, the same scenario took 10 seconds.
So it looks like there is a dependency on whether the "reordering of the table" can fit in the buffer pool or not.
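To check whether an instance is running with the tiny old default, and to raise it (note: innodb_buffer_pool_size is only dynamically resizable since MySQL 5.7.5; on older versions it has to be set in my.cnf and the server restarted):

```sql
-- Current size in bytes (the old default was 8 MB = 8388608).
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- On MySQL 5.7.5+ the variable can be resized online; 1610612736 bytes = 1.5 GB.
SET GLOBAL innodb_buffer_pool_size = 1610612736;
```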
Solution 3
I have an InnoDB table with around 200 million rows, and I experienced the same issue: deleting rows took forever.
The table has a primary key, a unique key, and multiple compound indexes.
Deleting in smaller chunks went pretty fast, so I decided to write a stored procedure that simply deletes the rows in multiple iterations with a LIMIT. Kind of like Jan Larsen's answer, but with no need for a separate table.
That made it possible to delete large chunks of data (around 500K rows) within a few minutes.
It seems that the transaction InnoDB has to build in order to roll back the changes on error is too big to fit into memory, which causes the delete to perform very badly.
The procedure:
CREATE DEFINER=`root`@`%` PROCEDURE `delete_rows`()
BEGIN
    -- Delete in batches of 10,000 rows, up to v_max iterations.
    declare v_max int unsigned default 100;
    declare v_counter int unsigned default 1;

    while v_counter < v_max do
        DELETE FROM items WHERE a = 'A' AND b = 'B' AND c = 'C' LIMIT 10000;
        set v_counter = v_counter + 1;
    end while;
END
Then call it by:
CALL delete_rows();
The WHERE clause matches a compound index starting with the a, b, c columns, which I think is important so that MySQL does not have to do a full table scan to match the rows.
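A variant of the same idea (my own sketch, not part of the original answer) stops as soon as a batch deletes nothing, instead of always running a fixed number of iterations, by checking ROW_COUNT() after each DELETE:

```sql
CREATE PROCEDURE `delete_rows_until_done`()
BEGIN
    REPEAT
        -- Each batch is a small transaction that fits comfortably in memory.
        DELETE FROM items WHERE a = 'A' AND b = 'B' AND c = 'C' LIMIT 10000;
    UNTIL ROW_COUNT() = 0 END REPEAT;
END
```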
Solution 4
I solved a similar problem by using a stored procedure, thereby improving performance by a factor of several thousand.
My table had 33M rows and several indexes and I wanted to delete 10K rows. My DB was in Azure with no control over innodb_buffer_pool_size.
For simplicity I created a table tmp_id with only a primary id field:
CREATE TABLE `tmp_id` (
`id` bigint(20) NOT NULL DEFAULT '0',
PRIMARY KEY (`id`)
)
I selected the set of ids I wanted to delete into tmp_id and ran:
delete from my_table where id in (select id from tmp_id);
This did not complete in 12 hours, so I tried with only a single id in tmp_id, and it took 25 minutes. Doing
delete from my_table where id = 1234
completed in a few milliseconds, so I decided to try doing that in a procedure instead:
CREATE PROCEDURE `delete_ids_in_tmp`()
BEGIN
    declare finished integer default 0;
    declare v_id bigint(20);
    declare cur1 cursor for select id from tmp_id;
    declare continue handler for not found set finished = 1;

    open cur1;
    igmLoop: loop
        fetch cur1 into v_id;
        if finished = 1 then leave igmLoop; end if;
        -- One fast primary-key delete per id.
        delete from problematic_table where id = v_id;
    end loop igmLoop;
    close cur1;
END
Calling delete_ids_in_tmp() then deleted all 10K rows in less than a minute.
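An alternative worth trying (my own sketch, not from the answer) is MySQL's multi-table DELETE syntax, which joins the target table against the id list in a single statement rather than using an IN (SELECT ...) subquery:

```sql
DELETE problematic_table
FROM problematic_table
INNER JOIN tmp_id ON tmp_id.id = problematic_table.id;
```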
mpe
Evolutionary biologist, genomicist, data scientist, Linux power user
Updated on August 16, 2022

Comments
-
mpe almost 2 years
I have a fairly large InnoDB table which contains about 10 million rows (and counting, it is expected to become 20 times that size). Each row is not that large (131 B on average), but from time to time I have to delete a chunk of them, and that is taking ages. This is the table structure:
CREATE TABLE `problematic_table` (
  `id` bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  `taxid` int(10) unsigned NOT NULL,
  `blastdb_path` varchar(255) NOT NULL,
  `query` char(32) NOT NULL,
  `target` int(10) unsigned NOT NULL,
  `score` double NOT NULL,
  `evalue` varchar(100) NOT NULL,
  `log_evalue` double NOT NULL DEFAULT '-999',
  `start` int(10) unsigned DEFAULT NULL,
  `end` int(10) unsigned DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `taxid` (`taxid`),
  KEY `query` (`query`),
  KEY `target` (`target`),
  KEY `log_evalue` (`log_evalue`)
) ENGINE=InnoDB AUTO_INCREMENT=7888676 DEFAULT CHARSET=latin1;
Queries that delete large chunks from the table are simply like this:
DELETE FROM problematic_table WHERE problematic_table.taxid = '57';
A query like this just took almost an hour to finish. I can imagine that the index rewriting overhead makes these queries very slow.
I am developing an application that will run on pre-existing databases. I most likely have no control over server variables unless I make changes to them mandatory (which I would prefer not to), so I'm afraid suggestions that change those are of little value.
I have tried to INSERT ... SELECT the rows that I don't want to delete into a temporary table and just drop the rest, but as the ratio of to-delete vs. to-keep rows shifts towards to-keep, this is no longer a useful solution.
This is a table that may see frequent INSERTs and SELECTs in the future, but no UPDATEs. Basically, it's a logging and reference table that needs to drop parts of its content from time to time.
Could I improve my indexes on this table by limiting their length? Would switching to MyISAM, which supports DISABLE KEYS during transactions, help? What else could I try to improve DELETE performance?
Edit: One such deletion would be on the order of about one million rows.
-
G-Nugget over 11 years
It might not help as much as you need, but the data types could be improved. Does the id column really need to be a BIGINT? INT is half the size and goes up to 4 billion, well above your projected max. The query column is probably taking up most of the space; if it isn't always 32 characters, make it a VARCHAR. The index on query could also be limited if you don't really need all of it.
-
mpe over 11 years
@G-Nugget: Thanks for your suggestions. I'm going to look into converting id to INT. Yes, the query column takes 32 chars, since these are SHA-256 hashes. I should probably limit its index, though. Most of the time, this column is used to identify rows; would it help if I used that as the PRIMARY KEY (the id is not really used to look up records)?
-
jeremycole over 11 years
@G-Nugget Really, changing the id column type is the least of the concerns for this table schema. The table is wide enough, and the indexed columns wide enough, that the 4-byte difference will not substantially change performance.
-
jeremycole over 11 years
@mpe Could you clarify how many rows any given delete may be deleting? Thousands? Millions?
-
mpe over 11 years
@jeremycole: The deletions are on the order of about one million rows. Updated the question.
-
inhan over 11 years
I'm wondering, did converting BIGINT into INT make any difference?
-
mpe over 11 years
@inhan: I'm going to test that on Monday; I don't have access to the machine now.
-
Olaf Dietsche over 11 years
Maybe this is a similar problem; see stackoverflow.com/a/13742306/1741542
-
mpe over 11 years
@inhan: Converting BIGINT to INT did not make any noticeable difference.
-
mpe over 11 years
@OlafDietsche: Thanks for the link!
-
Rick James over 2 years
My treatise on "big deletes": mysql.rjweb.org/doc.php/deletebig
-
mpe over 2 years
@RickJames thanks for adding that!
-
mpe over 11 years
Thanks for your answer and the explanation of the reason for this immense slowdown. Your idea with the "state" column is not bad, but it would leave a lot of old data that isn't used anymore. Space or downtime are not an issue, but why would I leave that junk in my database? :) Would it help InnoDB if there were no variable-length columns in that table, i.e., if I changed the varchar columns to char?
-
mpe about 10 years
Accepted late, but better late than never ;)
-
brooNo almost 10 years
INT column length has nothing to do with storage: INT(1) is still 4 bytes, and the maximum value you can store is 2147483647 for SIGNED and 4294967295 for UNSIGNED.
-
user207421 almost 8 years
Can you provide a citation for this claim? It seems most improbable. If DELETE really did all this, nothing else would ever happen.
-
Rick James over 7 years
"reorders your entire table physically on the disk for speed and defragmentation" -- That is false. An InnoDB row delete first goes into the "change buffer"; this actually delays the work needed to do the delete. When the delete happens, InnoDB removes the one row from the one block it is in, then considers merging the block with its neighbor. There is no massive reorg.
-
julio over 7 years
Is it good practice to increase the buffer pool size that much, though?
-
Anse over 7 years
Definitely. We have a large Magento database running with innodb_buffer_pool_size set to 5G, a quarter of the whole RAM on that server. Most other MySQL variables don't affect performance as much as this one, so throw every byte of RAM you have at it.
-
borjab over 4 years
Does this assume autocommit is true, or is there a commit at the end? P.S.: in some IDEs you may need to use the DELIMITER keyword. dev.mysql.com/doc/connector-net/en/…
-
Circle Hsiao over 4 years
Learn how to use code blocks.