Create an index on a huge MySQL production table without table locking

99,457

Solution 1

[2017] Update: MySQL 5.6 has support for online index updates

https://dev.mysql.com/doc/refman/8.0/en/innodb-online-ddl-operations.html#online-ddl-index-syntax-notes

In MySQL 5.6 and higher, the table remains available for read and write operations while the index is being created or dropped. The CREATE INDEX or DROP INDEX statement only finishes after all transactions that are accessing the table are completed, so that the initial state of the index reflects the most recent contents of the table. Previously, modifying the table while an index is being created or dropped typically resulted in a deadlock that cancelled the INSERT, UPDATE, or DELETE statement on the table.

[2015] Updating table indicies blocks writes in MySQL 5.5

From the answer above:

"If your using a version greater than 5.1 indices are created while the database is online. So not to worry you won't interrupt production system use."

This is ****FALSE**** (at least for MyISAM / InnoDB tables, which is what 99.999% of people out there use. Clustered Edition is different.)

Doing UPDATE operations on a table will BLOCK while the index is being created. MySQL is really, really stupid about this (and a few other things).

Test Script:

(   
  for n in {1..50}; do
    #(time mysql -uroot -e 'select  * from website_development.users where id = 41225\G'>/dev/null) 2>&1 | grep real;
    (time mysql -uroot -e 'update website_development.users set bio="" where id = 41225\G'>/dev/null) 2>&1 | grep real;
  done
) | cat -n &
PID=$!
sleep 0.05
echo "Index Update - START"
mysql -uroot website_development -e 'alter table users add index ddopsonfu (last_name, email, first_name, confirmation_token, current_sign_in_ip);'
echo "Index Update - FINISH"
sleep 0.05
kill $PID
time mysql -uroot website_development -e 'drop index ddopsonfu on users;'

My Server (InnoDB):

Server version: 5.5.25a Source distribution

Output (notice how the 6th operation blocks for the ~400ms it takes to finish the index update):

 1  real    0m0.009s
 2  real    0m0.009s
 3  real    0m0.009s
 4  real    0m0.012s
 5  real    0m0.009s
Index Update - START
Index Update - FINISH
 6  real    0m0.388s
 7  real    0m0.009s
 8  real    0m0.009s
 9  real    0m0.009s
10  real    0m0.009s
11  real    0m0.009s

Vs read operations which don't block (swap the line comment in the script):

 1  real    0m0.010s
 2  real    0m0.009s
 3  real    0m0.009s
 4  real    0m0.010s
 5  real    0m0.009s
Index Update - START
 6  real    0m0.010s
 7  real    0m0.010s
 8  real    0m0.011s
 9  real    0m0.010s
...
41  real    0m0.009s
42  real    0m0.010s
43  real    0m0.009s
Index Update - FINISH
44  real    0m0.012s
45  real    0m0.009s
46  real    0m0.009s
47  real    0m0.010s
48  real    0m0.009s

Updating MySQL's Schema without downtime

Thusfar, there's only one method I know of to update a MySql schema and not suffer an availability outage. Circular masters:

  • Master A has your MySQL database running on it
  • Bring Master B into service and have it replicate writes from Master A ( B is a slave of A)
  • Perform the schema update on Master B. It will fall behind during the upgrade
  • Let Master B catch up. Invariant: Your schema change MUST be capable of processing commands replicated from a downversion schema. Indexing changes qualify. Simple column additions usually qualify. Removing a column? probably not.
  • ATOMICALLY swap all clients from Master A to Master B. If you want to be safe (trust me, you do), you should ensure that the last write to A is replicated to B BEFORE B takes its first write. If you allow concurrent writes to 2+ masters, ... you better understand MySQL replication at a DEEP level or you are headed for a world of pain. Extreme pain. Like, do you have a column that is AUTOINCREMENT??? you are screwed (unless you use even numbers on one master and odds on the other). Do NOT trust MySQL replication to "do the right thing". It is NOT smart and will not save you. It's just slightly less safe than copying binary transaction logs from the command-line and replaying them by hand. Still, disconnecting all clients from the old master and flipping them to the new master can be done in a matter of seconds, vastly faster than waiting for a multi-hour schema upgrade.
  • Now Master B is your new master. You have the new schema. Life is good. Have a beer; the worst is over.
  • Repeat the process with Master A, upgrading his schema so that he becomes your new secondary master, ready to take over in the event that your primary master (master B now) loses power or just up and dies on you.

An easy way to update schema this isn't. Workable in a serious production environment; yes, it is. Please, please, please, if there is an easier way to add an index to a MySQL table without blocking writes, let me know.

Googling lead me to this article which describes a similar technique. Even better, they advise drinking at the same point in the proceedure (Note that I wrote my answer before reading the article)!

Percona's pt-online-schema-change

The article I linked above talks about a tool, pt-online-schema-change, that works as follows:

  • Create new table with same structure as original.
  • Update schema on new table.
  • Add a trigger on the original table so that changes are kept in-sync with the copy
  • Copy rows in batches from original table.
  • Move original table out of the way and replace with new table.
  • Drop old table.

I've never tried the tool myself. YMMV

RDS

I'm currently using MySQL through Amazon's RDS. It's a really nifty service that wraps up and manages MySQL, letting you add new read replicas with a single button and transparently upgrade the database across hardware SKU's. It's really convenient. You don't get SUPER access to the database, so you can't screw with replication directly (is this a blessing or curse?). However, you can use Read Replica Promotion to make your schema changes on a read-only slave, then promote that slave to become your new master. Exactly the same trick as I described above, just vastly easier to execute. They still don't do much to help you with the cut-over. You have to reconfigure and restart your app.

Solution 2

As this blog post outlines, the InnoDB ALTER TABLE mechanism has been completely redesigned for MySQL 5.6.

(For an exclusive overview of this topic, the MySQL documentation can provide an afternoon's worth of reading.)

To add an index to a table without a lock resulting on UPDATE/ INSERT, the following statement format can be used:

ALTER TABLE my_table ADD INDEX my_table__idx (my_column), ALGORITHM=INPLACE, LOCK=NONE;

Solution 3

MySQL 5.6 update (feb 2013): You can now perform read and write operations while an index is being created even with InnoDB tables - http://dev.mysql.com/doc/refman/5.6/en/innodb-create-index-overview.html

In MySQL 5.6 and higher, the table remains available for read and write operations while the index is being created or dropped. The CREATE INDEX or DROP INDEX statement only finishes after all transactions that are accessing the table are completed, so that the initial state of the index reflects the most recent contents of the table. Previously, modifying the table while an index is being created or dropped typically resulted in a deadlock that cancelled the INSERT, UPDATE, or DELETE statement on the table.

and:

In MySQL 5.6, this feature becomes more general: you can read and write to tables while an index is being created, and many more kinds of ALTER TABLE operations can be performed without copying the table, without blocking DML operations, or both. Thus in MySQL 5.6 and higher, we typically refer to this set of features as online DDL rather than Fast Index Creation.

from http://dev.mysql.com/doc/refman/5.6/en/glossary.html#glos_fast_index_creation

Solution 4

pt-online-schema-change is the way to go if you really want to make sure that the migration will not bring down the site.

As I wrote in the above comment, I have several experiences with pt-online-schema-change in production. We have our main table of 20M+ records and a master -> 2 read-only replication slaves. I've done at least a dozens of migrations with pt-online-schema-change from adding a new column, changing charset, to adding several indices. We serve tons of traffic during the migration time as well and we have not had any hiccup. Of course you'd have to test all the scripts very thoroughly before running on production.

I tried to batch up the changes into 1 script so that pt-online-schema-change only have to copy the data once. And be very careful with changing column name since you will loose your data. However, adding an index should be fine.

Share:
99,457
n0cturnal
Author by

n0cturnal

Updated on April 20, 2020

Comments

  • n0cturnal
    n0cturnal about 4 years

    I need to create an index on a ~5M rows MySQL table. It is a production table, and I fear a complete block of everything if I run a CREATE INDEX statement...

    Is there a way to create that index without blocking inserts and selects?

    Just wondering I have not to stop, create index and restart my system!

  • Alex Le
    Alex Le over 8 years
    pt-online-schema-change works great even in a master-slave replication. I've used it to do live migration on a busy read 20M+ records table on our production master db with 2 replication slaves without any hiccup or downtime. It takes some time to prepare the script, and I usually having to create a .sql file containing the raw SQL change and a .sh file as wrapper to run the same SQL but in fragment format (no ALTER TABLE). You can run multiple commands with pt-online-schema-change by stringing them up and separated by comma.
  • Nikhil Sahu
    Nikhil Sahu almost 7 years
    Then how can Dave's analysis be explained?
  • Mark Amery
    Mark Amery over 6 years
    -1; I don't know about older versions, but I know that index creation doesn't block concurrent DML in MySQL 5.6+ (for which an RC existed at the time this answer was written, and which had been officially released when this answer was lasted edited in May 2013) because I have relied upon this to run multi-hour index creations on production tables while still accepting inserts. And while you may be right about index creation blocking DML in 5.5 and below, the sub-second delay demonstrated here isn't entirely convincing.
  • Mark Amery
    Mark Amery over 6 years
    @NikhilSahu Dave clearly wasn't testing on MySQL 5.6, but on some older version. Note that 5.6 was not yet released at the point that Dave posted the initial revision of his answer.
  • Mark Amery
    Mark Amery over 6 years
    I disagree with your unqualified recommendation of pt-online-schema-change. It's great, but is overkill for many situations where MySQL 5.6+'s online DDL capabilities already work fine. It also has limitations (like not playing nicely with triggers), and doubles the amount of writing needed per insert to the original table while a schema change is in progress. It will tax your disk significantly more than an ordinary online schema change would, and so has the potential to "bring down your site" in circumstances where just running the schema change the simple way would've worked fine.
  • Dave Dopson
    Dave Dopson over 6 years
    +1. My analysis was on MySQL 5.5 (the latest that was available in 2013). I'm updating my answer to reflect the new capabilities in MySQL 5.6.
  • Dave Dopson
    Dave Dopson over 6 years
    @MarkAmery - blocking behavior is blocking behavior, and 400ms is an eternity. MySQL 5.5 blocks for index updates. Build a bigger test database, and it will block for seconds, hours, or days. I wrote this post before MySQL 5.6 had online schema updates, so my original content doesn't reflect that fact. I've updated the post to reflect the newly available information.
  • Alex Le
    Alex Le over 6 years
    I wrote based on my actual experience with pt-online-schema-change at the time so I'm not sure why you'd call my recommendation "unqualified." We were having at least 1000+ visitors on the site at any given moment when I ran the schema changes, and of course, disk IO was taxing, but our site did not go down. Having good caching helped as well. I have not used MySQL 5.6+ online DDL but from my experience, pt-online-schema-change did its job well in our case.
  • Mark Amery
    Mark Amery over 6 years
    @AlexYe Yikes, I meant "unqualified" in the sense of "without reservation" rather than the sense of "delivered by someone who's not qualified to comment" - the latter interpretation didn't occur to me until I saw your comment and certainly isn't what I intended! i.e. I was saying that while pt-online-schema-change is a useful tool, there are very many situations in which ordinary online DDL is just as good and a handful where it's better, so any recommendation of it ought to be carefully caveated rather than universal.
  • Alexander Torstling
    Alexander Torstling over 5 years
  • toto_tico
    toto_tico almost 4 years
    @DaveDopson, are you 100% sure that only UPDATE operations are blocked?
  • Dave Dopson
    Dave Dopson almost 4 years
    That was the case for the version I tested.
  • alexkb
    alexkb almost 3 years
    Another caveat: when running this command, if you get the error "Cannot change column type INPLACE. Try ALGORITHM=COPY" it is because your original table was created prior to MySQL 5.6 and doesn't support it yet: dev.mysql.com/doc/refman/5.6/en/…