IO Wait causing so much slowdown (EXT4 JDB2 at 99% IO ) During Mysql Commit

linux mysql performance python

54,256

Solution 1

Put the database on a non-journaling file system. At least larger servers (oracle, sql server) have their own journal function (transaction log) and optimize their IO accordingly. You have log and database on separate file systems and discs and rely on database internal functionality for handling bad IO. There are normally no (larger setup) file system changes except write date anyway because files do not expand - they would be generated with their "final" size (ok, admins can change that), and changes are as I said tracked by the database level transaction log.

You may also want to tell us what your hardware layer is. Most people underestimate that IOPS is the limiting factor for a database and think a small disc set is a proper environment for a large database. While some of us work on databases using a larger number of discs, thus potentially supporting a higher number of IOPS.

Solution 2

There is always going to be a trade off between resiliency and performance.

With MySQL on ext4 the barriers=1 default does indeed cause a slow down, however the first action should not be to disable journaling or to turn on data=writeback.

First, if resiliency is of high importance, a battery backed RAID is certainly well worth it.

The mount options I have chosen, especially on non-battery backed RAID are:

/dev/mapper/vg-mysql--data  /var/lib/mysql/data ext4  defaults,noatime,nodiratime,barrier=1,data=ordered  0 0

This is intentionally not using data=writeback because I do not want to risk filesystem corruption resulting in "old data to appear in files after a crash and journal recovery" (quote is from man mount).

The ideal configuration in my.cnf for full resiliency around I/O related settings are:

[mysqld]
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1

I have opted for the following sequence of trade-offs to increase performance:

sync_binlog = 0: this is the first MySQL config that I change away from the full resiliency. The reason for this is that it gives a significant performance improvement, especially where binlog_format=row (unfortunately required for Jira). I am using enough MySQL replicas in the cluster that if the binlog were to become corrupted by a power loss scenario I would do a binary copy from another replica.
innodb_flush_log_at_trx_commit = 2: While a value of 1 is required for full ACID compliance, with a value of 2 "the log buffer is written out to the file at each commit, but the flush to disk operation is not performed on it. However, the flushing on the log file takes place once per second also when the value is 2. Note that the once-per-second flushing is not 100% guaranteed to happen every second, due to process scheduling issues." (quote from MySQL docs)
Update the mount options to use data=writeback. Note that if this is your root file system you will also need to pass a kernel command line option. I put together a few steps on that at coderwall.
Test various values of innodb_flush_method. O_DIRECT is shown to improve performance in some workloads, but it's not a given that this will work in your environment.
Upgrade to SSDs, in which case you'll also want to increase innodb_io_capacity, and tune settings such as innodb_adaptive_flushing, innodb_read_io_threads, innodb_write_io_threads, innodb_purge_threads, and other possible settings.

Solution 3

It is quite likely that your I/O backend is not coping with the load all that well. You should make sure your filesystem is not journaling data. I would suggest using the data=writeback,relatime,nobarrier parameters to mount for your database's data partition as the first quick&dirty optimization.

Also, deducing from your symptoms, you are apparently not using write caching with your controller. You should make sure you are using a battery-backed or flash-backed write cache on your controller and enable it - this should give you a significant performance boost without vastly increasing the risk of data loss or corruption. Note that using write cache without a battery or flash backup does increase the risk of data loss or corruption significantly - so only do this for testing purposes and/or if you can take the loss.

Solution 4

This is an old question, but we faced the same issues (High IO waits, and terrible insert / update speeds) the past week on a new dedicated server and this solution addresses this issue directly.

Disabling journaling with tune2fs -O "^has_journal" /dev/<drive> was the quickest solution as it eliminates the IO wait because of the JDB2 process. But this isn't recommended unless you have a battery backed drive because you'll lose data in the event of a crash. InnoDB tables are safe if you have doublewrite enabled in MySQL. But files like .frm, logs, etc aren't safe. We tried moving these files to another drive (especially the bin logs) but the jdb2 IO wait still persisted. So it didn't leave us very comfortable.

data=writeback,relatime,nobarrier didn't help it speed up writes / reads as much as disabling journaling on the whole partition. More options for ext4 are in the EXT4 doc.

The real culprit in our case was sync_binlog. We had set is as 1 in /etc/mysql/my.cnf and it was killing performance.

Percona validates this here. We set it to it's default of 0 and performance shot up by over a 500%.

View more solutions

54,256

Phyo Arkar Lwin

Will Add Later!

Updated on September 18, 2022

Comments

Phyo Arkar Lwin over 1 year

I am writing an indexer, using python, which indexes documents and insert them into Database, Before it was single process but now i made it to multiprocessing with 4 parallel processes running.After every text extraction , it insert into database and does a commit.

Now it hitting IO problem , the main IO Problem is not my process but EXT4's jdb2 , journeling system. It is at 99.99% and casuing CPU to wait for IO at every MySQL Commit.

I saw many having that problem on the internet and their solution is to mount using barrier = 0 . Would that disable Journaling totally ? My Servers have UPS and tempting to do it , should i ?
- RolandoMySQLDBA almost 11 years
  
  Is all your data InnoDB ???
the-wabbit about 12 years

Since he is committing transactions, the engine would not be MyISAM since MyISAM does not support transactions.
the-wabbit about 12 years

I would modify this to using a filesystem not using the journal for data but only metadata. Ext4 can be configured this way as well.
TomTom about 12 years

Yes. At the end the jouirnal doubles the IO - and the database log will do the same again, so you wend up with a lot more IOPS than you have to. And redundancy that basically is not needed. The system jouirnalling is NICE to protect the file.... but useless when the application does so already, which databases do.
adaptr about 12 years

Arr, brainfart.
Phyo Arkar Lwin about 12 years

I am using innodb , mysql5.5 defaults to innodb.
Phyo Arkar Lwin about 12 years

Which offers best performance at non-journaling? Thanks!
Phyo Arkar Lwin about 12 years

so how about : data=writeback,relatime,nobarrier and then totally disable mysql Logging? I think this would speed things up a lot?
Phyo Arkar Lwin about 12 years

hdpram -i shows that i am using write caching. so hmm ??
the-wabbit about 12 years

@V3ss0n you can't disable logging for a transactional engine - it is the very heart of it. You might choose to move the transaction log to a different set of disks as it has a totally different access pattern (mostly linear writes) than your main database data (random read/writes) - this is a commonly recommended configuration. As for your storage setup: you are not using a RAID controller but simply individual disks with write cache on? This would not help any of your synchronous writes as they come with explicit cache flush requests.
Nic Cottrell about 10 years

Is nobarrier the same as barrier=0 ?
kouton over 9 years

@NicCottrell yes, they're the same.