Slow MySQL inserts


Solution 1

It could be time for vertical partitioning: moving the BLOB field into a separate table. In this article, in the section 'A Quick Side Note on Vertical Partitioning', the author removes a larger varchar field from a table and query speed increases by about an order of magnitude.

The reason is that physically traversing the data on disk becomes significantly faster when there is less space to cover, so moving the bigger fields elsewhere improves performance.
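
As a rough sketch (table and column names reuse the hypothetical x from the INSERT example below, not anything from the question), the split could look like this: a narrow table keeps the indexed columns, and a 1:1 side table holds the BLOB, which is only joined in when the payload is actually needed:

-- Narrow "hot" table: only the columns that are searched and indexed.
CREATE TABLE x (
    id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    md5    CHAR(32) CHARACTER SET ascii NOT NULL,
    field1 INT NOT NULL,
    field2 FLOAT NOT NULL,
    UNIQUE KEY uq_md5 (md5)
);

-- 1:1 side table: the large payload lives here and is only joined in
-- by id when a query actually needs the binary data.
CREATE TABLE x_blob (
    id   INT UNSIGNED NOT NULL PRIMARY KEY,
    data LONGBLOB NOT NULL
);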

Also (and you probably do it already) it is beneficial to shrink your index column to its absolute minimum (CHAR(32) in ascii encoding for an MD5 hash), because the smaller the key, the faster it is to use.
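
For example, if the md5 column is currently a VARCHAR in a multi-byte character set, something along these lines (hypothetical table name) would shrink the key to 32 single-byte characters:

ALTER TABLE x
    MODIFY md5 CHAR(32) CHARACTER SET ascii COLLATE ascii_bin NOT NULL;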

If you do multiple inserts at a time with InnoDB tables, you can significantly increase insert speed by wrapping them in a transaction and doing multiple inserts in one query:

START TRANSACTION;
INSERT INTO x (id, md5, field1, field2) VALUES (1, '123dab...', 'data1', 'data2'), (2, 'ab2...', 'data3', 'data4'), .....;
COMMIT;

Solution 2

See Speed of INSERT Statements. Do you have frequent MD5 collisions? These should not happen very often, so maybe you can use something like INSERT ... ON DUPLICATE KEY UPDATE to handle them. If you have specific insert periods, you can disable the keys for the duration of the load and re-enable them afterwards. Another option is to use replication: a master machine for the inserts and a slave for the selects.
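
A sketch of both ideas, using the hypothetical table and columns from Solution 1 (adjust the update list to whatever makes sense for a collision):

-- Instead of failing on a duplicate md5, update the existing row.
INSERT INTO x (md5, field1, field2)
VALUES ('9e107d9d372bb6826bd81d3542a419d6', 1, 2.5)
ON DUPLICATE KEY UPDATE field1 = VALUES(field1), field2 = VALUES(field2);

-- During a dedicated load window, defer index maintenance; for MyISAM
-- this only affects non-unique indexes, so the UNIQUE md5 key is still checked.
ALTER TABLE x DISABLE KEYS;
-- ... bulk INSERTs here ...
ALTER TABLE x ENABLE KEYS;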

Solution 3

Are you using MyISAM?
AFAIK MyISAM has very good read performance, but poor write performance.

InnoDB should be balanced in speed.

Solution 4

I asked a somewhat-related question today as well.

One of the answers is to consider INSERT DELAYED, so that the row goes into an insert queue and is handled when the db is not as busy.
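
A minimal sketch of what that looks like (same hypothetical table as above; note that DELAYED only works with MyISAM, MEMORY and ARCHIVE tables, not InnoDB, and the queued rows are held in memory, so they are lost if the server crashes before they are written):

-- The client gets control back immediately; the row is queued and
-- written when the table is not in use by other threads.
INSERT DELAYED INTO x (md5, field1, field2)
VALUES ('9e107d9d372bb6826bd81d3542a419d6', 1, 2.5);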

Solution 5

Does your data fit in RAM? If not, get more RAM until that becomes uneconomic (16G is usually about the point for most people).

Then, do your indexes fit in the MyISAM key buffer?
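
One way to check, sketched with information_schema:

-- Total size of all MyISAM indexes on the server, to compare against
-- the configured key buffer (both values are in bytes).
SELECT SUM(index_length) AS total_index_bytes
FROM information_schema.tables
WHERE engine = 'MyISAM';

SHOW VARIABLES LIKE 'key_buffer_size';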

If you're running a 32-bit OS, don't. Once you're on a 64-bit OS, set the key buffer to roughly 1/3 of RAM. The rest of the RAM is used by the OS to cache the data files (which does little for inserts but is beneficial for selects).
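
As a concrete (made-up) example, on a 64-bit server with 12 GB of RAM that would mean a key buffer of roughly 4 GB:

-- Value is in bytes and takes effect immediately; put the equivalent
-- key_buffer_size line in my.cnf so it survives a restart.
SET GLOBAL key_buffer_size = 4294967296;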

Having multi-gigabyte tables in MyISAM can be a pain, because in the event of an unclean shutdown very lengthy repair operations are required.

But don't switch MySQL engines without significant validation of your application; it will change the behaviour in many ways (not just performance), including disc space usage.

Author: jbatista, updated on June 04, 2022

Comments

  • jbatista
    jbatista almost 2 years

    I am using and working on software which uses MySQL as a backend engine (it can use others such as PostgreSQL or Oracle or SQLite, but this is the main application we are using). The software was designed in such a way that the binary data we want to access is kept as BLOBs in individual columns (each table has one BLOB column; other columns have integers/floats to characterize the BLOB, and one string column holds the BLOB's MD5 hash). The tables typically have 2, 3 or 4 indexes, one of which is always the MD5 column, which is made UNIQUE. Some tables already have millions of entries, and they have entered the multi-gigabyte range in size. We keep separate per-year MySQL databases on the same server (so far). The hardware is quite reasonable (I think) for general applications (a Dell PowerEdge 2U-form server).

    MySQL SELECT queries are relatively fast. There's little complaint there, since these are (most of the time) run in batch mode. However, INSERT queries take a long time, and the time increases with table size (number of rows). Admittedly, this is because the MD5 column is of type UNIQUE, so each INSERT has to figure out whether the new row has a corresponding, already-inserted MD5 string. And it's not too strange (I think) that performance gets worse if there are other (non-unique) indexes. But I still can't put my mind to rest about this software architecture choice (I suspect keeping BLOBs in the table row instead of on disk has a significant negative impact). Insertions are not critical, but it is an annoying feeling to have.

    Does anyone have experience in similar situations? With MySQL, or even other (preferably Linux-based) RDBMSes? Any insights you would care to provide, maybe some performance figures?

    BTW, the working language is C++ (which wraps C calls to MySQL's API).

  • jbatista
    jbatista over 14 years
    Yes, I forgot that detail. The server we're using uses MyISAM.
  • jbatista
    jbatista over 14 years
    Thanks for the suggestions, I'll look into them.
  • jbatista
    jbatista over 14 years
    Based on a reply in this thread, I've looked into MySQL's INSERT DELAYED documentation. They claim that care should be used when opting for INSERT DELAYED, because it can degrade overall performance for single-client insertions compared with "normal" INSERTs.
  • warren
    warren over 14 years
    ok - I did not see that myself, though I was also looking to speed up the script that's performing the inserts rather than the performance of the insert itself
  • user2308001
    user2308001 over 14 years
    MyISAM is not transactional, which means you can safely ignore everything I wrote about transactions; however, an insert statement with several hundred or thousands of inserts in one query will speed things up in MyISAM as well.
  • jbatista
    jbatista over 14 years
    OK. Do you know if the INSERT INTO syntax imposes a limit on the length of the query string? In other words, does MySQL accept an INSERT INTO query string with a length of (for example) hundreds of millions of characters? It smells like trouble to me.
  • user2308001
    user2308001 over 14 years
    The limit is determined by max_allowed_packet in the configuration file. I keep mine at 16M. I usually add about 10000 records in one insert statement (see the sketch after these comments for checking and raising the limit).
  • user2308001
    user2308001 over 14 years
    it is worth noting that INSERT DELAYED works with MyISAM, but not InnoDB
  • Tim Duncklee
    Tim Duncklee over 9 years
    From the link above: "If you are inserting many rows from the same client at the same time, use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements." This took my insert process of 6K records from 2.5 min to 5 seconds
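
As mentioned in the comments, the practical limit on one multi-row INSERT is max_allowed_packet. A small sketch for checking and raising it (the value is in bytes; it should also go into the server configuration file to persist across restarts, and existing client connections keep the old limit until they reconnect):

SHOW VARIABLES LIKE 'max_allowed_packet';
SET GLOBAL max_allowed_packet = 16777216;   -- 16M, as suggested above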