Why does TRANSACTION / COMMIT improve performance so much with PHP/MySQL (InnoDB)?

35,938

Solution 1

  1. Please check this link:

    https://dev.mysql.com/doc/refman/5.5/en/optimizing-innodb-transaction-management.html

    InnoDB must flush the log to disk at each transaction commit if that transaction made modifications to the database. When each change is followed by a commit (as with the default autocommit setting), the I/O throughput of the storage device puts a cap on the number of potential operations per second.

  2. Big transactions may affect performance during commit (check above)

  3. Only in case of rollback, however it may be optimized using some settings (check the link)

Solution 2

My own little test in .Net (4 fields pr. records):

INSERT 1 record, no transaction:60 ms

INSERT 1 record, using transaction:158 ms

INSERT 200 records using transactions, commit after each record:17778 ms

INSERT 200 records using no transactions:4940 ms

INSERT 200 records using transactions, only commit after last record:4552 ms

INSERT 1000 records using transactions, only commit after last record:21795 ms

Client in Denmark, server in Belgium (Google cloud f1-micro).

I meant to put this in a comment but the formatting is not good....so here is my apology in advance ;-)

Share:
35,938
jjwdesign
Author by

jjwdesign

LAMP Developer, Designer, etc.

Updated on March 31, 2020

Comments

  • jjwdesign
    jjwdesign about 4 years

    I've been working with importing large CSV files of data; usually less than 100,000 records. I'm working with PHP and MySQL (InnoDB tables). I needed to use PHP to transform some fields and do some text processing prior to the MySQL INSERTs (part of process_note_data() in code below). MySQL's LOAD DATA was not feasible, so please do not suggest it.

    I recently tried to improve the speed of this process by using MySQL transactions using START TRANSACTION and COMMIT. The performance increase was surprising. Processing time(s) dropped by a factor of 20. So, a 20 minute process only took about 1 minute.

    QUESTIONS.

    1.) Does anyone understand why there was such performance increase (20 mins to 1 min)?

    2.) Should I be concerned about how big the transaction may get with 100,000 records?

    3.) Should I be concerned with a large number of inserts and/or updates in the transaction?

    /*
     * Customer Notes Data:
     * Rows are either a meeting, call or note!
     */
    $row = 1;
    $data = array();
    $fields = array();
    $line = '';
    
    $db->query('SET autocommit=0;');
    $db->query('START TRANSACTION;');
    
    if (($handle = fopen("modules/".$currentModule."/Data/customernote.csv", "r")) !== FALSE) {
      while (($data = fgetcsv($handle, 4096, ',', '"')) !== FALSE && $row < 999000) {
        //Row 1 - CSV header row with field names
        if ($row == 1) {
          $csv_fields = $data;
        } elseif ($row > 1) {
          $fields = $this->process_note_data($data, $csv_fields, $row);
        }
        $row++;
      } // end while
      fclose($handle);
    }
    
    $db->query('COMMIT;');
    $db->query('SET autocommit=1;');
    

    Note: The text/field processing is done in the call to $this->process_note_data() which then calls another helper class that has the INSERT statement code. I didn't have enough room to include all of the code. $db->query() is a typical database object for MySQL queries.

  • jjwdesign
    jjwdesign about 11 years
    Would you recommend COMMIT'ing every 1,000 or so INSERTS's in order to reduce the processing at COMMIT or am I worrying about this a bit too much?
  • MiGro
    MiGro about 11 years
    1k should be ok but it depends on H/W. I would suggest to run some tests here. Anyway - be careful about data consistency (i.e. - you loaded 20k out of 100k records and the system crashed).
  • jeremycole
    jeremycole about 11 years
    There isn't much processing to be done at COMMIT time, so there's no reason to periodically commit for that. However if the system crashes while a very large transaction is in progress, it can require a large amount of time to rollback (and this will be done during server start while not taking requests). With just 100k rows, however you should probably not worry about that.
  • jjwdesign
    jjwdesign over 8 years
    With only 200 records, you're not going to see much of a difference. With more than 100,000+ records, you should start to see a big performance advantage.
  • J. Fdez
    J. Fdez over 5 years
    Interesting the difference between commit position using transactions... Thank you!
  • Maurice
    Maurice over 2 years
    now do the same test for SELECT