How to deal with enormous line lengths created by mysqldump


Solution 1

By default, mysqldump generates only one INSERT command per table, resulting in one (very long) line of inserted data for each table that got dumped. This is essentially because "batch" inserts are much faster to restore than a separate INSERT query for every record in every table.

So, it's not that mysqldump has created arbitrarily long lines, and you can just impose some other cutoff length. The lines are long for a reason.

If it's really important to get the INSERTs broken down onto multiple lines, you can indicate that with:

mysqldump --extended-insert=FALSE --complete-insert=TRUE ...

Note, however, that restoring tables will take longer in this format.
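
For illustration (using a hypothetical table t with columns id and name), the default extended-insert output packs every row into one statement on a single line, roughly:

INSERT INTO `t` VALUES (1,'a'),(2,'b'),(3,'c');

whereas with --extended-insert=FALSE --complete-insert=TRUE each row becomes its own statement on its own line:

INSERT INTO `t` (`id`, `name`) VALUES (1,'a');
INSERT INTO `t` (`id`, `name`) VALUES (2,'b');
INSERT INTO `t` (`id`, `name`) VALUES (3,'c');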

Solution 2

I was browsing the MySQL source code looking for a solution to this problem today. The maximum line length is enforced by the variable opt_net_buffer_length, which is supposed to match the MySQL server's buffer size. Its default is comically large.

But anyway, it's an option, so just do this:

mysqldump --net_buffer_length=5000 ...

The minimum value is 4096.
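
As I understand the option, mysqldump ends the current INSERT statement (and starts a new line) whenever adding the next row would push it past this length, so each line stays roughly under the limit; a single row larger than the buffer still ends up on one line, as noted in the comments below. If you want the dump to match your server, a small sketch, assuming you can reach the server with the mysql client and picking 16384 purely as an illustrative value:

mysql -e "SHOW VARIABLES LIKE 'net_buffer_length';"
mysqldump --net_buffer_length=16384 ...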

Solution 3

I came across an answer on the MySQL forums which conclusively shows that adding '\n' after each INSERT group is not possible using mysqldump alone, without modifying the source:

The extended format cannot be 100% properly parsed based on the comma or parenthesis alone; you would need to count fields. The best solution: fix mysqldump to line-break on output.

Very minor change: On line 3506, you can see where the row-ending comma is output:
fputc(',',md_result_file); /* Always row break */

Simply insert this line immediately after line 3506:
fputc('\n',md_result_file); /* Lon Binder says wrap that line! */

Re-compile and you're done.
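
With that one-line patch, a dump of a hypothetical table t should still contain a single INSERT statement per table, but with each row wrapped onto its own line, something like:

INSERT INTO `t` VALUES (1,'a'),
(2,'b'),
(3,'c');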

@see http://forums.mysql.com/read.php?28,420002,426110#msg-426110

Thanks Lon B!

(I've included the content from the MySQL forum just in case the forum disappears.)

Solution 4

This flag also works:

mysqldump --skip-extended-insert

It is just the same as --extended-insert=FALSE.

Solution 5

Using a regex to split the lines is not enough; you need a parser that properly understands quotes and escape characters.

I just wrote a parser since I couldn’t find one: http://blog.lavoie.sl/2014/06/split-mysqldump-extended-inserts.html
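
To see why a naive split is risky, consider a hypothetical row whose string value happens to contain the delimiter pattern itself:

INSERT INTO `t` VALUES (1,'),('),(2,'ok');

A blind replacement such as sed 's/),(/),\n(/g' (suggested in the comments below) would also split inside the quoted string and corrupt the data, which is why you need a parser that tracks quotes and backslash escapes.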


Comments

  • pavium
    pavium almost 2 years

    I'm using mysqldump in a cron job to backup a database with over 2 million rows.

    It creates a text file which can be used to restore the database from the command line.

    I thought it would be useful to edit the dump before a restore as a quick way of changing values and table or column names - at least until I learn more and become confident about doing it with ALTER and UPDATE.

    Editing large text files does not bother me, but I was surprised to find that in a 250 megabyte dump of my database, there were only about 300 lines. Each line was something like 800k characters long.

    Is there another way of generating dumps with more control over line length?

    Or should I post-process the dump with tools like sed or Perl?

  • pavium
    pavium over 14 years
    I had looked in the mysqldump man page, and tried --extended-insert=FALSE but --complete-insert=TRUE is new to me. I'll try it, thanks.
  • pavium
    pavium over 14 years
    I prefer Perl or sed, simply because that's what I'm used to. And although there's a certain satisfaction in editing a big file, I will learn the SQL commands, I promise.
  • Alex Weinstein
    Alex Weinstein over 13 years
    This really helped me, thank you. The extra-long lines produced by mysqldump were actually breaking and not restoring correctly... switching extended inserts off fixed the issue. And since I zip the resulting archive, the backup file isn't that big.
  • Dave Aiello
    Dave Aiello over 11 years
    We do a lot of legacy Content Management System migration and the --extended-insert and --complete-insert recommendations are quite valuable. Thank you.
  • Vladimir Panteleev
    Vladimir Panteleev over 11 years
    Will this cause problems with individual rows containing more than 4K data?
  • Buttle Butkus
    Buttle Butkus over 11 years
    @CyberShadow good question. I bet it would but I have no idea. Next time I do this, I'll try matching up the mysqldump net_buffer_length to the max_allowed_packet and see if that works. I just did an import where 16M max_allowed_packet was exceeded by query size. Even 64M was too small - went to 256M and got it through. So I would probably try a net_buffer_length of around 10M next time, myself, and see what that does.
  • Buttle Butkus
    Buttle Butkus over 11 years
    I'm wondering if anyone can confirm that the net_buffer_length option will also work well. It seems like a better solution to me.
  • Ken Williams
    Ken Williams over 11 years
    Post-processing is a bad idea unless you're really certain you understand exactly how the data is escaped, and exactly how to write code to deal with it. Otherwise you're going to corrupt your data.
  • Ken Williams
    Ken Williams over 11 years
    @VoteyDisciple - that explains why it's one big INSERT, but not why it's one big line. There could easily be a newline after each insertion row, right?
  • VoteyDisciple
    VoteyDisciple over 11 years
    That's true: mysqldump could hypothetically add an option to write a single INSERT statement but with each of the VALUES lists separated by newlines. That wouldn't be a bad option, but neither would it probably be in high demand. When your goal is just to produce a backup, extra newlines are just unnecessary bytes. And when your goal is to manually manipulate the file, fully explicit statements (while verbose) are nice. I can't speak to why mysqldump's authors did not include a particular option, but personally I don't find it lacking.
  • Tomáš Fejfar
    Tomáš Fejfar about 11 years
    Oh, there sure IS a demand for that! We use mysqldump to store the DB for each "tag". We make some smaller updates between tags, and with one big line you can't see which content has changed and you have to scroll sideways to find it.
  • VoteyDisciple
    VoteyDisciple about 11 years
    Using the non-extended INSERT format (--extended-insert=FALSE) will resolve that. You'll get separate INSERT statements on each line, so the only sideways scrolling you'd do is within a particular statement.
  • hfrmobile
    hfrmobile almost 11 years
    In my eyes it is just a "formatting issue". Why not generate ONE INSERT statement but put the value blocks for each row on separate lines? This would make the generated file human-readable and would not affect performance at all.
  • hfrmobile
    hfrmobile almost 11 years
  • SineSwiper
    SineSwiper over 10 years
    Ugh, why is this not an option on mysqldump? This is the best of both worlds.
  • Tamlyn
    Tamlyn about 10 years
    If you're sure you won't corrupt the data, here's an example command to split the inserts mysqldump -udbuser dbname | sed 's/),(/),\n(/g' > dump.sql
  • Tamlyn
    Tamlyn about 10 years
    PHPMyAdmin generates its dump files with a new line between each insertion. I guess there was demand for that.
  • Pieter le Roux
    Pieter le Roux almost 10 years
    @Tamlyn But what if I have ),( in a VARCHAR field? Maybe a comma delimited list of emoticons, or something. I guess, yeah... if you know your data well enough that there's no possibility that you have anything like that...
  • LeonardChallis
    LeonardChallis over 9 years
    I'm sure in your text editor you could use a simple (regex) search and replace to format it how you wish.
  • Maxim Kholyavkin
    Maxim Kholyavkin over 9 years
    Please give a link to the source of mysqldump.
  • StampyCode
    StampyCode over 9 years
    @Speakus - MySQL is owned by Oracle and offers no direct link to download or view individual source files - you can download the source bundle from here: dev.mysql.com/downloads/utilities - just select 'Source' as the platform.
  • Maxim Kholyavkin
    Maxim Kholyavkin over 9 years
    Thank you. A minor improvement to your comment: I should choose the source not from your link but from here: dev.mysql.com/downloads/mysql + open the file mysql-5.6.20/client/mysqldump.c + search for the required line via the comment (I found it at line 3791) + add the new line as described. Another way to get the source code: mariadb.com/kb/en/mariadb/documentation/getting-started/… + change maria/client/mysqldump.c (line 3928). Unfortunately, neither tool (Oracle MySQL nor MariaDB) has an option to solve the question out of the box yet.
  • colin lamarre
    colin lamarre over 9 years
    By far the best answer: it won't be too slow and won't break other DBMSes like SQLite, which has a default limit of 500 rows per INSERT statement. Thank you!
  • Michael Platings
    Michael Platings over 8 years
    @CyberShadow no it doesn't cause problems with bigger rows - they take as much space as they need. Tested with MariaDB 10.0.21.
  • Daniel Beardsmore
    Daniel Beardsmore almost 7 years
    Yeah I like that one better for some reason :) Just needed this for a diff, where the normal long lines make diffs really painful.
  • Idris
    Idris almost 7 years
    Performance-wise, here is how long it took to import my 20MB / 335k-row database dump: [ with a single-transaction dump: 22s ] // [ with a 4096 buffer length dump: 1m 44s ] // [ with extended-insert=false: ???? I got bored waiting and CTRL-Ced it after 33 minutes ]. So this answer is without a doubt the best compromise between speed and diffability. Caveat emptor: my subsequent dump was identical, but I didn't test with row data longer than 4096 chars.
  • Idris
    Idris almost 7 years
    *Actually, thinking about it, if a row early on in a table is modified or deleted, then all subsequent lines would have their content offset across the block boundaries, making subsequent dumps differ on every line after that point. So maybe this doesn't have great diffability after all...
  • mwfearnley
    mwfearnley almost 5 years
    Every record starts with INSERT INTO table_name VALUES (...);, which takes up more space, but can lend some helpful context if it's not obvious from the data which table it's going into.
  • mwfearnley
    mwfearnley almost 5 years
    It might be better to post a Python solution or a link to one, rather than just the Python homepage. Someone who hasn't heard of Python before will have a long journey ahead before they can write a suitable post-processing script.
  • Chris Browet
    Chris Browet over 3 years
    Very useful for existing dumps :)
  • user2066657
    user2066657 about 2 years
    Someone who hasn't heard of Python shouldn't be copy-pasting from the Internet without the skills to review the code! :-P