Limiting the number of records from mysqldump?

Solution 1

As skaffman says, use the --where option:

mysqldump --opt --where="1 limit 1000000" database

Of course, that would give you the first million rows from every table.

Solution 2

If you want to get n records from a specific table you can do something like this:

mysqldump --opt --where="1 limit 1000000" database table > dump.sql

This will dump the first 1000000 rows from the table named table into the file dump.sql.
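
The command above can be parameterized. The following is only a sketch: the variable names (DB, TABLE, N) and their values are placeholders, not part of the original answer, and the script prints the command rather than running it so you can check it first.

```shell
#!/bin/sh
# Placeholder values for illustration; substitute your own.
DB=database
TABLE=table
N=1000

# "1" is an always-true condition; the LIMIT clause rides along after it.
WHERE="1 LIMIT $N"

# Dry run: build and print the command instead of executing it.
CMD="mysqldump --where=\"$WHERE\" $DB $TABLE > dump.sql"
echo "$CMD"
```

Once the printed command looks right, it can be run directly in the shell.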

Solution 3

Rows normally come back in ascending order, which is rarely what you want in this situation, so the database design has to cooperate for DESC to work out of the box. If all your tables have a single primary key column with the same name (natural or surrogate), you can easily dump the n latest records using:

mysqldump --opt --where="1 ORDER BY id DESC limit 1000000" --all-databases > dump.sql

This is a perfect reason to why you should always name your PK's id and avoid composite PK's, even in association tables (use surrogate keys instead).

Solution 4

mysqldump does not take a full SQL query, but the value of its --where option is appended to the SELECT it runs for each table. Because of that, you can put a "limit X" clause in that value to restrict the number of rows dumped.
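
To see why the --where trick works, it can help to print roughly what the resulting query looks like. The helper below, effective_query, is hypothetical and only approximates the statement; the real one that mysqldump sends may contain extra hints or quoting.

```shell
#!/bin/sh
# Hypothetical helper: print the approximate SELECT that results from a
# given --where value. This is an illustration, not mysqldump's exact query.
effective_query() {
    # $1 = table name, $2 = value passed to --where
    printf 'SELECT * FROM `%s` WHERE %s\n' "$1" "$2"
}

effective_query table "1 limit 1000000"
```

The leading 1 supplies the always-true condition that WHERE requires, and everything after it is carried into the statement unchanged.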

Author: Admin

Updated on September 08, 2020

Comments

  • Admin
    Admin over 3 years

    I am trying to load a small sample of records from a large database into a test database.

    How do you tell mysqldump to only give you n records out of 8 million?

    Thanks

  • Phob
    Phob almost 13 years
    What does the "1" before limit do?
  • Adam Bellaire
    Adam Bellaire almost 13 years
    @Phob: The --where option is basically appended to a query of the form SELECT * FROM table WHERE, so in this case you get SELECT * FROM table WHERE 1 limit 1000000. Without the 1, you would have an invalid query. Specifying 1 for a where clause (since 1 is always true) simply selects all records.
  • Phob
    Phob almost 13 years
    Wow, what a hack. So you can basically SQL inject yourself this way.
  • keithxm23
    keithxm23 over 11 years
    Does this maintain all foreign key integrities? If not, is there a way to do that?
  • Mohamed Hafez
    Mohamed Hafez over 7 years
    Is there a way to get the last 1,000,000 rows, i.e. the most recently added ones?
  • pfuri
    pfuri about 7 years
    Thanks! Additionally, you can use: mysqldump --opt --where="1 limit 1000000 offset 1000000" --no-create-info database to get the second page of 1 million records. Make sure to use the --no-create-info flag on pages other than the first to only dump the data and leave off the create table stuff.
  • someone
    someone over 6 years
    Do this (name id and avoid composite PK's) and you'll need to ignore relational database theory.
  • someone
    someone over 6 years
    Actually, if you design your database following relational best practices, defining your PK's based on the data and the entity, you can use --opt --where="1 LIMIT 10000", for example. Without ORDER BY, this will work because MySQL will order rows in its natural manner, which is equivalent to saying that it will follow the PK index order. Then all FKs of related tables will only hold data that exists in the tables they reference, because the order will be the same.
  • someone
    someone over 6 years
    The use of ID's is a true plague among many developers. Having ID's as PK's is the same as not having PK's. Your integrity goes down the drain because, in most cases, an auto-increment number has nothing to do with the entity data.
  • Andreas Bergström
    Andreas Bergström over 6 years
    @mpoletto --where="1 LIMIT 10000" will only pick the first 10000 entries. The whole point of my answer was to show how you would get the latest X entries, which is usually what you want. I also do not understand what naming conventions have to do with "ignoring relational database theory"; I think you misunderstood my answer. Most popular ORMs like EF, Django ORM, etc. default to and advise "id" for PK columns, since it is redundant to say users.user_id instead of just users.id.
  • someone
    someone over 6 years
    When you say that there is a "perfect reason to why you should always name your PK's id and avoid composite PK's", you are ignoring relational database theory. Your argument about "most popular ORMs" isn't valid because these ORMs need tables with IDs to work.
  • Andreas Bergström
    Andreas Bergström over 6 years
    @mpoletto And how am I ignoring RDBMS theory by saying that PKs should be called simply id instead of e.g. user_id?
  • someone
    someone over 6 years
    When you say to avoid composite keys. You don't avoid keys when you design a relational model; you define keys because you need them, whether or not they are composite. You don't design a model based on what an ORM needs. Not all models are suited to surrogate keys. But, unfortunately, this is a very common practice among programmers.
  • apostl3pol
    apostl3pol over 6 years
    Looks like --opt isn't necessary. From the manpages: "Because the --opt option is enabled by default, you only specify its converse, the --skip-opt to turn off several default settings."
  • MAx Shvedov
    MAx Shvedov about 4 years
    Thank you very much, this is exactly what I was searching for.
  • Robert Mikes
    Robert Mikes over 3 years
    Really? How can you give mysqldump a query? I can't find it in the documentation.
  • U47
    U47 about 3 years
    You can also ORDER BY 1 DESC if your ID columns aren't named consistently but are the first logical column defined in the table.
  • awm
    awm over 2 years
    Is there a way through MySQL internals to get the N most recently updated rows?
  • Andreas Bergström
    Andreas Bergström over 2 years
    @awm Set an updated_at column and sort on it instead.
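
The paging idea from pfuri's comment (LIMIT with OFFSET, plus --no-create-info on every page after the first) can be sketched as a loop. This is an illustration only: page_command, PAGE_SIZE and PAGES are made-up names and values, "database" is a placeholder, and the script prints the commands instead of running them.

```shell
#!/bin/sh
PAGE_SIZE=1000000   # rows per page, as in the comment
PAGES=3             # hypothetical number of pages to generate

# Hypothetical helper: print the mysqldump command for one page.
page_command() {
    # $1 = zero-based page index
    offset=$(( $1 * PAGE_SIZE ))
    extra=""
    # Pages after the first carry data only, without CREATE TABLE statements.
    if [ "$1" -gt 0 ]; then
        extra=" --no-create-info"
    fi
    printf 'mysqldump --where="1 LIMIT %d OFFSET %d"%s database\n' \
        "$PAGE_SIZE" "$offset" "$extra"
}

# Dry run: list the command for each page.
i=0
while [ "$i" -lt "$PAGES" ]; do
    page_command "$i"
    i=$(( i + 1 ))
done
```

Note that without an ORDER BY in the --where value, the boundaries between pages are not guaranteed to be stable from one run to the next.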