Limiting the number of records from mysqldump?
Solution 1
As skaffman says, use the --where option:
mysqldump --opt --where="1 limit 1000000" database
Of course, that would give you the first million rows from every table.
Solution 2
If you want to get n records from a specific table you can do something like this:
mysqldump --opt --where="1 limit 1000000" database table > dump.sql
This will dump the first 1000000 rows from the table named table into the file dump.sql.
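As a minimal sketch, the row count can be moved into a small shell helper so that it is validated before it reaches mysqldump. The function name limit_where is hypothetical (it is not part of mysqldump), and database, table and dump.sql are the same placeholder names used in the command above.

```shell
# limit_where is a hypothetical helper, not part of mysqldump.
# It prints a --where value that keeps every row ("1" is always true)
# and then caps the result with LIMIT.
limit_where() {
  case "$1" in
    ''|*[!0-9]*)
      echo "limit_where: expected a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '1 limit %s' "$1"
}

# Example call; database and table are placeholder names:
# mysqldump --where="$(limit_where 1000)" database table > dump.sql
```

Rejecting anything that is not a plain integer matters here because the --where text is pasted into the query as-is.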
Solution 3
As the default order is ASC, which is rarely what you want in this situation, you need a proper database design to make DESC work out of the box. If all your tables have ONE primary key column with the same name (natural or surrogate), you can easily dump the n latest records using:
mysqldump --opt --where="1 ORDER BY id DESC limit 1000000" --all-databases > dump.sql
This is a perfect reason why you should always name your PKs id and avoid composite PKs, even in association tables (use surrogate keys instead).
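The "latest n rows" variant can be sketched the same way for a single table. The helper name latest_where is hypothetical, and the column name is passed in as an argument because not every schema calls its key id; database and table are placeholder names.

```shell
# latest_where is a hypothetical helper, not part of mysqldump.
# Arguments: $1 = column to sort by, $2 = number of rows to keep.
latest_where() {
  case "$1" in
    ''|*[!A-Za-z0-9_]*)
      echo "latest_where: unexpected column name" >&2
      return 1
      ;;
  esac
  case "$2" in
    ''|*[!0-9]*)
      echo "latest_where: expected a non-negative integer" >&2
      return 1
      ;;
  esac
  printf '1 ORDER BY %s DESC limit %s' "$1" "$2"
}

# Example call; database and table are placeholder names:
# mysqldump --where="$(latest_where id 1000)" database table > dump.sql
```

Sorting on a different column, such as a timestamp, only changes the first argument.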
Solution 4
mysqldump cannot be given an arbitrary SQL query, but the text passed to --where is appended to the SELECT it runs for each table. You can therefore put a "limit X" clause in the --where value to restrict the number of rows.
Admin
Updated on September 08, 2020
Comments
-
Admin over 3 years
I am trying to load a small sample of records from a large database into a test database.
How do you tell mysqldump to only give you n records out of 8 million?
Thanks
-
Phob almost 13 years
What does the "1" before limit do?
-
Adam Bellaire almost 13 years
@Phob: The --where option is basically appended to a query of the form SELECT * from table WHERE, so in this case you get SELECT * from table WHERE 1 limit 1000000. Without the 1, you would have an invalid query. Specifying 1 for a where clause (since 1 is always true) simply selects all records.
-
Phob almost 13 years
Wow, what a hack. So you can basically SQL inject yourself this way.
-
keithxm23 over 11 years
Does this maintain all foreign key integrities? If not, is there a way to do that?
-
Mohamed Hafez over 7 years
Is there a way to get the last 1,000,000 rows, i.e. the most recently added ones?
-
pfuri about 7 years
Thanks! Additionally, you can use:
mysqldump --opt --where="1 limit 1000000 offset 1000000" --no-create-info database
to get the second page of 1 million records. Make sure to use the --no-create-info flag on pages other than the first to only dump the data and leave off the create table stuff.
-
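The paging idea in pfuri's comment can be sketched as a pair of shell helpers. The names page_where and page_flags are hypothetical, pages are assumed to be numbered from 0, and database and table are placeholder names. MySQL does not guarantee row order without an ORDER BY, so pages produced this way may not be stable between runs.

```shell
# page_where and page_flags are hypothetical helpers, not part of mysqldump.
# Pages are numbered from 0; $1 = page number, $2 = rows per page.
page_where() {
  printf '1 limit %s offset %s' "$2" "$(( $1 * $2 ))"
}

# Print --no-create-info for every page except the first, so that only
# page 0 carries the CREATE TABLE statements.
page_flags() {
  if [ "$1" -gt 0 ]; then
    printf '%s' '--no-create-info'
  fi
}

# Example loop over three pages; database and table are placeholder names.
# $(page_flags ...) is left unquoted on purpose so that an empty result
# adds no argument.
# for page in 0 1 2; do
#   mysqldump --where="$(page_where "$page" 1000000)" $(page_flags "$page") \
#     database table > "dump_page_$page.sql"
# done
```
-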
someone over 6 years
Do this (name id and avoid composite PKs) and you'll need to ignore relational database theory.
-
someone over 6 years
Actually, if you design your database following relational database best practices, defining your PKs based on the data and the entity, you can use --opt --where="1 LIMIT 10000", for example. Without ORDER BY, this will work because MySQL will return rows in their natural order, which is equivalent to saying that it will follow the PK index order. Then all FKs of related tables will only reference data that exists in the referenced table, because the order will be the same.
-
someone over 6 years
The use of IDs is a true plague among developers. Using IDs as PKs is the same as not having PKs. Your integrity goes down the drain because, in most cases, an auto-increment number has nothing to do with the entity data.
-
Andreas Bergström over 6 years
@mpoletto --where="1 LIMIT 10000" will only pick the first 10000 entries. The whole point of my answer was to show how you would get the latest X entries, which is usually what you want. I also do not understand what naming conventions have to do with "ignoring relational database theory"; I think you misunderstood my answer. Most popular ORMs like EF, Django ORM, etc. default to and advise "id" for PK columns, since it is redundant to say users.user_id instead of just users.id.
-
someone over 6 years
When you say that there is a "perfect reason to why you should always name your PK's id and avoid composite PK's", you are ignoring relational database theory. Your argument about "most popular ORMs" isn't valid because these ORMs need tables with IDs to work.
-
Andreas Bergström over 6 years
@mpoletto And how am I ignoring RDBMS theory by saying that PKs should be called simply id instead of e.g. user_id?
-
someone over 6 years
When you say to avoid composite keys. You don't avoid keys when you design a relational model; you define keys because you need them, no matter whether they are composite or not. You don't design a model based on what an ORM needs. Not all models are suited to surrogate keys. Unfortunately, this is a very common practice among programmers.
-
apostl3pol over 6 years
Looks like --opt isn't necessary. From the manpages: "Because the --opt option is enabled by default, you only specify its converse, the --skip-opt to turn off several default settings."
-
MAx Shvedov about 4 years
Thank you very much, this is exactly what I was searching for.
-
Robert Mikes over 3 years
Really? How can you give mysqldump a query? I can't find it in the documentation.
-
U47 about 3 years
You can also ORDER BY 1 DESC if your ID columns aren't named consistently but are the first logical column defined in the table.
-
awm over 2 years
Is there a way through MySQL internals to get the most recent N updated rows?
-
Andreas Bergström over 2 years
@awm Set an updated_at column and sort on it instead.