SQL query: Delete all records from the table except latest N?
Solution 1
You cannot delete the records that way, the main issue being that you cannot use a subquery to specify the value of a LIMIT clause.
This works (tested in MySQL 5.0.67):
DELETE FROM `table`
WHERE id NOT IN (
SELECT id
FROM (
SELECT id
FROM `table`
ORDER BY id DESC
LIMIT 42 -- keep this many records
) foo
);
The intermediate subquery is required. Without it we'd run into two errors:
- SQL Error (1093): You can't specify target table 'table' for update in FROM clause - MySQL doesn't allow you to refer to the table you are deleting from within a direct subquery.
- SQL Error (1235): This version of MySQL doesn't yet support 'LIMIT & IN/ALL/ANY/SOME subquery' - You can't use the LIMIT clause within a direct subquery of a NOT IN operator.
Fortunately, using an intermediate subquery allows us to bypass both of these limitations.
Nicole has pointed out this query can be optimised significantly for certain use cases (such as this one). I recommend reading that answer as well to see if it fits yours.
Solution 2
I know I'm resurrecting quite an old question, but I recently ran into this issue, but needed something that scales to large numbers well. There wasn't any existing performance data, and since this question has had quite a bit of attention, I thought I'd post what I found.
The solutions that actually worked were the Alex Barrett's double sub-query/NOT IN
method (similar to Bill Karwin's), and Quassnoi's LEFT JOIN
method.
Unfortunately both of the above methods create very large intermediate temporary tables and performance degrades quickly as the number of records not being deleted gets large.
What I settled on utilizes Alex Barrett's double sub-query (thanks!) but uses <=
instead of NOT IN
:
DELETE FROM `test_sandbox`
WHERE id <= (
SELECT id
FROM (
SELECT id
FROM `test_sandbox`
ORDER BY id DESC
LIMIT 1 OFFSET 42 -- keep this many records
) foo
);
It uses OFFSET
to get the id of the Nth record and deletes that record and all previous records.
Since ordering is already an assumption of this problem (ORDER BY id DESC
), <=
is a perfect fit.
It is much faster, since the temporary table generated by the subquery contains just one record instead of N records.
Test case
I tested the three working methods and the new method above in two test cases.
Both test cases use 10000 existing rows, while the first test keeps 9000 (deletes the oldest 1000) and the second test keeps 50 (deletes the oldest 9950).
+-----------+------------------------+----------------------+
| | 10000 TOTAL, KEEP 9000 | 10000 TOTAL, KEEP 50 |
+-----------+------------------------+----------------------+
| NOT IN | 3.2542 seconds | 0.1629 seconds |
| NOT IN v2 | 4.5863 seconds | 0.1650 seconds |
| <=,OFFSET | 0.0204 seconds | 0.1076 seconds |
+-----------+------------------------+----------------------+
What's interesting is that the <=
method sees better performance across the board, but actually gets better the more you keep, instead of worse.
Solution 3
Unfortunately for all the answers given by other folks, you can't DELETE
and SELECT
from a given table in the same query.
DELETE FROM mytable WHERE id NOT IN (SELECT MAX(id) FROM mytable);
ERROR 1093 (HY000): You can't specify target table 'mytable' for update
in FROM clause
Nor can MySQL support LIMIT
in a subquery. These are limitations of MySQL.
DELETE FROM mytable WHERE id NOT IN
(SELECT id FROM mytable ORDER BY id DESC LIMIT 1);
ERROR 1235 (42000): This version of MySQL doesn't yet support
'LIMIT & IN/ALL/ANY/SOME subquery'
The best answer I can come up with is to do this in two stages:
SELECT id FROM mytable ORDER BY id DESC LIMIT n;
Collect the id's and make them into a comma-separated string:
DELETE FROM mytable WHERE id NOT IN ( ...comma-separated string... );
(Normally interpolating a comma-separate list into an SQL statement introduces some risk of SQL injection, but in this case the values are not coming from an untrusted source, they are known to be integer values from the database itself.)
note: Though this doesn't get the job done in a single query, sometimes a more simple, get-it-done solution is the most effective.
Solution 4
DELETE i1.*
FROM items i1
LEFT JOIN
(
SELECT id
FROM items ii
ORDER BY
id DESC
LIMIT 20
) i2
ON i1.id = i2.id
WHERE i2.id IS NULL
Solution 5
If your id is incremental then use something like
delete from table where id < (select max(id) from table)-N
user2012801
Open source projects: Java HTML compressor/minifier jQuery loading mask plugin Chrome extensions: A whole bunch Feel free to contact me at: [email protected]
Updated on July 08, 2022Comments
-
user2012801 almost 2 years
Is it possible to build a single mysql query (without variables) to remove all records from the table, except latest N (sorted by id desc)?
Something like this, only it doesn't work :)
delete from table order by id ASC limit ((select count(*) from table ) - N)
Thanks.
-
Alex Barrett about 15 yearsYou need to use an intermediary subquery to get LIMIT working in the subquery.
-
Bill Karwin about 15 yearsOkay that works -- but to me, it's inelegant and unsatisfying to have to resort to arcane tricks like that. +1 nevertheless for the answer.
-
Bill Karwin about 15 years@achinda99: I'm not seeing an answer from you on this thread...?
-
achinda99 about 15 yearsI got pulled for a meeting. My bad. I don't have a test environment right now to test the sql I wrote, but I've done both what Alex Barret did and I've gotten it to work with an inner join.
-
user2012801 about 15 yearsI mark it as an accepted answer, because it does what I asked for. But I personally will do it probably in two queries just to keep it simple :) I thought maybe there was some quick and easy way.
-
bortzmeyer about 15 yearsIt's a stupid limitation of MySQL. With PostgreSQL,
DELETE FROM mytable WHERE id NOT IN (SELECT id FROM mytable ORDER BY id DESC LIMIT 3);
works fine. -
bortzmeyer about 15 yearsOne big problem in this nice trick: serials are not always contiguous (for instance when there were rollbacks).
-
Nicole over 12 years
-
Sv1 over 11 yearsThanks Alex, your answer helped me. I see that the intermediate subquery is required but I don't understand why. Do you have an explanation for that?
-
Sebastian Breit about 11 yearsa question: what is the "foo" for?
-
codygman almost 11 yearsPerroloco, I tried without foo and got this error: ERROR 1248 (42000): Every derived table must have its own alias So theirs our answer, every derived table must have its own alias!
-
Paolo over 10 yearsThe question asked "all exept the last N records" and "in a single query". But it seem you still need a first query to count all the records in the table a then limit to total - N
-
Nitesh over 10 years@Paolo We do not require a query to count all records as the above query deletes all except last 10 records.
-
ChrisMoll over 10 yearsNo, that query deletes the 10 oldest records. The OP wants to delete everything except the n most recent records. Yours is the basic solution that would be paired with a count query, while OP is asking if there's a way to combine everything into a single query.
-
Alex Barrett over 10 yearsI'm reading this thread again 4.5 years later. Nice addition!
-
Nitesh over 10 years@ChrisMoll I agree. Shall I edit/delete this answer now to let users not down vote me or leave it as it is?
-
theJerm almost 10 yearsCool, that worked for me, but is frickin' strange that you have two sub queries and that error for such.
-
Ken Palmer about 9 yearsWow, this looks great but doesn't work in Microsoft SQL 2008. I get this message: "Incorrect syntax near 'Limit'. It's nice that it works in MySQL, but I'll need to find an alternative solution.
-
Nicole about 9 years@KenPalmer You should be able to still find a specific row offset using
ROW_NUMBER()
: stackoverflow.com/questions/603724/… -
rayryeng almost 9 yearsSome explanation as to how this works would be beneficial for those coming across this answer. Code dumping is usually not recommended.
-
AlphaG33k over 8 years@KenPalmer use SELECT TOP instead of LIMIT when switching between SQL and mySQL
-
attaboyabhipro about 8 yearsthis is the best solution I reckon !
-
Lieuwe over 7 yearsCheers for that. It reduced the query on my (very large) data set from 12 minutes to 3.64 seconds!
-
Klemen Tusar over 7 yearsNicoleC for president! :D
-
Sasino over 2 yearsIt took more than 10 minutes to run for me :(
-
Edward about 2 yearsI wonder how this could be done with a
group by