Batch delete in Postgres using a cursor
Just DELETE. Forget the cursor.
Unless your table is huge and you are deleting a large percentage of rows, there is no point in doing it in batches. And there is hardly any point even then. The only reasons I can think of would be to allow VACUUM to remove dead tuples earlier, which might help in special cases. Or to evade lock contention (possible deadlocks?), though that raises the question why rows about to be deleted would be locked by concurrent transactions in the first place.
Locks are per row for this. There is no conflict with concurrent INSERT or UPDATE to different rows. (If you have UPDATEs targeting some of the same rows, you have a bigger problem.) And writers don't block readers anyway in Postgres.
You can create an SQL cursor with the WITH HOLD option and then use it with DELETE ... WHERE CURRENT OF in separate transactions. But you would have to throw in FOR UPDATE, locking all affected rows anyway. That rarely makes sense, except when you want to lock all affected rows quickly but still do something with them before deleting; and even then there may be smarter ways ...
It can make sense to partition a big UPDATE into batches, in separate transactions, so that dead tuples can be reused in H.O.T. updates (after a manual VACUUM or autovacuum has kicked in). But that hardly applies to DELETE operations, which do not reuse the space. Also, DELETE is a lot faster than UPDATE.
In the unlikely event that you still need to do it in batches, still don't use a cursor. Use something like:
WITH cte AS (
SELECT id -- your PK
FROM tbl
WHERE date < $something -- your condition
-- ORDER BY ??? -- optional, see below
LIMIT 50000
FOR UPDATE -- SKIP LOCKED ?
)
DELETE FROM tbl
USING cte
WHERE tbl.id = cte.id;
Repeat until no rows are found.
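To repeat it automatically, a server-side loop can check the row count after each pass. A minimal sketch in PL/pgSQL (tbl, id, and the cutoff are placeholders matching the query above; note that a DO block runs as a single transaction, so to get truly separate transactions drive the loop from the client instead):

```sql
-- Sketch only: loops the batched DELETE until a pass removes no rows.
-- Caveat: a DO block is one transaction; dead tuples cannot be
-- reclaimed until it commits. For separate transactions per batch,
-- run the statement in a loop from the client.
DO
$$
DECLARE
   _ct bigint;
BEGIN
   LOOP
      WITH cte AS (
         SELECT id                    -- your PK
         FROM   tbl
         WHERE  date < '2020-01-01'   -- placeholder condition
         LIMIT  50000
         FOR    UPDATE SKIP LOCKED
         )
      DELETE FROM tbl
      USING  cte
      WHERE  tbl.id = cte.id;

      GET DIAGNOSTICS _ct = ROW_COUNT;  -- rows deleted in this pass
      EXIT WHEN _ct = 0;
   END LOOP;
END
$$;
```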
If your data is (mostly) physically sorted in a certain way, it can pay to order rows accordingly (the quoted ORDER BY). The ORDER BY imposes a cost of its own, but each DELETE may be able to access far fewer data pages with clustered rows and be faster that way. It depends on the use case; if the ORDER BY can use an index, the outlook is better.
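In that case only the CTE changes (sketch; assumes an index on the date column, which is a placeholder name like the rest):

```sql
WITH cte AS (
   SELECT id                    -- your PK
   FROM   tbl
   WHERE  date < '2020-01-01'   -- placeholder condition
   ORDER  BY date               -- can be served by an index on (date)
   LIMIT  50000
   FOR    UPDATE SKIP LOCKED
   )
DELETE FROM tbl
USING  cte
WHERE  tbl.id = cte.id;
```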
shekar Roy
Updated on June 04, 2022

Comments
-
shekar Roy almost 2 years
I am new to Postgres and have a task to delete records from a live table that was created unpartitioned. My current thinking is to create a cursor to delete the records in a controlled way.
The steps I have in my head:
- declare a cursor and open it
- fetch records into the cursor, filtered on a date condition
- pass a controlled number of records, say 5000 per transaction, to the delete statement
- loop until the initial fetch is exhausted
- exit the loop and close the cursor
Would that make sense?
-
Alejandro over 5 years But what's the question? Also, using a cursor in SQL is almost always a bad idea; why not a simple DELETE?
-
shekar Roy over 5 years A simple delete on so many rows can cause the DB to hang. I need to do deletes alongside writes, hence the need for controlled deletes. If a cursor is a bad idea, what else can I use?
-
Alejandro over 5 years Why would it hang? DBs are optimized for set operations, not for cursors. Have you measured that a bulk delete is in fact slower? If so, maybe you need to look at adding the proper indexes.
-
shekar Roy over 5 years Sorry, but if I got you correctly: you are saying that if I do a delete on the table I want to clean up, using the indexed column to separate the data, the DB would still be happy and delete data while constant writes are coming in? Additionally, by a cursor I mean a stored function for Postgres, not a cursor per se. Your advice is definitely welcome.