Batch delete in Postgres using a cursor


Just DELETE. Forget the cursor.

Unless your table is huge and you are deleting a large percentage of rows, there is no point in doing it in batches, and hardly any point even then. The only reasons I can think of would be to allow VACUUM to remove dead tuples earlier, which might help in special cases, or to avoid lock contention (possible deadlocks?). But that raises the question why rows that are about to be deleted would be locked by concurrent transactions in the first place.

Locks are taken per row for this, so there is no conflict with concurrent INSERT or UPDATE on different rows. (If you have UPDATEs targeting some of the same rows, you have a bigger problem.) And writers don't block readers in Postgres anyway.

You can create an SQL cursor with the WITH HOLD option and then use it with DELETE ... WHERE CURRENT OF in separate transactions. But you would have to throw in FOR UPDATE, locking all affected rows anyway. This rarely makes sense, except when you want to lock all affected rows quickly but still do something with them before deleting, and there may be smarter ways even then ...
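To illustrate the WHERE CURRENT OF mechanics, here is a minimal sketch within a single transaction (table and column names are placeholders; the cross-transaction WITH HOLD variant described above adds its own complications):

```sql
BEGIN;
DECLARE del_cur CURSOR FOR
   SELECT * FROM tbl
   WHERE  date < '2020-01-01'   -- placeholder condition
   FOR    UPDATE;               -- locks rows as they are fetched

FETCH del_cur;                           -- advance to the next row
DELETE FROM tbl WHERE CURRENT OF del_cur;  -- delete the row just fetched
-- ... fetch & delete more rows ...

CLOSE del_cur;
COMMIT;
```

Each row is fetched and deleted individually, which is exactly why this is slower than a set-based DELETE.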

It can make sense to partition a big UPDATE into separate transactions, so that dead tuples can be reused by HOT (Heap-Only Tuple) updates after a manual VACUUM, or after autovacuum has kicked in. But that hardly applies to DELETE, which does not reuse the freed space. Also, DELETE is a lot faster than UPDATE.

In the unlikely event that you still need to do it in batches, still don't use a cursor. Use something like:

WITH cte AS (
   SELECT id                 -- your PK
   FROM   tbl
   WHERE  date < $something  -- your condition
   -- ORDER BY ???           -- optional, see below
   LIMIT  50000
   FOR    UPDATE             -- SKIP LOCKED ?
   )
DELETE FROM tbl
USING  cte
WHERE  tbl.id = cte.id;

Repeat until no rows are found.
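The repetition can be sketched as a server-side PL/pgSQL loop (assuming the table and column names from the query above, with a placeholder date literal). Note that a DO block runs in a single transaction, so dead tuples are not freed between batches; to get separate transactions per batch, run the statement from a client-side loop instead:

```sql
DO
$$
DECLARE
   _ct bigint;
BEGIN
   LOOP
      WITH cte AS (
         SELECT id
         FROM   tbl
         WHERE  date < '2020-01-01'   -- placeholder condition
         LIMIT  50000
         FOR    UPDATE SKIP LOCKED
         )
      DELETE FROM tbl
      USING  cte
      WHERE  tbl.id = cte.id;

      GET DIAGNOSTICS _ct = ROW_COUNT;  -- rows deleted in this batch
      EXIT WHEN _ct = 0;                -- done when nothing is left
   END LOOP;
END
$$;
```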

If your data is (mostly) physically sorted in a certain way, it can pay to order rows accordingly (the commented-out ORDER BY above). The ORDER BY imposes a cost of its own, but each DELETE may have to access far fewer data pages when the targeted rows are clustered, and be faster that way. It depends on the use case; if the ORDER BY can use an index, the outlook is better.
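For instance, if rows are physically clustered by date, the batch query from above might become (a sketch; the index on the date column is an assumption):

```sql
WITH cte AS (
   SELECT id
   FROM   tbl
   WHERE  date < '2020-01-01'   -- placeholder condition
   ORDER  BY date               -- matches physical order; ideally uses an index
   LIMIT  50000
   FOR    UPDATE SKIP LOCKED
   )
DELETE FROM tbl
USING  cte
WHERE  tbl.id = cte.id;
```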

Author: shekar Roy

Updated on June 04, 2022

Comments

  • shekar Roy, almost 2 years

    I am new to Postgres and have a task to delete records from a live table that was created unpartitioned. My thinking is to create a cursor to delete the records in a controlled way.

    The steps I have in my head:

    • declare a cursor and open it
    • fetch records into the cursor, filtered on a date
    • pass a controlled number of records, say 5000 per transaction, to the delete statement
    • loop until the initial fetch is exhausted
    • exit the loop and close the cursor

    Would that make sense?
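    For reference, the plan above would look roughly like this in PL/pgSQL (a sketch with placeholder names; note that a function body runs in a single transaction, so the "5000 per transaction" goal cannot be met this way, and the answer argues a set-based batch is the better tool):

    ```sql
    CREATE OR REPLACE FUNCTION batch_delete_old()  -- hypothetical name
      RETURNS void
      LANGUAGE plpgsql AS
    $$
    DECLARE
       _row tbl;  -- assumes table "tbl" with PK "id"
       _cur CURSOR FOR
            SELECT * FROM tbl
            WHERE  date < '2020-01-01';  -- placeholder date filter
    BEGIN
       OPEN _cur;
       LOOP
          FETCH _cur INTO _row;
          EXIT WHEN NOT FOUND;
          DELETE FROM tbl WHERE id = _row.id;  -- one row at a time: slow!
       END LOOP;
       CLOSE _cur;
    END
    $$;
    ```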

    • Alejandro, over 5 years
      But what's the question? Also, using a cursor in SQL is almost always a bad idea. Why not a simple DELETE?
    • shekar Roy, over 5 years
      A simple DELETE on that many rows can cause the DB to hang. I need to run the deletes alongside writes, hence the need for controlled deletes. If a cursor is a bad idea, what else can I use?
    • Alejandro, over 5 years
      Why would it hang? DBs are optimized for set operations, not for cursors. Have you measured that a bulk delete is in fact slower? If so, maybe you need to look at adding the proper indexes.
    • shekar Roy, over 5 years
      Sorry, but let me check that I understood you correctly: you are saying that if I run a DELETE on the table I want to clean up, using the indexed column to separate the data, the DB would still be happy and delete data while constant writes are coming in? Also, by "cursor" I mean a stored function for Postgres, not a cursor per se. Your advice is definitely welcome.