Guidance on using the WITH clause in SQL

26,666

Solution 1

If there can be concurrent write access to involved tables, there are race conditions in the following queries. Consider:


Your example can use a CTE (common table expression), but it will give you nothing a subquery couldn't do:

WITH x AS (
   SELECT  psp_id
   FROM    global.prospect
   WHERE   status IN ('new', 'reset')
   ORDER   BY request_ts
   LIMIT   1
   )
UPDATE global.prospect psp
SET    status = status || '*'
FROM   x
WHERE  psp.psp_id = x.psp_id
RETURNING psp.*;

The returned row will be the updated version.


If you want to insert the returned row into another table, that's where a WITH clause becomes essential:

WITH x AS (
   SELECT  psp_id
   FROM    global.prospect
   WHERE   status IN ('new', 'reset')
   ORDER   BY request_ts
   LIMIT   1
   )
, y AS (
   UPDATE global.prospect psp
   SET    status = status || '*'
   FROM   x
   WHERE  psp.psp_id = x.psp_id
   RETURNING psp.*
   )
INSERT INTO z
SELECT *
FROM   y;

Data-modifying queries using CTEs were added with PostgreSQL 9.1.
The manual about WITH queries (CTEs).

Solution 2

WITH lets you define "temporary tables" for use in a SELECT query. For example, I recently wrote a query like this, to calculate changes between two sets:

-- Let o be the set of old things, and n be the set of new things.
WITH o AS (SELECT * FROM things(OLD)),
     n AS (SELECT * FROM things(NEW))

-- Select both the set of things whose value changed,
-- and the set of things in the old set but not in the new set.
SELECT o.key, n.value
    FROM o
    LEFT JOIN n ON o.key = n.key
    WHERE o.value IS DISTINCT FROM n.value

UNION ALL

-- Select the set of things in the new set but not in the old set.
SELECT n.key, n.value
    FROM o
    RIGHT JOIN n ON o.key = n.key
    WHERE o.key IS NULL;

By defining the "tables" o and n at the top, I was able to avoid repeating the expressions things(OLD) and things(NEW).

Sure, we could probably eliminate the UNION ALL using a FULL JOIN, but I wasn't able to do that in my particular case.


If I understand your query correctly, it does this:

  • Find the oldest row in global.prospect whose status is 'new' or 'reset'.

  • Mark it by adding an asterisk to its status

  • Return the row (including our tweak to status).

I don't think WITH will simplify anything in your case. It may be slightly more elegant to use a FROM clause, though:

update global.prospect psp
set    status = status || '*'
from   ( select psp_id
         from   global.prospect
         where  status = 'new' or status = 'reset'
         order  by request_ts
         limit  1
       ) p2
where  psp.psp_id = p2.psp_id
returning psp.*;

Untested. Let me know if it works.

It's pretty much exactly what you have already, except:

  • This can be easily extended to update multiple rows. In your version, which uses a subquery expression, the query would fail if the subquery were changed to yield multiple rows.

  • I did not alias global.prospect in the subquery, so it's a bit easier to read. Since this uses a FROM clause, you'll get an error if you accidentally reference the table being updated.

  • In your version, the subquery expression is encountered for every single item. Although PostgreSQL should optimize this and only evaluate the expression once, this optimization will go away if you accidentally reference a column in psp or add a volatile expression.

Share:
26,666
cc young
Author by

cc young

semi-retired, living in Sihanoukville, Cambodia open for: Postgres consulting IT training in Cambodia

Updated on September 16, 2021

Comments

  • cc young
    cc young over 2 years

    I understand how to use the WITH clause for recursive queries (!!), but I'm having problems understanding its general use / power.

    For example the following query updates one record whose id is determined by using a subquery returning the id of the first record by timestamp:

    update global.prospect psp
    set    status=status||'*'
    where  psp.psp_id=(
               select  p2.psp_id
               from    global.prospect p2
               where   p2.status='new' or p2.status='reset'
               order   by p2.request_ts
               limit   1 )
    returning psp.*;
    

    Would this be a good candidate for using a WITH wrapper instead of the relatively ugly sub-query? If so, why?

  • Erwin Brandstetter
    Erwin Brandstetter over 12 years
    To update multiple rows, one would not need any form of subquery or CTE, just: UPDATE global.prospect SET status = status || '*' WHERE status IN ('new', 'reset') RETURNING *;
  • cc young
    cc young over 12 years
    wow - really good - thanks. agree with quality of pg doc, but to date CTE's have been one of those things that sound cool when I read about it, but in practice have never had a handle on it. your two example (I think!) have helps a lot