PostgreSQL - how should I use first_value()?

25,816

Solution 1

SQL Fiddle

All the functions you used act on the window frame, not on the partition. If omitted the frame end is the current row. To make the window frame to be the whole partition declare it in the frame clause (range...):

SELECT  
    cstamp,
    price,
    date_trunc('hour',cstamp) AS h,
    floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
    min(price) OVER w,
    max(price) OVER w,
    first_value(price) OVER w,
    last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
    PARTITION BY date_trunc('hour',cstamp) , floor(extract(minute FROM cstamp) / 5)
    ORDER BY cstamp
    range between unbounded preceding and unbounded following
    )
ORDER BY cstamp;

Solution 2

Here's a quick query to illustrate the behaviour:

select 
  v,
  first_value(v) over w1 f1,
  first_value(v) over w2 f2,
  first_value(v) over w3 f3,
  last_value (v) over w1 l1,
  last_value (v) over w2 l2,
  last_value (v) over w3 l3,
  max        (v) over w1 m1,
  max        (v) over w2 m2,
  max        (v) over w3 m3,
  max        (v) over () m4
from (values(1),(2),(3),(4)) t(v)
window
  w1 as (order by v),
  w2 as (order by v rows between unbounded preceding and current row),
  w3 as (order by v rows between unbounded preceding and unbounded following)

The output of the above query can be seen here (SQLFiddle here):

| V | F1 | F2 | F3 | L1 | L2 | L3 | M1 | M2 | M3 | M4 |
|---|----|----|----|----|----|----|----|----|----|----|
| 1 |  1 |  1 |  1 |  1 |  1 |  4 |  1 |  1 |  4 |  4 |
| 2 |  1 |  1 |  1 |  2 |  2 |  4 |  2 |  2 |  4 |  4 |
| 3 |  1 |  1 |  1 |  3 |  3 |  4 |  3 |  3 |  4 |  4 |
| 4 |  1 |  1 |  1 |  4 |  4 |  4 |  4 |  4 |  4 |  4 |

Few people think of the implicit frames that are applied to window functions that take an ORDER BY clause. In this case, windows are defaulting to the frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Think about it this way:

  • On the row with v = 1 the ordered window's frame spans v IN (1)
  • On the row with v = 2 the ordered window's frame spans v IN (1, 2)
  • On the row with v = 3 the ordered window's frame spans v IN (1, 2, 3)
  • On the row with v = 4 the ordered window's frame spans v IN (1, 2, 3, 4)

If you want to prevent that behaviour, you have two options:

  • Use an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause for ordered window functions
  • Use no ORDER BY clause in those window functions that allow for omitting them (as MAX(v) OVER())

More details are explained in this article about LEAD(), LAG(), FIRST_VALUE() and LAST_VALUE()

Solution 3

The result of max() as window function is base on the frame definition.

The default frame definition (with ORDER BY) is from the start of the frame up to the last peer of the current row (including the current row and possibly more rows ranking equally according to ORDER BY). In the absence of ORDER BY (like in my answer you are referring to), or if ORDER BY treats every row in the partition as equal (like in your first example), all rows in the partition are peers, and max() produces the same result for every row in the partition, effectively considering all rows of the partition.

Per documentation:

The default framing option is RANGE UNBOUNDED PRECEDING, which is the same as RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. With ORDER BY, this sets the frame to be all rows from the partition start up through the current row's last peer. Without ORDER BY, all rows of the partition are included in the window frame, since all rows become peers of the current row.

Bold emphasis mine.

The simple solution would be to omit the ORDER BY in the window definition - just like I demonstrated in the example you are referring to.

All the gory details about frame specifications in the chapter Window Function Calls in the manual.

Share:
25,816
mikeramos
Author by

mikeramos

Old-ish IT Geezer, young at heart, memoir fanboy

Updated on July 13, 2022

Comments

  • mikeramos
    mikeramos almost 2 years

    This answer to shows how to produce High/Low/Open/Close values from a ticker:
    Retrieve aggregates for arbitrary time intervals

    I am trying to implement a solution based on this (PG 9.2), but am having difficulty in getting the correct value for first_value().

    So far, I have tried two queries:

    SELECT  
        cstamp,
        price,
        date_trunc('hour',cstamp) AS h,
        floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
        min(price) OVER w,
        max(price) OVER w,
        first_value(price) OVER w,
        last_value(price) OVER w
    FROM trades
    Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
    WINDOW w AS (
        PARTITION BY date_trunc('hour',cstamp), floor(extract(minute FROM cstamp) / 5)
        ORDER BY date_trunc('hour',cstamp) ASC, floor(extract(minute FROM cstamp) / 5) ASC
        )
    ORDER BY cstamp;
    

    Here's a piece of the result:

            cstamp         price      h                 m5  min      max      first    last
    "2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000
    
    "2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
    "2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
    "2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000
    
    "2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;73.99004;77.80000;73.99004;73.99004
    

    As you can see, 77.8 is not what I believe is the correct value for first_value(), which should be 77.0.

    I though this might be due to the ambiguous ORDER BY in the WINDOW, so I changed this to

    ORDER BY cstamp ASC 
    

    but this appears to upset the PARTITION as well:

            cstamp         price      h                 m5  min      max      first    last
    "2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000
    
    "2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.00000;77.00000;77.00000
    "2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.80000
    "2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.00000
    
    "2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;77.00000;77.00000;77.00000;77.00000
    

    since the values for max and last now vary within the partition.

    What am I doing wrong? Could someone help me better to understand the relation between PARTITION and ORDER within a WINDOW?


    Although I have an answer, here's a trimmed-down pg_dump which will allow anyone to recreate the table. The only thing that's different is the table name.

    CREATE TABLE wtest (
        cstamp timestamp without time zone,
        price numeric(10,5)
    );
    
    COPY wtest (cstamp, price) FROM stdin;
    2013-03-29 09:04:54 77.80000
    2013-03-29 09:04:50 76.98000
    2013-03-29 09:29:51 77.00000
    2013-03-29 09:29:41 77.80000
    2013-03-29 09:26:18 77.00000
    2013-03-29 09:19:14 77.00000
    2013-03-29 09:19:10 77.00000
    2013-03-29 09:33:50 76.00000
    2013-03-29 09:33:46 76.10000
    2013-03-29 09:33:15 77.79000
    2013-03-29 09:30:08 77.80000
    2013-03-29 09:30:04 77.00000
    \.
    
  • Clodoaldo Neto
    Clodoaldo Neto about 11 years
    Without an order by, how to have a first and last with a real meaning?
  • Erwin Brandstetter
    Erwin Brandstetter about 11 years
    @ClodoaldoNeto: Those are arbitrary picks then. Depending on what the OP wants to achieve, a frame definition like in your answer, or a separate window definition or just arbitrary picks (no ORDER BY like I suggest) would be the way to go.
  • mikeramos
    mikeramos about 11 years
    @ErwinBrandstetter Yes, I really wanted first and last in time as per the cstamp column. Thank you for your help.
  • mikeramos
    mikeramos about 11 years
    Thank you very much. As you were answering, I came across the frame clause and it does, indeed, fix things. What a day!
  • Lukas Eder
    Lukas Eder over 9 years
    "The default frame definition is from the start of the frame up to the last peer of the current row" - I think that's not entirely correct. It would be better to say that ordered windows have a default frame of ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, i.e. the current row is included. See also the example in my answer
  • Erwin Brandstetter
    Erwin Brandstetter over 9 years
    @LukasEder: The default changes subtly with the presence of an ORDER BY clause. I clarified and added a quote from the manual.
  • Lukas Eder
    Lukas Eder over 9 years
    Hmm, interesting, thanks for the update. I'm still not 100% convinced that the wording is correct (or at least, clear). It sounds as though the current row was excluded from the frame by default, which would correspond to ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING, but that's obviously not the case
  • Erwin Brandstetter
    Erwin Brandstetter over 9 years
    @LukasEder: I certainly didn't want to imply the current row was excluded. The "last peer of the current row" always includes the current row. I added a bit to avoid misunderstandings.
  • Erwin Brandstetter
    Erwin Brandstetter over 9 years
    Very nice demo! Maybe name the added max(v) over () m4 to keep the symmetry?
  • Lukas Eder
    Lukas Eder over 9 years
    @ErwinBrandstetter: Eh... ;-) Well, fair enough
  • Erwin Brandstetter
    Erwin Brandstetter over 9 years
    We wouldn't want my .. uhm .. anybody's OCD acting up, now would we? :)
  • Lukas Eder
    Lukas Eder over 9 years
    @ErwinBrandstetter: No, I can feel with you. I am the same (mostly) :)
  • Lukas Eder
    Lukas Eder over 9 years
    ... And thanks for that edit ;-) I had updated the SQLFiddle, but forgot to update the link...