PostgreSQL - how should I use first_value()?
Solution 1
All the functions you used act on the window frame, not on the partition. If omitted the frame end is the current row. To make the window frame to be the whole partition declare it in the frame clause (range...
):
SELECT
cstamp,
price,
date_trunc('hour',cstamp) AS h,
floor(EXTRACT(minute FROM cstamp) / 5) AS m5,
min(price) OVER w,
max(price) OVER w,
first_value(price) OVER w,
last_value(price) OVER w
FROM trades
Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00'
WINDOW w AS (
PARTITION BY date_trunc('hour',cstamp) , floor(extract(minute FROM cstamp) / 5)
ORDER BY cstamp
range between unbounded preceding and unbounded following
)
ORDER BY cstamp;
Solution 2
Here's a quick query to illustrate the behaviour:
select
v,
first_value(v) over w1 f1,
first_value(v) over w2 f2,
first_value(v) over w3 f3,
last_value (v) over w1 l1,
last_value (v) over w2 l2,
last_value (v) over w3 l3,
max (v) over w1 m1,
max (v) over w2 m2,
max (v) over w3 m3,
max (v) over () m4
from (values(1),(2),(3),(4)) t(v)
window
w1 as (order by v),
w2 as (order by v rows between unbounded preceding and current row),
w3 as (order by v rows between unbounded preceding and unbounded following)
The output of the above query can be seen here (SQLFiddle here):
| V | F1 | F2 | F3 | L1 | L2 | L3 | M1 | M2 | M3 | M4 |
|---|----|----|----|----|----|----|----|----|----|----|
| 1 | 1 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 4 | 4 |
| 2 | 1 | 1 | 1 | 2 | 2 | 4 | 2 | 2 | 4 | 4 |
| 3 | 1 | 1 | 1 | 3 | 3 | 4 | 3 | 3 | 4 | 4 |
| 4 | 1 | 1 | 1 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
Few people think of the implicit frames that are applied to window functions that take an ORDER BY
clause. In this case, windows are defaulting to the frame ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. Think about it this way:
- On the row with
v = 1
the ordered window's frame spansv IN (1)
- On the row with
v = 2
the ordered window's frame spansv IN (1, 2)
- On the row with
v = 3
the ordered window's frame spansv IN (1, 2, 3)
- On the row with
v = 4
the ordered window's frame spansv IN (1, 2, 3, 4)
If you want to prevent that behaviour, you have two options:
- Use an explicit
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
clause for ordered window functions - Use no
ORDER BY
clause in those window functions that allow for omitting them (asMAX(v) OVER()
)
More details are explained in this article about LEAD()
, LAG()
, FIRST_VALUE()
and LAST_VALUE()
Solution 3
The result of max()
as window function is base on the frame definition.
The default frame definition (with ORDER BY
) is from the start of the frame up to the last peer of the current row (including the current row and possibly more rows ranking equally according to ORDER BY
). In the absence of ORDER BY
(like in my answer you are referring to), or if ORDER BY
treats every row in the partition as equal (like in your first example), all rows in the partition are peers, and max()
produces the same result for every row in the partition, effectively considering all rows of the partition.
The default framing option is
RANGE UNBOUNDED PRECEDING
, which is the same asRANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
. WithORDER BY
, this sets the frame to be all rows from the partition start up through the current row's last peer. WithoutORDER BY
, all rows of the partition are included in the window frame, since all rows become peers of the current row.
Bold emphasis mine.
The simple solution would be to omit the ORDER BY
in the window definition - just like I demonstrated in the example you are referring to.
All the gory details about frame specifications in the chapter Window Function Calls in the manual.
Comments
-
mikeramos almost 2 years
This answer to shows how to produce High/Low/Open/Close values from a ticker:
Retrieve aggregates for arbitrary time intervalsI am trying to implement a solution based on this (PG 9.2), but am having difficulty in getting the correct value for
first_value()
.So far, I have tried two queries:
SELECT cstamp, price, date_trunc('hour',cstamp) AS h, floor(EXTRACT(minute FROM cstamp) / 5) AS m5, min(price) OVER w, max(price) OVER w, first_value(price) OVER w, last_value(price) OVER w FROM trades Where date_trunc('hour',cstamp) = timestamp '2013-03-29 09:00:00' WINDOW w AS ( PARTITION BY date_trunc('hour',cstamp), floor(extract(minute FROM cstamp) / 5) ORDER BY date_trunc('hour',cstamp) ASC, floor(extract(minute FROM cstamp) / 5) ASC ) ORDER BY cstamp;
Here's a piece of the result:
cstamp price h m5 min max first last "2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000 "2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000 "2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000 "2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.80000;77.00000 "2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;73.99004;77.80000;73.99004;73.99004
As you can see, 77.8 is not what I believe is the correct value for
first_value()
, which should be 77.0.I though this might be due to the ambiguous
ORDER BY
in theWINDOW
, so I changed this toORDER BY cstamp ASC
but this appears to upset the
PARTITION
as well:cstamp price h m5 min max first last "2013-03-29 09:19:14";77.00000;"2013-03-29 09:00:00";3;77.00000;77.00000;77.00000;77.00000 "2013-03-29 09:26:18";77.00000;"2013-03-29 09:00:00";5;77.00000;77.00000;77.00000;77.00000 "2013-03-29 09:29:41";77.80000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.80000 "2013-03-29 09:29:51";77.00000;"2013-03-29 09:00:00";5;77.00000;77.80000;77.00000;77.00000 "2013-03-29 09:30:04";77.00000;"2013-03-29 09:00:00";6;77.00000;77.00000;77.00000;77.00000
since the values for max and last now vary within the partition.
What am I doing wrong? Could someone help me better to understand the relation between
PARTITION
andORDER
within aWINDOW
?
Although I have an answer, here's a trimmed-down pg_dump which will allow anyone to recreate the table. The only thing that's different is the table name.
CREATE TABLE wtest ( cstamp timestamp without time zone, price numeric(10,5) ); COPY wtest (cstamp, price) FROM stdin; 2013-03-29 09:04:54 77.80000 2013-03-29 09:04:50 76.98000 2013-03-29 09:29:51 77.00000 2013-03-29 09:29:41 77.80000 2013-03-29 09:26:18 77.00000 2013-03-29 09:19:14 77.00000 2013-03-29 09:19:10 77.00000 2013-03-29 09:33:50 76.00000 2013-03-29 09:33:46 76.10000 2013-03-29 09:33:15 77.79000 2013-03-29 09:30:08 77.80000 2013-03-29 09:30:04 77.00000 \.
-
Clodoaldo Neto about 11 yearsWithout an order by, how to have a first and last with a real meaning?
-
Erwin Brandstetter about 11 years@ClodoaldoNeto: Those are arbitrary picks then. Depending on what the OP wants to achieve, a frame definition like in your answer, or a separate window definition or just arbitrary picks (no
ORDER BY
like I suggest) would be the way to go. -
mikeramos about 11 years@ErwinBrandstetter Yes, I really wanted first and last in time as per the cstamp column. Thank you for your help.
-
mikeramos about 11 yearsThank you very much. As you were answering, I came across the frame clause and it does, indeed, fix things. What a day!
-
Lukas Eder over 9 years"The default frame definition is from the start of the frame up to the last peer of the current row" - I think that's not entirely correct. It would be better to say that ordered windows have a default frame of
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
, i.e. the current row is included. See also the example in my answer -
Erwin Brandstetter over 9 years@LukasEder: The default changes subtly with the presence of an
ORDER BY
clause. I clarified and added a quote from the manual. -
Lukas Eder over 9 yearsHmm, interesting, thanks for the update. I'm still not 100% convinced that the wording is correct (or at least, clear). It sounds as though the current row was excluded from the frame by default, which would correspond to
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
, but that's obviously not the case -
Erwin Brandstetter over 9 years@LukasEder: I certainly didn't want to imply the current row was excluded. The "last peer of the current row" always includes the current row. I added a bit to avoid misunderstandings.
-
Erwin Brandstetter over 9 yearsVery nice demo! Maybe name the added
max(v) over ()
m4
to keep the symmetry? -
Lukas Eder over 9 years@ErwinBrandstetter: Eh... ;-) Well, fair enough
-
Erwin Brandstetter over 9 yearsWe wouldn't want my .. uhm .. anybody's OCD acting up, now would we? :)
-
Lukas Eder over 9 years@ErwinBrandstetter: No, I can feel with you. I am the same (mostly) :)
-
Lukas Eder over 9 years... And thanks for that edit ;-) I had updated the SQLFiddle, but forgot to update the link...