Why no windowed functions in where clauses?

52,676

Solution 1

why can't I use a windowed function in a where clause in SQL Server?

One answer, though not particularly informative, is because the spec says that you can't.

See the article by Itzik Ben Gan - Logical Query Processing: What It Is And What It Means to You and in particular the image here. Window functions are evaluated at the time of the SELECT on the result set remaining after all the WHERE/JOIN/GROUP BY/HAVING clauses have been dealt with (step 5.1).

really I'm looking for the reasoning behind not being able to use windowing functions in where clauses.

The reason that they are not allowed in the WHERE clause is that it would create ambiguity. Stealing Itzik Ben Gan's example from High-Performance T-SQL Using Window Functions (p.25)

Suppose your table was

CREATE TABLE T1
(
col1 CHAR(1) PRIMARY KEY
)

INSERT INTO T1 VALUES('A'),('B'),('C'),('D'),('E'),('F')

And your query

SELECT col1
FROM T1
WHERE ROW_NUMBER() OVER (ORDER BY col1) <= 3
AND col1 > 'B'

What would be the right result? Would you expect that the col1 > 'B' predicate ran before or after the row numbering?

Solution 2

There is no need for CTE, just use the windowing function in a subquery:

select id, sales_person_id, product_type, product_id, sale_amount
from
(
  select id, sales_person_id, product_type, product_id, sale_amount,
    row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc) rn
  from Sales_Log
) sl
where rn = 1

Edit, moving my comment to the answer.

Windowing functions are not performed until the data is actually selected which is after the WHERE clause. So if you try to use a row_number in a WHERE clause the value is not yet assigned.

Solution 3

"All-at-once operation" means that all expressions in the same logical query process phase are evaluated logically at the same time.

And great chapter Impact on Window Functions:

Suppose you have:

CREATE TABLE #Test ( Id INT) ;
 
INSERT  INTO #Test VALUES  ( 1001 ), ( 1002 ) ;

SELECT Id
FROM #Test
WHERE Id = 1002
  AND ROW_NUMBER() OVER(ORDER BY Id) = 1;

All-at-Once operations tell us these two conditions evaluated logically at the same point of time. Therefore, SQL Server can evaluate conditions in WHERE clause in arbitrary order, based on estimated execution plan. So the main question here is which condition evaluates first.

Case 1:

If ( Id = 1002 ) is first, then if ( ROW_NUMBER() OVER(ORDER BY Id) = 1 )

Result: 1002

Case 2:

If ( ROW_NUMBER() OVER(ORDER BY Id) = 1 ), then check if ( Id = 1002 )

Result: empty

So we have a paradox.

This example shows why we cannot use Window Functions in WHERE clause. You can think more about this and find why Window Functions are allowed to be used just in SELECT and ORDER BY clauses!


Addendum

Teradata supports QUALIFY clause:

Filters results of a previously computed ordered analytical function according to user‑specified search conditions.

SELECT Id
FROM #Test
WHERE Id = 1002
QUALIFY ROW_NUMBER() OVER(ORDER BY Id) = 1;

Addendum 2:

Snowflake - Qualify

QUALIFY does with window functions what HAVING does with aggregate functions and GROUP BY clauses.

In the execution order of a query, QUALIFY is therefore evaluated after window functions are computed. Typically, a SELECT statement’s clauses are evaluated in the order shown below:

From

    Where

    Group by

    Having

    Window

    QUALIFY

    Distinct

    Order by

    Limit

Solution 4

You don't necessarily need to use a CTE, you can query the result set after using row_number()

select row, id, sales_person_id, product_type, product_id, sale_amount
from (
    select
        row_number() over(partition by sales_person_id, 
            product_type, product_id order by sale_amount desc) AS row,
        id, sales_person_id, product_type, product_id, sale_amount
    from Sales_Log 
    ) a
where row = 1

Solution 5

It's an old thread, but I'll try to answer specifically the question expressed in the topic.

Why no windowed functions in where clauses?

SELECT statement has following main clauses specified in keyed-in order:

SELECT DISTINCT TOP list
FROM  JOIN ON / APPLY / PIVOT / UNPIVOT
WHERE
GROUP BY  WITH CUBE / WITH ROLLUP
HAVING
ORDER BY
OFFSET-FETCH

Logical Query Processing Order, or Binding Order, is conceptual interpretation order, it defines the correctness of the query. This order determines when the objects defined in one step are made available to the clauses in subsequent steps.

----- Relational result
  1. FROM
    1.1. ON JOIN / APPLY / PIVOT / UNPIVOT
  2. WHERE
  3. GROUP BY
    3.1. WITH CUBE / WITH ROLLUP
  4. HAVING
  ---- After the HAVING step the Underlying Query Result is ready
  5. SELECT
    5.1. SELECT list
    5.2. DISTINCT
----- Relational result

----- Non-relational result (a cursor)
  6. ORDER BY
  7. TOP / OFFSET-FETCH
----- Non-relational result (a cursor)

For example, if the query processor can bind to (access) the tables or views defined in the FROM clause, these objects and their columns are made available to all subsequent steps.

Conversely, all clauses preceding the SELECT clause cannot reference any column aliases or derived columns defined in SELECT clause. However, those columns can be referenced by subsequent clauses such as the ORDER BY clause.

OVER clause determines the partitioning and ordering of a row set before the associated window function is applied. That is, the OVER clause defines a window or user-specified set of rows within an Underlying Query Result set and window function computes result against that window.

Msg 4108, Level 15, State 1, …
Windowed functions can only appear in the SELECT or ORDER BY clauses.

The reason behind is because the way how Logical Query Processing works in T-SQL. Since the underlying query result is established only when logical query processing reaches the SELECT step 5.1. (that is, after processing the FROM, WHERE, GROUP BY and HAVING steps), window functions are allowed only in the SELECT and ORDER BY clauses of the query.

Note to mention, window functions are still part of relational layer even Relational Model doesn't deal with ordered data. The result after the SELECT step 5.1. with any window function is still relational.

Also, speaking strictly, the reason why window function are not allowed in the WHERE clause is not because it would create ambiguity, but because the order how Logical Query Processing processes SELECT statement in T-SQL.

Links: here, here and here

Share:
52,676
Chris Pfohl
Author by

Chris Pfohl

Full-stack, stack-agnostic, problem solver. I love data, great developer experience, keeping the big picture in mind and developing people-proof systems (mostly me-proof, if I'm being honest).

Updated on October 05, 2020

Comments

  • Chris Pfohl
    Chris Pfohl over 3 years

    Title says it all, why can't I use a windowed function in a where clause in SQL Server?

    This query makes perfect sense:

    select id, sales_person_id, product_type, product_id, sale_amount
    from Sales_Log
    where 1 = row_number() over(partition by sales_person_id, product_type, product_id order by sale_amount desc)
    

    But it doesn't work. Is there a better way than a CTE/Subquery?

    EDIT

    For what its worth this is the query with a CTE:

    with Best_Sales as (
        select id, sales_person_id, product_type, product_id, sale_amount, row_number() over (partition by sales_person_id, product_type, product_id order by sales_amount desc) rank
        from Sales_log
    )
    select id, sales_person_id, product_type, product_id, sale_amount
    from Best_Sales
    where rank = 1
    

    EDIT

    +1 for the answers showing with a subquery, but really I'm looking for the reasoning behind not being able to use windowing functions in where clauses.

  • Chris Pfohl
    Chris Pfohl over 11 years
    +1 for answering the question, but not quite what I was looking for...shall I ask again, correctly this time?
  • Chris Pfohl
    Chris Pfohl over 11 years
    +1 for answering the question, but not quite what I was looking for...shall I ask again, correctly this time?
  • Taryn
    Taryn over 11 years
    @ChristopherPfohl based on my understanding the row_number is not assigned until the records are selected so you cannot have it in the WHERE clause because the value does not exist yet.
  • Chris Pfohl
    Chris Pfohl over 11 years
    Apologies for giving and taking away, but the answer below is even closer to the answer I was looking for.
  • Taryn
    Taryn over 11 years
    @ChristopherPfohl it is basically the same thing I said in my comment, but you get to decide what answer to accept. :)
  • Mahmoud Gamal
    Mahmoud Gamal over 11 years
    Is this is the same reason they are not allowed in the GROUP BY clause??
  • Martin Smith
    Martin Smith over 11 years
    @MahmoudGamal The window operated on in the group by would need to be different than the window used in the select as that is on the result after grouping and having. Suppose you could define it as the rows logically present before the group by but Just would be quite confusing to allow that I think.
  • Ravi
    Ravi almost 6 years
    This seems lazy. We have rules like 'where' and 'group' runs before select and order by runs after select. Why not just choose a rule like that?
  • Ravi
    Ravi almost 6 years
    this is not the question. Sure there are ways around it, but the question is why isn't it allowed in where.
  • Martin Smith
    Martin Smith almost 6 years
    @RaviR they have chosen a rule like that. window functions run after virtually everything else. this is why you cant use them in the WHERE
  • Taryn
    Taryn almost 6 years
    @RaviR which is in fact answered in my edit to the answer. The row_number isn’t evaluated until after the where clause.