SELECTing top N rows without ROWNUM?

29,165

Solution 1

Since this is homework, a hint rather than an answer. You'll want to use analytic functions. ROW_NUMBER, RANK, or DENSE_RANK can work depending on how you want to handle ties.
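To see how the three functions differ on ties, here is a minimal sketch. It is not Oracle: it uses Python's built-in sqlite3 module as a stand-in (assuming the bundled SQLite is 3.25 or newer, which is when window functions arrived), and the STAFF table with its rows is a hypothetical reconstruction from the sample output in the question.

```python
import sqlite3

# Hypothetical STAFF rows, taken from the sample output in the question.
ROWS = [("Mike", 4080), ("Steve", 2800), ("Susan", 2800),
        ("Jack", 2800), ("Chloe", 1400)]

def top_n_names(func, n=3):
    """Return the names whose func() position, by salary descending, is <= n."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO staff VALUES (?, ?)", ROWS)
    # func comes from the fixed tuple below, so formatting it into the SQL is safe here.
    sql = f"""
        SELECT name
          FROM (SELECT name, {func}() OVER (ORDER BY salary DESC) AS pos
                  FROM staff)
         WHERE pos <= ?
    """
    names = {row[0] for row in conn.execute(sql, (n,))}
    conn.close()
    return names

for func in ("ROW_NUMBER", "RANK", "DENSE_RANK"):
    print(func, sorted(top_n_names(func)))
```

ROW_NUMBER returns exactly three rows and breaks the 2800 tie arbitrarily, RANK returns Mike plus all three people on 2800 (the next rank would be 5), and DENSE_RANK also lets Chloe in because her salary is the third distinct value.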

If analytic functions are also disallowed, the other option I could imagine-- one that you would never, ever, ever actually write in practice, would be something like

SELECT name, salary
  FROM staff s1
 WHERE (SELECT COUNT(*)
          FROM staff s2
         WHERE s1.salary < s2.salary) < 3
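The subquery counts, for each row, how many rows earn strictly more, so the cutoff decides how many ranks get through. A small sketch of that, again using Python's sqlite3 as a stand-in for Oracle and a hypothetical five-row STAFF table with distinct salaries, compares `< 3` with `<= 3`:

```python
import sqlite3

# Hypothetical data: five staff members with distinct salaries.
ROWS = [("Ann", 5000), ("Bob", 4000), ("Cid", 3000), ("Dee", 2000), ("Eve", 1000)]

def names_with_few_higher_paid(comparison):
    """Run the correlated-COUNT query with `comparison` (e.g. "< 3") as the cutoff."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
    conn.executemany("INSERT INTO staff VALUES (?, ?)", ROWS)
    # comparison is a fixed literal supplied by this script, not user input.
    sql = f"""
        SELECT name
          FROM staff s1
         WHERE (SELECT COUNT(*)
                  FROM staff s2
                 WHERE s1.salary < s2.salary) {comparison}
         ORDER BY salary DESC
    """
    names = [row[0] for row in conn.execute(sql)]
    conn.close()
    return names

print(names_with_few_higher_paid("< 3"))   # rows with 0, 1 or 2 higher salaries
print(names_with_few_higher_paid("<= 3"))  # also admits the row with 3 higher salaries
```

With no ties, `< 3` keeps three rows and `<= 3` keeps four, so `COUNT(*) < N` is the form that lines up with `RANK() <= N`.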

With regard to performance, I wouldn't rely on the COST number from the query plan; that's only an estimate, and it is not generally possible to compare the cost between plans for different SQL statements. You're much better off looking at something like the number of consistent gets the query actually does and considering how the query performance will scale as the number of rows in the table increases. The third option is going to be radically less efficient than the other two because the correlated subquery has to scan the STAFF table again for the rows coming out of the outer scan, rather than reading the table once.

I don't have your STAFF table, so I'll use the EMP table from the SCOTT schema.

The analytic function solution actually does 7 consistent gets, as does the ROWNUM solution.

Wrote file afiedt.buf

  1  select ename, sal
  2    from( select ename,
  3                 sal,
  4                 rank() over (order by sal) rnk
  5            from emp )
  6*  where rnk <= 3
SQL> /

ENAME             SAL
---------- ----------
smith             800
SM0               950
ADAMS            1110


Execution Plan
----------------------------------------------------------
Plan hash value: 3291446077

---------------------------------------------------------------------------------
| Id  | Operation                | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |      |    14 |   672 |     4  (25)| 00:00:01 |
|*  1 |  VIEW                    |      |    14 |   672 |     4  (25)| 00:00:01 |
|*  2 |   WINDOW SORT PUSHED RANK|      |    14 |   140 |     4  (25)| 00:00:01 |
|   3 |    TABLE ACCESS FULL     | EMP  |    14 |   140 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RNK"<=3)
   2 - filter(RANK() OVER ( ORDER BY "SAL")<=3)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        668  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          3  rows processed

SQL> select ename, sal
  2    from( select ename, sal
  3            from emp
  4           order by sal )
  5   where rownum <= 3;

ENAME             SAL
---------- ----------
smith             800
SM0               950
ADAMS            1110


Execution Plan
----------------------------------------------------------
Plan hash value: 1744961472

--------------------------------------------------------------------------------
| Id  | Operation               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |      |     3 |   105 |     4  (25)| 00:00:01 |
|*  1 |  COUNT STOPKEY          |      |       |       |            |          |
|   2 |   VIEW                  |      |    14 |   490 |     4  (25)| 00:00:01 |
|*  3 |    SORT ORDER BY STOPKEY|      |    14 |   140 |     4  (25)| 00:00:01 |
|   4 |     TABLE ACCESS FULL   | EMP  |    14 |   140 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------------


Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(ROWNUM<=3)
   3 - filter(ROWNUM<=3)


Statistics
----------------------------------------------------------
          1  recursive calls
          0  db block gets
          7  consistent gets
          0  physical reads
          0  redo size
        668  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
          3  rows processed

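The COUNT(*) solution, however, actually does 99 consistent gets, about 14 times the logical I/O of the other two. The plan lists two full scans of EMP, but the inner one sits under a FILTER and is re-executed as the outer scan feeds it salaries, so it will scale much worse as the number of rows in the table increases. (The run below uses `<= 3`, which admits rows with up to three higher salaries; the I/O figures do not depend on that constant.)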

SQL> select ename, sal
  2    from emp e1
  3   where (select count(*) from emp e2 where e1.sal < e2.sal) <= 3;

ENAME             SAL
---------- ----------
JONES            2975
SCOTT            3000
KING             5000
FORD             3000
FOO


Execution Plan
----------------------------------------------------------
Plan hash value: 2649664444

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    14 |   140 |    24   (0)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL | EMP  |    14 |   140 |     3   (0)| 00:00:01 |
|   3 |   SORT AGGREGATE    |      |     1 |     4 |            |          |
|*  4 |    TABLE ACCESS FULL| EMP  |     1 |     4 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter( (SELECT COUNT(*) FROM "EMP" "E2" WHERE
              "E2"."SAL">:B1)<=3)
   4 - filter("E2"."SAL">:B1)


Statistics
----------------------------------------------------------
          0  recursive calls
          0  db block gets
         99  consistent gets
          0  physical reads
          0  redo size
        691  bytes sent via SQL*Net to client
        524  bytes received via SQL*Net from client
          2  SQL*Net roundtrips to/from client
          0  sorts (memory)
          0  sorts (disk)
          5  rows processed
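The scaling remark about the COUNT(*) approach can be made concrete with a back-of-the-envelope model. This is only a sketch under a simplifying assumption: it counts rows visited, not Oracle blocks or consistent gets, and it takes the worst case where the inner scan runs once per outer row (Oracle can cache subquery results for repeated values, so real figures will be lower). The function names are made up for illustration.

```python
def rows_visited_single_pass(n):
    """Sort- or window-based top-N: the table is read once."""
    return n

def rows_visited_correlated(n):
    """Correlated COUNT(*): one outer scan plus, at worst, one inner scan per outer row."""
    return n + n * n

for n in (14, 1_000, 100_000):
    print(n, rows_visited_single_pass(n), rows_visited_correlated(n))
```

For the 14-row EMP table the gap is small in absolute terms, but the second column grows linearly while the third grows quadratically.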

Solution 2

You must wrap the statement in another select because the outer select is the one that limits the result set to the row numbers you want. Here's a helpful link on analytics. If you run the inner select by itself you'll see why you have to do this. Analytics are applied AFTER the where clause is evaluated, which is why you get the error that myorder is an invalid identifier.
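The difference between filtering in the same query block and filtering in an outer select can be tried out with a small sketch. It uses Python's sqlite3 module as a stand-in for Oracle (assuming SQLite 3.25 or newer for window functions) and a hypothetical STAFF table built from the question's sample rows. SQLite's error text differs from Oracle's ORA-00904 "invalid identifier", but the flat form is expected to be rejected there too.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO staff VALUES (?, ?)",
    [("Mike", 4080), ("Steve", 2800), ("Susan", 2800), ("Jack", 2800), ("Chloe", 1400)],
)

# Filtering on the analytic value in the same query block: expected to fail,
# because WHERE is evaluated before the window function has produced a value.
flat_sql = """
    SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS myorder
      FROM staff
     WHERE myorder <= 3
"""
try:
    conn.execute(flat_sql).fetchall()
    flat_error = None
except sqlite3.Error as exc:
    flat_error = str(exc)

# Wrapping it: the inner select computes myorder, the outer select filters on it.
wrapped_sql = """
    SELECT name, salary, myorder
      FROM (SELECT name, salary, RANK() OVER (ORDER BY salary DESC) AS myorder
              FROM staff)
     WHERE myorder <= 3
     ORDER BY myorder, name
"""
wrapped = conn.execute(wrapped_sql).fetchall()

print("flat query error:", flat_error)
print("wrapped result:", wrapped)
```

By the time the outer WHERE runs, myorder is an ordinary column of the inline view, so the filter is legal.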

Solution 3

Oracle? What about window functions?

select * from 
(SELECT s.*, row_number() over (order by salary desc ) as rn FROM staff s )
where rn <= 3
Author by

Pew

Updated on July 16, 2022

Comments

  • Pew
    Pew almost 2 years

    I hope you can help me with my homework :)

    We need to build a query that outputs the top N best paid employees.

    My version works perfectly fine.
    For example the top 3:

    SELECT name, salary
    FROM staff
    WHERE salary IN ( SELECT * 
                      FROM ( SELECT salary
                             FROM staff 
                             ORDER BY salary DESC ) 
                      WHERE ROWNUM <= 3 )
    ORDER BY salary DESC
    ;
    

    Note that this will output employees that are in the top 3 and have the same salary, too.

    1: Mike, 4080
    2: Steve, 2800
    2: Susan, 2800
    2: Jack, 2800
    3: Chloe, 1400


    But now our teacher does not allow us to use ROWNUM.
    I searched far and wide and didn't find anything useable.


    My second solution, thanks to Justin Cave's hint.

    First I tried this:

    SELECT name, salary, ( rank() OVER ( ORDER BY salary DESC ) ) as myorder
    FROM staff
    WHERE myorder <= 3
    ;
    

    The error message is: "myorder: invalid identifier"

    Thanks to DCookie it's now clear:

    "[...] Analytics are applied AFTER the where clause is evaluated, which is why you get the error that myorder is an invalid identifier."

    Wrapping a SELECT around it solves this:

    SELECT *
    FROM ( SELECT name, salary, rank() OVER ( ORDER BY salary DESC ) as myorder FROM staff )
    WHERE myorder <= 3
    ;
    

    My teacher strikes again and doesn't allow such exotic analytic functions.

    3rd solution from @Justin Cave.

    "If analytic functions are also disallowed, the other option I could imagine-- one that you would never, ever, ever actually write in practice, would be something like"

    SELECT name, salary
      FROM staff s1
     WHERE (SELECT COUNT(*)
              FROM staff s2
             WHERE s1.salary < s2.salary) < 3
    
  • zerkms
    zerkms about 13 years
    You did not apply any window functions.
  • Andrey Frolov
    Andrey Frolov about 13 years
    Ok, ok. Analytic function. Doesn't matter.
  • Pew
    Pew about 13 years
    Yep, I looked up RANK() and now I have found my second solution.
  • Pew
    Pew about 13 years
    I didn't try this because I was looking up RANK after @JustinCave's hint. But now I have the same solution. Why must I wrap a SELECT around it to use the rn value?
  • Pew
    Pew about 13 years
    @Justin I showed my teacher the second solution. Now he doesn't allow such exotic analytic functions either. Do you have a second hint for me?
  • Pew
    Pew about 13 years
    Thanks @Justin, that solution is quite unexpected and I still don't understand it. Going to review it today at home.
  • Pew
    Pew about 13 years
    Now I get it. Isn't it better than my first solution? Costs: Solution 1: 8; Solution 2: 4; Solution 3: 6
  • Justin Cave
    Justin Cave about 13 years
    @Pew - Updated with a more detailed discussion of the performance implications.
  • Pew
    Pew about 13 years
    As always, thank you for your help and information. I learned a lot.
  • vefthym
    vefthym about 10 years
    Explaining why this is the answer would also be useful for the OP.
  • russ
    russ about 10 years
    Yes ok and apologies for the code. Just thinking of an explanation
  • russ
    russ about 10 years
    Any time you use "Top N" in a SELECT statement it works depending on the "ORDER BY" clause. In the WHERE clause, the part in brackets orders by descending salary so the top three of that are the highest salaries. Because the salaries are grouped together the statement doesn't care how many times they might appear. I'm then asking for any names which have that salary amount.
  • Jon Heller
    Jon Heller about 10 years
    This is not valid syntax for Oracle.
  • Rahil Wazir
    Rahil Wazir about 10 years
    @russ Edit your post instead of commenting.