How can I speed up row_number in Oracle?

22,070

Solution 1

ROW_NUMBER is quite inefficient in Oracle.

See the article in my blog for performance details:

For your specific query, I'd recommend you to replace it with ROWNUM and make sure that the index is used:

SELECT  *
FROM    (
        SELECT  /*+ INDEX_ASC(t index_on_column) NOPARALLEL_INDEX(t index_on_column) */
                t.*, ROWNUM AS rn
        FROM    table t
        ORDER BY
                column
        )
WHERE rn >= :start
      AND rownum <= :end - :start + 1

This query will use COUNT STOPKEY

Also either make sure you column is not nullable, or add WHERE column IS NOT NULL condition.

Otherwise the index cannot be used to retrieve all values.

Note that you cannot use ROWNUM BETWEEN :start and :end without a subquery.

ROWNUM is always assigned last and checked last, that's way ROWNUM's always come in order without gaps.

If you use ROWNUM BETWEEN 10 and 20, the first row that satisifies all other conditions will become a candidate for returning, temporarily assigned with ROWNUM = 1 and fail the test of ROWNUM BETWEEN 10 AND 20.

Then the next row will be a candidate, assigned with ROWNUM = 1 and fail, etc., so, finally, no rows will be returned at all.

This should be worked around by putting ROWNUM's into the subquery.

Solution 2

Looks like a pagination query to me.

From this ASKTOM article (about 90% down the page):

You need to order by something unique for these pagination queries, so that ROW_NUMBER is assigned deterministically to the rows each and every time.

Also your queries are no where near the same so I'm not sure what the benefit of comparing the costs of one to the other is.

Solution 3

Part of the problem is how big is the 'start' to 'end' span and where they 'live'. Say you have a million rows in the table, and you want rows 567,890 to 567,900 then you are going to have to live with the fact that it is going to need to go through the entire table, sort pretty much all of that by id, and work out what rows fall into that range.

In short, that's a lot of work, which is why the optimizer gives it a high cost.

It is also not something an index can help with much. An index would give the order, but at best, that gives you somewhere to start and then you keep reading on until you get to the 567,900th entry.

If you are showing your end user 10 items at a time, it may be worth actually grabbing the top 100 from the DB, then having the app break that 100 into ten chunks.

Solution 4

Is your ORDER BY column indexed? If not that's a good place to start.

Share:
22,070
Jason Baker
Author by

Jason Baker

I'm a developer on Google's Cloud Console.

Updated on July 06, 2022

Comments

  • Jason Baker
    Jason Baker almost 2 years

    I have a SQL query that looks something like this:

    SELECT * FROM(
        SELECT
            ...,
            row_number() OVER(ORDER BY ID) rn
        FROM
            ...
    ) WHERE rn between :start and :end
    

    Essentially, it's the ORDER BY part that's slowing things down. If I were to remove it, the EXPLAIN cost goes down by an order of magnitude (over 1000x). I've tried this:

    SELECT 
        ...
    FROM
        ...
    WHERE
        rownum between :start and :end
    

    But this doesn't give correct results. Is there any easy way to speed this up? Or will I have to spend some more time with the EXPLAIN tool?

  • Jason Baker
    Jason Baker about 15 years
    Actually, it wasn't. But changing it to a row that IS indexed isn't helping. Thanks for making the obvious suggestion though. :-)
  • Jason Baker
    Jason Baker about 15 years
    It's pagination. And that is essentially what the query does at least with the paging. I've just taken out the rest of the query (mainly because it's nontrivial). All the ellipses are where I've cut stuff out for brevity.
  • Jason Baker
    Jason Baker about 15 years
    Actually, that article helped me write the query. I didn't notice the part about ordering by unique ids though. Also there's a query optimizer hint that I missed. I'll try it out at work tomorrow!
  • Jason Baker
    Jason Baker about 15 years
    This sounds appropriate. I'm really pulling about 15,000 records out of ~2 million records. We're limited on the amount of time a query can take, and pulling in all 15k records at once was causing timeouts. Thus, I thought paging through the results would prevent this. I suppose this just means I'll have to go through the bureaucratic nightmare of requesting a longer timeout.
  • Dave Costa
    Dave Costa about 15 years
    An index would only help improve ORDER BY if the access path could use that index (i.e. you were looking up a range of IDs).
  • John Saunders
    John Saunders about 15 years
    I hope you're not sending 15,000 rows to the user!
  • David
    David about 15 years
    ;) thought it looked familiar. first_rows can be a amazing with pagination queries.
  • Jason Baker
    Jason Baker about 15 years
    Works like a charm. However, the optimizer hints didn't seem to make an appreciable difference.
  • Quassnoi
    Quassnoi about 15 years
    That means that CBO was smart enough to pick up the indexes. It actually was ROWNUM instead ROW_NUMBER that mattered here.
  • Jason Baker
    Jason Baker about 15 years
    FYI, I haven't tried it with both yet, but if I try that query with FIRST_ROWS instead of INDEX_ASC and NOPARALLEL_INDEX, the explain cost goes from ~25,000 to 8 and runs in pretty close to constant time (I can pull all the records as fast as one of them), so I might not even need to page anymore. I probably still will use paging in case of sudden spikes in record count.
  • Jason Baker
    Jason Baker about 15 years
    That and Quassnoi's advice got my query down to almost constant time! I wish I could select two answers. :-(
  • Frosty Z
    Frosty Z about 12 years
    +1 but I got a few difficulties to have it working. I had to use the last example (with two subqueries) of the mentioned blog post explainextended.com/2009/05/06/oracle-row_number-vs-rownum
  • Rafael Piccolo
    Rafael Piccolo almost 5 years
    This is wrong, ROWNUM can't be used together with ORDER BY, the ROWNUM sequence is generated before the ORDER BY.
  • Quassnoi
    Quassnoi almost 5 years
    @RafaelPiccolo: from the docs: If you embed the ORDER BY clause in a subquery and place the ROWNUM condition in the top-level query, then you can force the ROWNUM condition to be applied after the ordering of the rows. That's exactly what I'm doing here.
  • Rafael Piccolo
    Rafael Piccolo almost 5 years
    @Quassnoi not really, the outter select is just filtering the results. The inner select has both Rownum and Order By. To make this work, 3 levels of select are required: one to apply the order by, an outter select to apply the rownum (as mentioned in the docs) and yet another outter select to filter the rownum.