Optimize Oracle order by query


Solution 1

You're sorting five million rows, including about one hundred thousand CLOBs. It is unlikely you'll be able to do that in memory; if you run the query in SQL*Plus with statistics enabled you should see the query writing out to disk a lot.

One way of improving the performance would be to add more GB to your buffer cache, but that may be an expensive option, and not a quick one either.

Building a composite index on erp_in(indate desc, corr_id, interface) could help as well, because the results from the driving table will be pre-sorted and so the ordering should take less work. Partitioning by INDATE might have a similar effect, but partitioning is a chargeable extra to the Enterprise Edition and hence is not a cheap fix (gobs more memory will probably be a lot cheaper).
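As a sketch of that suggestion (the index name is hypothetical; the table and columns are taken from the question):

```sql
-- A composite index matching the ORDER BY, so the optimizer can
-- read the driving table's rows pre-sorted from the index.
CREATE INDEX erp_in_sort_ix
    ON erp_in (indate DESC, corr_id, interface);
```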

Your reference to archiving old data suggests that you don't actually want to retrieve all the records. If this is the case then reducing the size of the result set with a WHERE clause would help a lot. The easiest way to tune something is not to do work in the first place.
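For example, a sketch of the same query restricted to recent rows (the 30-day window is purely illustrative; pick whatever range the application actually needs):

```sql
-- Sorting only a recent slice of the table instead of all 5M rows.
SELECT a.corr_id, a.interface, a.indate, b.error
FROM   erp_in a
LEFT JOIN erp_in_failed b
       ON a.corr_id = b.corr_id
      AND a.interface = b.interface
WHERE  a.indate >= SYSDATE - 30   -- illustrative cutoff
ORDER BY a.indate DESC;
```

With a selective predicate on INDATE, the existing index on that column becomes usable and the sort shrinks accordingly.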

Adding a primary key and ordering by it won't reduce the amount of effort required to do the actual sort.


"so should I partition by date? will it help without adding WHERE clause on INDATE field"

It depends. Partitioning introduces some physical organization of the table, so the rows will (at least) require less sorting. How much less depends on the granularity of the partitions: partition by a range of one day and the table is pretty much already in INDATE order; partition by a range of one year and it's much less so.

However, bear in mind that partitioning is not primarily a performance option. It is an option for managing data, especially loading, and availability. In fact, it may downgrade the performance of some queries, queries which don't suit the ordering applied by the partition key.

So, should you partition by date? Not a question we can answer. Answering it requires the sort of in-depth knowledge of your system which you have and we lack. But if you have the licenses it is certainly an option you should investigate and benchmark.
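If you do benchmark it, a range-partitioned table might look something like this sketch (interval partitioning is illustrative, as are the VARCHAR2 lengths and the initial boundary date; it requires the Partitioning option on Enterprise Edition):

```sql
-- Illustrative DDL: monthly interval partitioning on INDATE.
-- New partitions are created automatically as data arrives.
CREATE TABLE erp_in_part (
    corr_id   VARCHAR2(50),
    corr_type NVARCHAR2(1),
    interface VARCHAR2(20),
    indate    DATE
)
PARTITION BY RANGE (indate)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH'))
(PARTITION p_initial VALUES LESS THAN (DATE '2020-01-01'));
```

A query with a WHERE clause on INDATE would then read only the relevant partitions (partition pruning) rather than scanning the whole table.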

Solution 2

Are you really fetching that many rows to the web server? If so, review your code and narrow the result down to what's actually required.

  1. Try archiving old data to another table based on the date. Re-write the logic to fetch older data only when required.
  2. As others have mentioned, indexes / keys should go a long way in most cases.

If you can't do any of the above, another (admittedly ugly) solution, which may even make things worse, is to filter and fetch only the narrow columns first, and only then fetch the CLOB data for the rows you actually need.
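A sketch of that two-step idea, assuming Oracle 12c+ for the row-limiting clause (on older versions the inner query would use ROWNUM instead):

```sql
-- Sort and page the narrow columns first, then join the CLOBs
-- only for the page actually being displayed.
SELECT a.corr_id, a.interface, a.indate, b.error
FROM  (SELECT corr_id, interface, indate
       FROM   erp_in
       ORDER BY indate DESC
       FETCH FIRST 20 ROWS ONLY) a
LEFT JOIN erp_in_failed b
       ON a.corr_id = b.corr_id
      AND a.interface = b.interface
ORDER BY a.indate DESC;
```

This keeps the expensive CLOB columns out of the sort entirely; only 20 rows ever touch erp_in_failed.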

Author by

Amir

Updated on August 08, 2022

Comments

  • Amir
    Amir over 1 year

    I have a table with around 5 million rows that look like this:

    Erp_in:

    corr_id varchar(50) (almost Unique)
    corr_type nvarchar(1) (4 distinct values)
    interface varchar(20) (around 10 distinct values)
    indate DateTime
    

    With 3 different index on (corr_id, interface and indate)
    And also I have another table which I usally left join with the original table, with about 100000 rows

    Erp_In_failed:

    corr_id
    interface
    error (clob)   
    input (clob)
    

    with index on (corr_id and interface)

    The query I want to optimize is Simple as this:

    SELECT a.corr_id, a.interface, a.indate, b.error 
    FROM erp_in a left join erp_in_failed b on a.corr_id = b.corr_id and a.interface = b.interface
    Order by a.indate desc;
    

    If I remove the order by, the query isn't taking that long, but ordering the data takes around 3 minutes if not more.

    What can I do to optimize the query? I was thinking on Partitioning / Removing old data to history table / maybe creating a sequence primary key and order by it or anything else you have in mind...

    Edit:
    Execution plan says full table scan, and it's not the join that takes so long it's the order by.
    Even this query takes forever:

    SELECT * FROM erp_in
    ORDER BY indate;
    

    I have tried using paging, but that doesn't work either, and takes a few minutes for 20 results. Maybe I am doing it wrong?

    If I add a WHERE clause on the indate field, it uses the index, but only when the cutoff is less than 20 days old; anything beyond that still uses a full table scan. (Even at 40 days, adding an INDEX hint made the query run faster, but still not fast enough.)

    And just out of curiosity: I have a simple table with 1 million rows, and ORDER BY takes a few seconds. What's the difference? Is 1 million rows small enough to sort in RAM?

    Thanks,