Optimising a SELECT query that runs slow on Oracle which runs quickly on SQL Server
Solution 1
Often this type of problem goes away if you analyze the tables involved (so Oracle has a better idea of the distribution of the data)
ANALYZE TABLE tasks COMPUTE STATISTICS;
Solution 2
The "IN" - clause is known in Oracle to be pretty slow. In fact, the internal query optimizer in Oracle cannot handle statements with "IN" pretty good. try using "EXISTS":
SELECT orderID FROM tasks WHERE orderID NOT EXISTS
(SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL)`print("code sample");`
Caution: Please check if the query builds the same data results.
Edith says: ooops, the query is not well formed, but the general idea is correct. Oracle has to fulfill a full table scan for the second (inner) query, build the results and then compare them to the first (outer) query, that's why it's slowing down. Try
SELECT orderID AS oid FROM tasks WHERE NOT EXISTS
(SELECT DISTINCT orderID AS oid2 FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL and oid=oid2)
or something similiar ;-)
Solution 3
I would try using joins instead
SELECT
t.orderID
FROM
tasks t
LEFT JOIN tasks t1
ON t.orderID = t1.orderID
AND t1.engineer1 IS NOT NULL
AND t1.engineer2 IS NOT NULL
WHERE
t1.orderID IS NULL
also your original query would probably be easier to understand if it was specified as:
SELECT orderID FROM orders WHERE orderID NOT IN
(SELECT DISTINCT orderID FROM tasks WHERE
engineer1 IS NOT NULL AND engineer2 IS NOT NULL)
(assuming you have orders table with all the orders listed)
which can be then rewritten using joins as:
SELECT
o.orderID
FROM
orders o
LEFT JOIN tasks t
ON o.orderID = t.orderID
AND t.engineer1 IS NOT NULL
AND t.engineer2 IS NOT NULL
WHERE
t.orderID IS NULL
Solution 4
I agree with TZQTZIO, I don't get your query.
If we assume the query did make sense then you might want to try using EXISTS as some suggest and avoid IN. IN is not always bad and there are likely cases which one could show it actually performs better than EXISTS.
The question title is not very helpful. I could set this query up in one Oracle database and make it run slow and make it run fast in another. There are many factors that determine how the database resolves the query, object statistics, SYS schema statistics, and parameters, as well as server performance. Sqlserver vs. Oracle isn't the problem here.
For those interested in query tuning and performance and want to learn more some of the google terms to search are "oak table oracle" and "oracle jonathan lewis".
Solution 5
Some questions:
- How many rows are there in tasks?
- What indexes are defined on it?
- Has the table been analyzed recently?
Another way to write the same query would be:
select orderid from tasks
minus
select orderid from tasks
where engineer1 IS NOT NULL AND engineer2 IS NOT NULL
However, I would rather expect the query to involve an "orders" table:
select orderid from ORDERS
minus
select orderid from tasks
where engineer1 IS NOT NULL AND engineer2 IS NOT NULL
or
select orderid from ORDERS
where orderid not in
( select orderid from tasks
where engineer1 IS NOT NULL AND engineer2 IS NOT NULL
)
or
select orderid from ORDERS
where not exists
( select null from tasks
where tasks.orderid = orders.orderid
and engineer1 IS NOT NULL OR engineer2 IS NOT NULL
)
RickL
Updated on July 05, 2022Comments
-
RickL almost 2 years
I'm trying to run the following SQL statement in Oracle, and it takes ages to run:
SELECT orderID FROM tasks WHERE orderID NOT IN (SELECT DISTINCT orderID FROM tasks WHERE engineer1 IS NOT NULL AND engineer2 IS NOT NULL)
If I run just the sub-part that is in the IN clause, that runs very quickly in Oracle, i.e.
SELECT DISTINCT orderID FROM tasks WHERE engineer1 IS NOT NULL AND engineer2 IS NOT NULL
Why does the whole statement take such a long time in Oracle? In SQL Server the whole statement runs quickly.
Alternatively is there a simpler/different/better SQL statement I should use?
Some more details about the problem:
- Each order is made of many tasks
- Each order will be allocated (one or more of its task will have engineer1 and engineer2 set) or the order can be unallocated (all its task have null values for the engineer fields)
- I am trying to find all the orderIDs that are unallocated.
Just in case it makes any difference, there are ~120k rows in the table, and 3 tasks per order, so ~40k different orders.
Responses to answers:
- I would prefer a SQL statement that works in both SQL Server and Oracle.
- The tasks only has an index on the orderID and taskID.
- I tried the NOT EXISTS version of the statement but it ran for over 3 minutes before I cancelled it. Perhaps need a JOIN version of the statement?
- There is an "orders" table as well with the orderID column. But I was trying to simplify the question by not including it in the original SQL statement.
I guess that in the original SQL statement the sub-query is run every time for each row in the first part of the SQL statement - even though it is static and should only need to be run once?
Executing
ANALYZE TABLE tasks COMPUTE STATISTICS;
made my original SQL statement execute much faster.
Although I'm still curious why I have to do this, and if/when I would need to run it again?
The statistics give Oracle's cost-based optimzer information that it needs to determine the efficiency of different execution plans: for example, the number of rowsin a table, the average width of rows, highest and lowest values per column, number of distinct values per column, clustering factor of indexes etc.
In a small database you can just setup a job to gather statistics every night and leave it alone. In fact, this is the default under 10g. For larger implementations you usually have to weigh the stability of the execution plans against the way that the data changes, which is a tricky balance.
Oracle also has a feature called "dynamic sampling" that is used to sample tables to determine relevant statistics at execution time. It's much more often used with data warehouses where the overhead of the sampling it outweighed by the potential performance increase for a long-running query.
-
Maurício over 15 yearsNo it's not. I made the same mistake :-) Each order has mutliple tasks, and if one of those tasks has an engineer assigned, the order counts as "allocated"
-
Mac over 15 yearsI came to the same query (see my answer below), except for : * the subquery has no reason to select DISTINCT orderIDs. * drop the "orderID" between WHERE and NOT EXISTS (syntax error). * drop the 'print("code sample"), obviously ;-)
-
Georgi over 15 yearsI thought of that, too, but I think this query is not the fastest, too.
-
RickL over 15 yearsI tried this but it was still running over a minute later when I cancelled it.
-
RickL over 15 yearsWhen I try the second query it gets an error? ORA-00904: "OID2": invalid identifier
-
Georgi over 15 yearsUse the "AS oid2" I inserted, it was not in the query before.
-
Georgi over 15 years@Mac: Oooops, 'print("code sample")', hmmm... my only excuse is that I'm working on a new laptop and didn't see it :) - thanks for your hints!
-
Dave Costa over 15 yearsYour second proposal is NOT the same thing, since there can be multiple tasks per orderID, some assigned and some not. He only wants orderIDs for which NONE of the tasks are assigned.
-
Dave Costa over 15 yearsYour last proposal would need to have a correlated subquery (e.g. add "AND tasks.orderID = orders.orderID"). You can't simply change NOT IN to NOT EXISTS without modifying the subquery.
-
RickL over 15 yearsHmm, no, it still gives the same error... tried in SQL Plus and Toad Freeware.
-
Peter Dutton over 15 yearsYes, be careful, the minus statement uses alot of memory
-
Georgi over 15 yearsUh, I don't have a Oracle on my laptop at the moment, but give this a try: Completely skip the "AS oid2" and end the query with "oid=orderID)" - if you cant fix it, eliminate it ;-)
-
David Aldridge over 15 years"The "IN" - clause is known in Oracle to be pretty slow. In fact, the internal query optimizer in Oracle cannot handle statements with "IN" pretty good. try using "EXISTS":" Nonense. Where's the justification for this?
-
RickL over 15 yearsThanks, I tried this and it is the correct syntax, but it was still running for over 3 minutes when I cancelled it.
-
RickL over 15 yearsThis doesn't work because it should only return orders where all tasks engineer fields are not null, but this will return orders where some tasks are engineer noll, and some tasks are engineer not null. Did you mean for the WHERE clause to be different?
-
RickL over 15 yearsAJ gave the correct syntax for the NOT EXISTS version of the SQL statement, but it is still too slow (over 3 minutes).
-
kristof over 15 yearsIt think it returns the same result as the query that you specified: SELECT orderID FROM tasks WHERE orderID NOT IN (SELECT DISTINCT orderID FROM tasks WHERE engineer1 IS NOT NULL AND engineer2 IS NOT NULL) which I believe returns:All orders that do not have any tasks with both engineers assigned
-
kristof over 15 yearsBut perhaps I am missing something, if one assigned engineer is enough not to list the order then change the left join to read as: LEFT JOIN tasks t1 ON t.orderID = t1.orderID AND (t1.engineer1 IS NOT NULL OR t1.engineer2 IS NOT NULL) But that would be different from your original query.
-
David Aldridge over 15 yearsNothing is necessary "bad". The difference between the benefits of different techniques almost always comes down to the value distributions of a particular data set and the presence or absence of indexes and constraints.
-
David Aldridge over 15 yearsThis is an obsolete syntax for collecting statistics. DBMS_STATS is a more robust way. download.oracle.com/docs/cd/B19306_01/server.102/b14211/…
-
Ethan Post over 15 yearsI agree, this is obsolete. Please try to avoid using it. Use DBMS_STATS and make sure you also get your indexes, you can set cascade=>true when you gather stats for the table.
-
David Aldridge over 15 yearsNormally the implicit distinct would be unecessary overhead, but in this case it would probably be appropriate. Two scans of the table might be inefficient though.
-
Ishbir over 15 yearsTypically, the engineer fields will be foreign keys, so the indexes should be there.
-
Jim Birchall over 15 years@ΤΖΩΤΖΙΟΥ Thanks for the downvote, but did you read the question? "The tasks only has an index on the orderID and taskID"
-
Admin over 15 yearsDave's on a Dynamic Sampling Binge
-
Admin over 15 yearsGeorgi, where is your proof that IN is "slow" and the CBO can't handle it? If you're going to answer questions here, try to do so from a position of understanding, instead of one from hearsay and supposition
-
David Aldridge over 15 yearsI loves the dynamic sampling. I don't know it isn't spoken of more often.
-
E_the_Shy over 14 years@Dave, only if your queries are big and infrequent. Otherwise you spend more time sampling that doing.
-
David Aldridge over 14 yearsWell it all bears measuring of course. DS impact depends on a lot of issues, and you can get information from a DS query that lasts a tiny fraction of a second that can take a query from 10 seconds down to a fraction of a second. For example dynamic sampling of a part of a query in which multiple predicates run against a small dimension table (public holidays on a Monday in a particular year? Women over 5'11''?) need not take long but can be very valuable. Long story short, it's just another tool to understand and appreciate for what it is.
-
usr over 14 yearsSQL Server allows only one NULL row in an index except you create a filtered index. In this case it does not include NULLs.
-
orbfish over 13 yearsActually, IN is not so bad, it sometimes runs faster than EXISTS (probably due to caching, I've never tracked this down). NOT IN (or NOT EXISTS), however, can be notoriously slow.
-
Chad almost 11 yearsNote, I tried Analyze today and it didn't do anything. DO NOT USE IT. Instead use DBMS_STATS instead. A query went from 5 minutes to 0.3 seconds with DBMS_STATS