Oracle 11g - How to optimize slow parallel insert select?
Solution 1
Try using more bind variables, especially where nested loops might happen. I've noticed that you can use it in cases like
CREATE_DT >= :YOUR_DATE instead of CREATE_DT >= sysdate - 60
I think this would explain why you have 180 million executions in the lowest part of your execution plan even though the whole other part of the update query is still at 8 million out of your 79 million.
Solution 2
Improve statistics. The estimated number of rows is 1, but the actual number of rows is over 7 million and counting. This causes the execution plan to use a nested loop instead of a hash join. A nested loop works better for small amounts of data and a hash join works better for large amounts of data. Fixing that may be as easy as ensuring the relevant tables have accurate, current statistics. This can usually be done by gathering statistics with the default settings, for example: exec dbms_stats.gather_table_stats('SIRS_UATC1', 'TBL_RECON_PM');
.
If that doesn't improve the cardinality estimate try using a dynamic sampling hint, such as /*+ dynamic_sampling(5) */
. For such a long-running query it is worth spending a little extra time up-front sampling data if it leads to a better plan.
Use statement-level parallelism instead of object-level parallelism. This is probably the most common mistake with parallel SQL. If you use object-level parallelism the hint must reference the alias of the object. Since 11gR2 there is no need to worry about specifying objects. This statement only needs a single hint: INSERT /*+ PARALLEL(16) APPEND */ ...
. Note that NOLOGGING
is not a real hint.
robertdi
Updated on July 09, 2022Comments
-
robertdi over 1 year
we want to speed up the run of the parallel insert statement below. We are expecting to insert around 80M records and it is taking around 2 hours to finish.
INSERT /*+ PARALLEL(STAGING_EX,16) APPEND NOLOGGING */ INTO STAGING_EX (ID, TRAN_DT, RECON_DT_START, RECON_DT_END, RECON_CONFIG_ID, RECON_PM_ID) SELECT /*+PARALLEL(PM,16) */ SEQ_RESULT_ID.nextval, sysdate, sysdate, sysdate, '8a038312403e859201405245eed00c42', T1.ID FROM PM T1 WHERE STATUS = 1 and not exists(select 1 from RESULT where T1.ID = RECON_PM_ID and CREATE_DT >= sysdate - 60) and UPLOAD_DT >= sysdate - 1 and (FUND_SRC_TYPE = :1)
We think that caching the results of the not exist column will speed up the inserts. How do we perform the caching? Any ideas how else to speed up the insert?
Please see below for plan statistics from Enterprise Manager. Also we noticed that the statements are not being run in parallel. Is this normal?
Edit: btw, the sequence is already cached to 1M
-
robertdi almost 10 yearsthanks. after doing this, execution was down to just 11 minutes!
-
Jon Heller almost 10 yearsI don't understand why adding a bind variable would improve this query. In general bind variables provide less information to the optimizer, and lead to worse estimates. I have a feeling there's something else going on here. It would be interesting to see the
dbms_xplan.display_cursor
for both versions of the query. -
grassbl8d almost 10 yearsamazing, worked for me as well. Replaced everything with bind variables. thanks
-
AnBisw almost 9 yearsDoes Statement level prallelism work on insert with column list like
INSERT /*+ parallel(16) */ tab (col1,col2) SELECT a1,b2 from tab2;
?? It doesn't seem to be working for me. On the other hand when I do parallelize theSELECT
as well, it does work. -
Jon Heller almost 9 years@Annjawn I've seen it work many times. Here's an example, although it's not directly runnable because SQL Fiddle doesn't have enough features: sqlfiddle.com/#!4/be5ddf/2
-
AnBisw almost 9 yearsInteresting. I am not certain as to why it isn't working for me then. All other Parallel DMLs are working just fine except this when there is a column list.
-
Jon Heller almost 9 years@Annjawn Even with the same example in my SQL Fiddle? That might make an interesting question.