Performance of SUBSTR on CLOB

21,788

Solution 1

(Lies, damn lies, and benchmarks...)

I re-ran your test 10 times, expanding the string so it was a full 30 characters long, and got the following averaged results:

+000000000 00:00:00.011694200 (VARCHAR2)
+000000000 00:00:00.901000600 (CLOB SUBSTR)
+000000000 00:00:00.013169200 (DBMS_LOB.SUBSTR)

I then changed the substring range to 5,14 (14,5 for DBMS_LOB.SUBSTR) and got:

+000000000 00:00:00.011731000 (VARCHAR2)
+000000000 00:00:01.010840000 (CLOB SUBSTR)
+000000000 00:00:00.011427000 (DBMS_LOB.SUBSTR)

I then changed the range to 17,14 (14,17 for DBMS_LOB.SUBSTR) and got

+000000000 00:00:00.013578900 (VARCHAR2)
+000000000 00:00:00.964527400 (CLOB SUBSTR)
+000000000 00:00:00.011416800 (DBMS_LOB.SUBSTR)

Finally, I changed the range to 25,14 (14,25 for DBMS_LOB.SUBSTR) and got

+000000000 00:00:00.011210200 (VARCHAR2)
+000000000 00:00:00.916439800 (CLOB SUBSTR)
+000000000 00:00:00.013781300 (DBMS_LOB.SUBSTR)

My conclusion is that when working against CLOB's it's best to use DBMS_LOB.SUBSTR as it appears to have effectively no performance penalty compared to using SUBSTR against a "normal" VARCHAR2. SUBSTR against a CLOB seems to suffer from a significant performance penalty. For the record - OS = HP/UX (Unix variant), Oracle version=11.1, processor=HP Itanium 2-plex. YMMV.

Share and enjoy.


And because if it's worth doing it's worth over-doing, here's some more results with the strings expanded to 32767 characters. Substring ranges given with each set of results:

1, 25000
+000000000 00:00:00.198466400 (VARCHAR2)
+000000000 00:00:02.870958700 (CLOB SUBSTR)
+000000000 00:00:00.174490100 (DBMS_LOB.SUBSTR)

1000, 25000
+000000000 00:00:00.253447900 (VARCHAR2)
+000000000 00:00:02.491790500 (CLOB SUBSTR)
+000000000 00:00:00.193560100 (DBMS_LOB.SUBSTR)

10000, 25000
+000000000 00:00:00.217812000 (VARCHAR2)
+000000000 00:00:02.268794800 (CLOB SUBSTR)
+000000000 00:00:00.222200200 (DBMS_LOB.SUBSTR)

Same day, same conclusion.

Cthulhu fhtagn.


(Once more unto the breach, dear friends, once more...)

Re-ran the benchmarks, changing the size of the CLOB to 3276700, and taking the substring from the middle starting at 2475000 for length 25000 I get:

+000000000 00:00:00.176883200 (VARCHAR2)
+000000000 00:00:02.069482600 (CLOB SUBSTR)
+000000000 00:00:00.175341500 (DBMS_LOB.SUBSTR)

(Note that changes only affect the last two tests).

AND...same results, different day.

YMMV.

Solution 2

I know this is very old, but may still be relevant to people on older systems. This looks like a data type conversion problem. Based on something I noticed looking at the effect @bernhard.weingartner saw, the data type of the offset and amount arguments seems to make a huge difference.

This is run on 11.2.0.3 on Linux (OEL 5.6), and increased to a million iterations just to make the differences even more obvious:

DECLARE
  l_text   VARCHAR2(30) := 'This is a test record';
  l_clob   CLOB := l_text;
  l_substr VARCHAR2(30);
  t TIMESTAMP;
BEGIN
  t := SYSTIMESTAMP;
  FOR i IN 1..1000000 LOOP
    l_substr := SUBSTR(l_text,1,14);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (VARCHAR2)');

  t := SYSTIMESTAMP;
  FOR i IN 1..1000000 LOOP
    l_substr := SUBSTR(l_clob,1,14);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (CLOB SUBSTR)');

  t := SYSTIMESTAMP;
  FOR i IN 1..1000000 LOOP
    l_substr := DBMS_LOB.SUBSTR(l_clob,14,1);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (DBMS_LOB.SUBSTR with 14,1)');

  t := SYSTIMESTAMP;
  FOR i IN 1..1000000 LOOP
    l_substr := DBMS_LOB.SUBSTR(l_clob,14.0,1.0);
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (DBMS_LOB.SUBSTR with 14.0,1.0)');

  t := SYSTIMESTAMP;
  FOR i IN 1..1000000 LOOP
    l_substr := DBMS_LOB.SUBSTR(l_clob,cast(14 as number), cast(1 as number));
  END LOOP;
  dbms_output.put_line( SYSTIMESTAMP - t || ' (DBMS_LOB.SUBSTR with casts)');
END;
/
+000000000 00:00:00.043019000 (VARCHAR2)
+000000000 00:00:03.671621000 (CLOB SUBSTR)
+000000000 00:00:05.704337000 (DBMS_LOB.SUBSTR with 14,1)
+000000000 00:00:00.040097000 (DBMS_LOB.SUBSTR with 14.0,1.0)
+000000000 00:00:00.040907000 (DBMS_LOB.SUBSTR with casts)

The 11gR2 docs show the formal parameters as type INTEGER, but actually passing an integer (or pls_integer, or binary_double) is slow, while explicitly passing a number is quick.

From your original question and Bob's results this looks like something that changed between 11.1 and 11.2. I don't have a 12c instance to test on so don't know if it's changed again. Whether it's due to a change in dbms_lob or a wider change to how PL/SQL handles numeric values by default isn't clear. I haven't found anything on MOS that looks relevant.

Solution 3

Ran your script three times on the following system:

Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production

Here are the results:

+000000000 00:00:00.007787000 (VARCHAR2)
+000000000 00:00:03.093258000 (CLOB SUBSTR)
+000000000 00:00:00.340017000 (DBMS_LOB.SUBSTR)

+000000000 00:00:00.019460000 (VARCHAR2)
+000000000 00:00:03.302425000 (CLOB SUBSTR)
+000000000 00:00:00.336915000 (DBMS_LOB.SUBSTR)

+000000000 00:00:00.007773000 (VARCHAR2)
+000000000 00:00:03.210619000 (CLOB SUBSTR)
+000000000 00:00:00.336689000 (DBMS_LOB.SUBSTR)

Solution 4

I see that on 11gR1 tests ran smooth for DBMS_LOB.substr, but for 11gR2 the function is slow.

Below my test on Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production on AIX6.

+000000000 00:00:00.009440000 (VARCHAR2)
+000000000 00:00:00.749113000 (CLOB SUBSTR)
+000000000 00:00:01.177685000 (DBMS_LOB.SUBSTR)
Share:
21,788

Related videos on Youtube

Peter Lang
Author by

Peter Lang

#SOreadytohelp

Updated on August 06, 2020

Comments

  • Peter Lang
    Peter Lang almost 4 years

    I have a PL/SQL procedure that does a lot of SUBSTRs on a VARCHAR2 parameter. I would like to remove the length limit, so I tried to change it to CLOB.

    Works fine, but performance suffers, so I did some tests (based on these tests from 2005).


    UPDATE: I can reproduce this on several different instances with different Oracle versions and different hardware, dbms_lob.substr is always noticeable slower than substr(CLOB), and a lot slower than SUBSTR(VARCHAR2).

    Bob's results and the tests in the link above tell a different story.

    Can anyone explain this, or at least reproduce either Bob's or my results? Thanks!


    Test results:

    +000000000 00:00:00.004000000 (VARCHAR2)
    +000000000 00:00:00.298000000 (CLOB SUBSTR)
    +000000000 00:00:00.356000000 (DBMS_LOB.SUBSTR)

    Test code:

    DECLARE
      l_text   VARCHAR2(30) := 'This is a test record';
      l_clob   CLOB := l_text;
      l_substr VARCHAR2(30);
      t TIMESTAMP;
    BEGIN
      t := SYSTIMESTAMP;
      FOR i IN 1..100000 LOOP
        l_substr := SUBSTR(l_text,1,14);
      END LOOP;
      dbms_output.put_line( SYSTIMESTAMP - t || ' (VARCHAR2)');
    
      t := SYSTIMESTAMP;
      FOR i IN 1..100000 LOOP
        l_substr := SUBSTR(l_clob,1,14);
      END LOOP;
      dbms_output.put_line( SYSTIMESTAMP - t || ' (CLOB SUBSTR)');
    
      t := SYSTIMESTAMP;
      FOR i IN 1..100000 LOOP
        l_substr := DBMS_LOB.SUBSTR(l_clob,14,1);
      END LOOP;
      dbms_output.put_line( SYSTIMESTAMP - t || ' (DBMS_LOB.SUBSTR)');
    END;
    
    • MatBailie
      MatBailie about 12 years
      Note: Test three is 14,1 where the others are 1,14. I'd also test something like 10000, 5000 as the point is that you're looking to break the 4k limit of VARCHAR. Also, as the results are about 75x slower for non VARCHAR, are you able to look at an algorithm that deals with a multiple VARCHARs? [Such as a normalised table where one field is the 'sequence_id' showing the relative position of this string, and the other is the VARCHAR]. Finally, although there is large relative difference, the absolute difference is low. So, does it matter? [Pre-optimisation]
    • Peter Lang
      Peter Lang about 12 years
      @Dems: Thanks for your input! The 14,1 and 1,14 is correct (thanks Oracle for the consistent APIs). I'm trying to break the 32767 Byte limit (PL/SQL, not SQL), and results are more or less the same when using text with that length (LPAD('X', 32767, 'X')). I have thought of that multiple-varchar-table solution, but I'd like to avoid it :) And it does matter, since the procedure is called really often, but most of all I'm curious if there are alternatives...
    • Florin Ghita
      Florin Ghita about 12 years
      On my machines DBMS_LOB.SUBSTR is a little slower than CLOB_SUBSTR(20%). And both mooore slower than varchar2(70 times slower). I run on 11gR2.
    • Peter Lang
      Peter Lang almost 12 years
      @FlorinGhita: Thanks! Which OS are you using, in case this makes a difference?
    • Florin Ghita
      Florin Ghita almost 12 years
      AIX 6(don't know the minor...2 or 3). It's a 128 cores IBM machine. what OS and machine do you have?
  • Peter Lang
    Peter Lang about 12 years
    Thanks! I tried your last test (extending l_text to 50 characters, since 30 makes no sense any more), but the results were still comparable to my original test (after calling it several times): 0.006, 0.679, 1.064. Which version of Oracle are you using? Can you reproduce your results when calling several times?
  • Bob Jarvis - Слава Україні
    Bob Jarvis - Слава Україні about 12 years
    Oracle 11.1. And yes, the tests results are repeatable. I changed the loops to iterate 1 million times each, then divided the resultant time interval by 10 to make them roughly comparable to the original tests. I re-ran the tests multiple times and each result was comparable.
  • Peter Lang
    Peter Lang about 12 years
    That's weird. I tried my tests on three different instances (11.2 on Windows, 11.2 on Linux and 10.2 on Linux), and results are more or less the same as in my post. Would be great to have some more results from other people...
  • Bob Jarvis - Слава Україні
    Bob Jarvis - Слава Україні about 12 years
    What processor architecture? How much L1 and L2 cache? How many victims^H^H^H^H^H^H^H offerings sacrificed to the Elder Gods? Little things mean a lot... :->
  • Jon Heller
    Jon Heller about 12 years
    "Itanium... YMMV" - your money may vanish?
  • Peter Lang
    Peter Lang about 12 years
    I doubt that this is only hardware-related. As I said in my first comment, my first two tests are faster than yours, but the third one is significantly slower...
  • Bob Jarvis - Слава Україні
    Bob Jarvis - Слава Україні about 12 years
    @PeterLang - added tests for large (3 meg) CLOB. Same results.
  • Peter Lang
    Peter Lang about 12 years
    Thanks again. Your dbms_lob.substr seems to work different than mine. I'd love to know why...
  • Bob Jarvis - Слава Україні
    Bob Jarvis - Слава Україні about 12 years
    @PeterLang - I did note that in the OCI docs there are two versions of OCILobCopy, those being OCILobCopy and OCILobCopy2, with the rule being to use the "2" version of any OCI function if such a function exists because the "2" versions are better/faster/improved/more capable. Perhaps "commodity OS" (e.g. Windows, Linux, etc) Oracle uses one version while the "commercial OS" versions (e.g. those for z/OS, HP-UX, etc) use a different version - or one uses C code while the other uses hand-optimized assembler. ??? YMMV
  • Peter Lang
    Peter Lang almost 12 years
    Thanks! Would you mind to share which OS you use, in case this makes a difference?
  • Peter Lang
    Peter Lang almost 12 years
    @BobJarvis: I would be surprised if this would make such a big difference, but who knows... Do you have a reference for these findings? I only see that OCILobCopy is deprecated, and that OCILobCopy2 needs to be used for LOBs of size greater than 4 GB.
  • Bob Jarvis - Слава Україні
    Bob Jarvis - Слава Україні almost 12 years
    My comments on possible differences between OCILobCopy and OCILobCopy2 were just suppositions on my part, not facts. I see that @tp9 posted some results that are roughly the same as what I'd found. He also used 11.1. I wish I had an 11.2 DB to test on.
  • tp9
    tp9 almost 12 years
    @PeterLang Sure thing. Redhat Enterprise Linux: 2.6.9-67.ELsmp
  • Peter Lang
    Peter Lang almost 12 years
    And I wish I had an 11.1 to test :) Well, we still don't know why this happens, but your effort definitely deserves the bounty.