Performance of regexp_replace vs translate in Oracle?
Solution 1
I think you're running into simple optimization. The regexp expression is so expensive to compute that the result is cached in the hope that it will be used again in the future. If you actually use distinct strings to convert, you will see that the modest translate is naturally faster because it is its specialized function.
Here's my example, running on 11.1.0.7.0
:
SQL> DECLARE
2 TYPE t IS TABLE OF VARCHAR2(4000);
3 l t;
4 l_level NUMBER := 1000;
5 l_time TIMESTAMP;
6 l_char VARCHAR2(4000);
7 BEGIN
8 -- init
9 EXECUTE IMMEDIATE 'ALTER SESSION SET PLSQL_OPTIMIZE_LEVEL=2';
10 SELECT dbms_random.STRING('p', 2000)
11 BULK COLLECT
12 INTO l FROM dual
13 CONNECT BY LEVEL <= l_level;
14 -- regex
15 l_time := systimestamp;
16 FOR i IN 1 .. l.count LOOP
17 l_char := regexp_replace(l(i), '[]()[]', '-', 1, 0);
18 END LOOP;
19 dbms_output.put_line('regex :' || (systimestamp - l_time));
20 -- tranlate
21 l_time := systimestamp;
22 FOR i IN 1 .. l.count LOOP
23 l_char := translate(l(i), '()[]', '----');
24 END LOOP;
25 dbms_output.put_line('translate :' || (systimestamp - l_time));
26 END;
27 /
regex :+000000000 00:00:00.979305000
translate :+000000000 00:00:00.238773000
PL/SQL procedure successfully completed
on 11.2.0.3.0
:
regex :+000000000 00:00:00.617290000
translate :+000000000 00:00:00.138205000
Conclusion: In general I suspect translate
will win.
Solution 2
For SQL, I tested this with the following script:
set timing on
select sum(length(x)) from (
select translate('(<FIO>)', '()[]', '----') x
from (
select *
from dual
connect by level <= 2000000
)
);
select sum(length(x)) from (
select regexp_replace('[(<FIO>)]', '[\(\)\[]|\]', '-', 1, 0) x
from (
select *
from dual
connect by level <= 2000000
)
);
and found that the performance of translate
and regexp_replace
were almost always the same, but it could be that the cost of the other operations is overwhelming the cost of the functions I'm trying to test.
Next, I tried a PL/SQL version:
set timing on
declare
x varchar2(100);
begin
for i in 1..2500000 loop
x := translate('(<FIO>)', '()[]', '----');
end loop;
end;
/
declare
x varchar2(100);
begin
for i in 1..2500000 loop
x := regexp_replace('[(<FIO>)]', '[\(\)\[]|\]', '-', 1, 0);
end loop;
end;
/
Here the translate
version takes just under 10 seconds, while the regexp_replace
version around 0.2 seconds -- around 2 orders of magnitude faster(!)
Based on this result, I will be using regular expressions much more often in my performance critical code -- both SQL and PL/SQL.
Comments
-
Colin 't Hart almost 2 years
For simple things is it better to use the
translate
function on the premise that it is less CPU intensive or isregexp_replace
the way to go?This question comes forth from How can I replace brackets to hyphens within Oracle REGEXP_REPLACE function?
-
Vincent Malgrat about 11 yearsI think you're jumping to a conclusion a bit hastily here :) If you think about it, only cache optimization can explain this magnitude of difference in runtime. In a real world example you wouldn't convert the same string over and over I suspect.
-
Vincent Malgrat about 11 yearsStill, it's interesting to see that in some cases
regexp
is faster thantranslate
:) -
APC about 11 yearsOn 10g I find that REGEXP_ is reliably one or two orders of magnitude slower than their non-REGEXP analogues (when doing enough of something to measure the difference). However, 10g was the first release with built-in regex functions and I would have expected Oracle to have doen some significant tuning for 11g.