Competitors:
popular REGEXP_REPLACE function
and the good old TRANSLATE .
Track - to remove all numeric values from the string.
3 rounds at distances:
1000 records
10000 record0s
100000 records
Who will win the race? Which function will replace numbers faster? (source code at the end of the post).
Race No. 1
Round 1
Distance: 1000 records
The first one is TRANSLATE, but the good old REGEXP_REPLACE is literally right after it.
Time:
translate 1000: 0.9 sec
regexp_replace: 1000: 0,25 sec
Round 2
Distance: 10,000 records
The increase in the number of records did not affect the results of the race too much, again both functions hit the finish line head to head, although this time TRANSLATE was second faster.
Time:
translate10000: 1 sec
regexp_replace 10000: 2 sec
Round 3
Distance: 1,000,000 records
Increasing the distance to 1,000,000 did not reveal the final winner, again the results were very similar.
translate 100000: 19 sec
regexp_replace 100000: 23 sec
The race is tied.
Are both functions in this task equally fast?
I have decided to restart the race, this time changing the external conditions a bit, from sunny weather to rain ...
Race number 2
Round 1
Distance: 1000 records
TRANSLATE is the first to finish, slightly ahead of REGEXP.
translate 100: 0.2 sec
regexp_replace 100; 4,7 sec
Round 2
Distance: 10,000 records
TRANSLATE definitely wins this round, REGEXP, out of breath, runs far behind his rival.
translate 10000: 1 sec
regexp_replace 10000: 45 sec
Round 3
Distance: 100,000 records
TRANSLATE does not give the rival a chance! By the time REGEXP hits the finish line, TRANSLATE has had a shower and a break with a colorful drink!
translate 100000: 19 sec
regexp_replace 100000: 454 sec
The definite winner of the rainy round: TRANSLATE!!!
Let's take another look at the results:
In Race 2, the REGEXP_REPLACE function time has twenty times!
Who was betting that?
But what has changed? Why did TRANSLATE and REGEXP_REPLACE go head-to-head in the first test, and why did TRANSLATE leave his rival far behind in the second test? What caused REGEXP_REPLECE to slow down so drastically?
Time for a solution: in Race 2, I only changed the NLS_LANGUAGE settings of the session.
The first test, where both functions had similar times, was performed with NLS_LANGUAGE = ENGLISH, and the second, where TRANSLATE definitely won, was performed with NLS_LANGUAGE = POLISH. As you can see, the performance of REGEXP depends very much on the language settings!
If you are using REGEXP in your system - pay attention to the NLS_LANGUAGE settings - perhaps changing these settings or switching from REGEXP to the standard TRANSLACE function s will save you precious minutes!
Source code:
Tests run on Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production.
set serveroutpu on
DECLARE
TYPE tab IS TABLE OF VARCHAR2(4000);
t tab;
cnLoops constant NUMBER := 100;
nLoops number := 0;
vLang varchar2(20);
vTestLang varchar2(20);
vChar varchar2(4000);
cursor cur is
SELECT dbms_random.string('x',2000)
FROM dual
CONNECT BY level <= nLoops;
cnt number := 0;
nTimeStart number := dbms_utility.get_time;
nTimeEnd number;
BEGIN
for ncnt in 1..3 loop
nLoops := case ncnt when 1 then 1000
when 2 then 10000
when 3 then 100000
end ;
nTimeStart := dbms_utility.get_time;
open cur;
fetch cur bulk collect into t;
nTimeEnd := dbms_utility.get_time;
DBMS_OUTPUT.PUT_LINE('Fetch '|| nLoops|| ': '|| trunc( (nTimeEnd - nTimeStart)/100,2) );
for vTestLang in 1..2 loop
if vTestLang = 2 then
vLang := 'POLISH';
else
vLang := 'ENGLISH';
end if;
execute immediate 'ALTER SESSION SET NLS_LANGUAGE = '||vLang;
cnt := 0;
nTimeStart := dbms_utility.get_time;
FOR i IN 1..t.count LOOP
vchar := translate(t(i),' 0987654321',' ');
cnt := cnt + 1;
END LOOP;
nTimeEnd := dbms_utility.get_time;
DBMS_OUTPUT.PUT_LINE('Translate '|| cnt ||';' || vLang || ';'|| nLoops|| ': '|| trunc( (nTimeEnd - nTimeStart)/100,2)) ;
cnt := 0;
nTimeStart := dbms_utility.get_time;
FOR i IN 1..t.count LOOP
vchar := regexp_replace(t(i),'\d','');
cnt := cnt + 1;
END LOOP;
nTimeEnd := dbms_utility.get_time;
DBMS_OUTPUT.PUT_LINE('regexp '|| cnt ||';' || vLang || ';'|| nLoops|| ': '|| trunc((nTimeEnd - nTimeStart)/100,2));
end loop;
close cur;
end loop;
exception when others then
dbms_output.put_line(sqlerrm);
END;
/
Languages and regular expressions definitely can create some nasty things. Some additional info here
OdpowiedzUsuńhttps://connor-mcdonald.com/2018/08/22/take-care-with-regular-expressions/
Wow, I have never thought to use regular expression in constraints :o Sounds weird to me :D But good to know that it should not be doing anyway :D
Usuń