OraGP: translate vs. regexp_replace


Competitors:

popular REGEXP_REPLACE function
and the good old TRANSLATE .

Track - to remove all numeric values ​​from the string.
3 rounds at distances:
          1000 records
        10000 record0s
      100000 records

Who will win the race? Which function will replace numbers faster? (source code at the end of the post).

Race No. 1
Round 1
Distance: 1000 records

The first one is TRANSLATE, but the good old REGEXP_REPLACE is literally right after it.

Time:
translate 1000: 0.9 sec
regexp_replace: 1000: 0,25 sec

Round 2
Distance: 10,000 records

The increase in the number of records did not affect the results of the race too much, again both functions hit the finish line head to head, although this time  TRANSLATE was second faster.

Time:
translate10000: 1 sec
regexp_replace 10000: 2 sec

Round 3
Distance: 1,000,000 records

Increasing the distance to 1,000,000 did not reveal the final winner, again the results were very similar.

translate 100000: 19 sec
regexp_replace 100000: 23 sec

The race is tied.

Are both functions in this task equally fast?

I have decided to restart the race, this time changing the external conditions a bit, from sunny weather to rain ...

Race number 2
Round 1
Distance: 1000 records

TRANSLATE is the first to finish, slightly  ahead of REGEXP.

translate 100: 0.2 sec
regexp_replace 100; 4,7 sec

Round 2
Distance: 10,000 records

TRANSLATE definitely wins this round, REGEXP, out of breath, runs far behind his rival.

translate 10000: 1 sec
regexp_replace 10000: 45 sec

Round 3
Distance: 100,000 records

TRANSLATE does not give the rival a chance! By the time REGEXP hits the finish line, TRANSLATE has had a shower and a break with a colorful drink!

translate 100000: 19 sec
regexp_replace 100000: 454 sec

The definite winner of the rainy round: TRANSLATE!!!

Let's take another look at the results:





In Race 2, the REGEXP_REPLACE function time has twenty times!
Who was betting that?

But what has changed? Why did TRANSLATE and REGEXP_REPLACE go head-to-head in the first test, and why did TRANSLATE leave his rival far behind in the second test? What caused REGEXP_REPLECE to slow down so drastically?

Time for a solution: in Race 2, I only changed the NLS_LANGUAGE settings of the session.

The first test, where both functions had similar times, was performed with NLS_LANGUAGE = ENGLISH, and the second, where TRANSLATE definitely won, was performed with NLS_LANGUAGE = POLISH. As you can see, the performance of REGEXP depends very much on the language settings!

If you are using REGEXP in your system - pay attention to the NLS_LANGUAGE settings - perhaps changing these settings or switching from REGEXP to the standard TRANSLACE function s will save you precious minutes!

Source code:
Tests run on Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production.
 set serveroutpu on

DECLARE
    TYPE tab IS TABLE OF VARCHAR2(4000);
    t         tab;
    cnLoops  constant  NUMBER := 100;
    nLoops number := 0;   
    vLang varchar2(20);
    vTestLang varchar2(20); 
    vChar varchar2(4000);
    
    cursor cur is 
    SELECT dbms_random.string('x',2000)
    FROM dual
    CONNECT BY level <= nLoops;
    cnt number := 0;
    
    nTimeStart number := dbms_utility.get_time;  
    nTimeEnd number;  
   
BEGIN

for ncnt in 1..3 loop
  nLoops := case ncnt when  1 then   1000
                      when  2 then  10000 
                      when  3 then 100000  
            end ;         
  nTimeStart  := dbms_utility.get_time;  
  open cur; 
  fetch   cur bulk collect into t; 
  nTimeEnd  := dbms_utility.get_time;  
  
  DBMS_OUTPUT.PUT_LINE('Fetch '|| nLoops|| ': '|| trunc( (nTimeEnd - nTimeStart)/100,2) ); 
 
    for vTestLang in 1..2 loop
    
        if vTestLang = 2 then
            vLang := 'POLISH';
        else
            vLang := 'ENGLISH';
        end if;
       
        execute immediate  'ALTER SESSION SET NLS_LANGUAGE  = '||vLang;
            cnt := 0;
 
            nTimeStart  := dbms_utility.get_time;  
            FOR i IN 1..t.count LOOP
                vchar := translate(t(i),' 0987654321',' ');
                cnt := cnt + 1;
            END LOOP;
       
            nTimeEnd  := dbms_utility.get_time;  
            
            DBMS_OUTPUT.PUT_LINE('Translate '|| cnt ||';' || vLang || ';'|| nLoops|| ': '|| trunc( (nTimeEnd - nTimeStart)/100,2)) ; 

            cnt := 0;
           
            nTimeStart  := dbms_utility.get_time;  

            FOR i IN 1..t.count LOOP
                vchar := regexp_replace(t(i),'\d','');
                cnt := cnt + 1;
            END LOOP;
     
            nTimeEnd  := dbms_utility.get_time;  

           DBMS_OUTPUT.PUT_LINE('regexp '|| cnt ||';' || vLang || ';'|| nLoops|| ': '|| trunc((nTimeEnd - nTimeStart)/100,2));      
       
    end loop;
   close cur;
        
end loop;

exception when others then
    dbms_output.put_line(sqlerrm);
END;
/

Komentarze

  1. Languages and regular expressions definitely can create some nasty things. Some additional info here

    https://connor-mcdonald.com/2018/08/22/take-care-with-regular-expressions/

    OdpowiedzUsuń
    Odpowiedzi
    1. Wow, I have never thought to use regular expression in constraints :o Sounds weird to me :D But good to know that it should not be doing anyway :D

      Usuń

Prześlij komentarz