Some time ago, I was analyzing a system that was having performance problems. I noticed that the data was being processed row by row. My first suggestion was to use BULK COLLECT.
To my surprise, the developer said it wouldn't be faster, because the system already used BULK COLLECT. That caught me off guard: the code looked like classic row-by-row processing. I decided to check again, and the developer was right! BULK COLLECT was indeed used to retrieve the data.
So why did the code look like row-by-row processing?
And why was the system slow despite using one of the best optimization techniques?
Code analysis quickly answered these questions. The queries using BULK COLLECT fetched one row at a time. In a loop. Row by row.
Does this use of BULK COLLECT make any sense?
How do I convert code that processes records row by row into a process that uses a bulk method?
Let's check.
I ran the tests on Oracle Database 18c Express Edition Release 18.0.0.0.0 - Production.
For the tests, I used the predefined HR sample schema and created a test table, EMP_BULK_TEST, based on the HR.EMPLOYEES table.
create table emp_bulk_test as
with data as (
    select level lvl from dual connect by level <= 5000
)
select rownum new_employee_id,
       count(*) over () cnt,
       employee_id, lvl, first_name, last_name, email, phone_number,
       hire_date, job_id, salary, commission_pct, manager_id, department_id
  from employees, data;
alter table emp_bulk_test add (new_salary number);
alter table emp_bulk_test add constraint emp_bulk_pk primary key(new_employee_id);
create index emp_bulk_idx on emp_bulk_test(department_id);
-- clear the results column before each test run
update emp_bulk_test set new_salary = null;
commit;
There are over half a million records in the EMP_BULK_TEST table.
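To verify the row count yourself, a quick check will do (the exact figure assumes the standard HR data set with its 107 employees):

select count(*) from emp_bulk_test;
-- 107 employees x 5000 copies = 535000 rows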
Test description:
Employees of departments located in the USA (LOCATIONS.COUNTRY_ID = 'US') will receive a salary (SALARY) increase by their commission percentage (COMMISSION_PCT).
The first test uses classic row-by-row processing. In the first step, we loop through the departments to find those located in the US.
In the next step, we get the ids (NEW_EMPLOYEE_ID) of the employees in each of those departments.
In the third step, we fetch each employee's details and, based on them, calculate the new salary value.
In the fourth step, we update EMP_BULK_TEST with the new salary value.
Test 1:
set serveroutput on
declare
    nSalary         emp_bulk_test.salary%type;
    nNewSalary      emp_bulk_test.new_salary%type;
    nCommission_pct emp_bulk_test.commission_pct%type;
    nTimeStart      number;
    nTimeEnd        number;
    nCnt            number := 0;
begin
    nTimeStart := dbms_utility.get_time;
    -- step 1: departments located in the US
    for dept in (select department_id
                   from departments d, locations loc
                  where d.location_id = loc.location_id
                    and country_id = 'US') loop
        -- step 2: employees of a given department
        for z in (select new_employee_id
                    from emp_bulk_test
                   where department_id = dept.department_id) loop
            -- step 3: single-row lookup of the employee's details
            select salary, commission_pct
              into nSalary, nCommission_pct
              from emp_bulk_test
             where new_employee_id = z.new_employee_id;
            nNewSalary := nSalary + nvl(nCommission_pct, 0.01) * nSalary;
            -- step 4: single-row update
            update emp_bulk_test
               set new_salary = nNewSalary
             where new_employee_id = z.new_employee_id;
            commit;
            nCnt := nCnt + 1;
        end loop;
    end loop;
    nTimeEnd := dbms_utility.get_time;
    dbms_output.put_line('Row by row: ' || nCnt || ' - ' || trunc((nTimeEnd - nTimeStart)/100, 2));
end;
Row by row: 340000 - 55,42
In the second test, we will use BULK COLLECT without changing anything else in the code.
Test 2:
set serveroutput on
declare
    type tabnum is table of number index by pls_integer;
    tSalary    tabnum;
    tComm      tabnum;
    nNewSalary number;
    nTimeStart number;
    nTimeEnd   number;
    nCnt       number := 0;
begin
    nTimeStart := dbms_utility.get_time;
    for dept in (select department_id
                   from departments d, locations loc
                  where d.location_id = loc.location_id
                    and country_id = 'US') loop
        for z in (select new_employee_id
                    from emp_bulk_test
                   where department_id = dept.department_id) loop
            -- BULK COLLECT of exactly one row - a primary key lookup
            select salary, commission_pct
              bulk collect into tSalary, tComm
              from emp_bulk_test
             where new_employee_id = z.new_employee_id;
            nNewSalary := tSalary(1) + nvl(tComm(1), 0.01) * tSalary(1);
            update emp_bulk_test
               set new_salary = nNewSalary
             where new_employee_id = z.new_employee_id;
            commit;
            nCnt := nCnt + 1;
        end loop;
    end loop;
    nTimeEnd := dbms_utility.get_time;
    dbms_output.put_line('Bulk 1: ' || nCnt || ' - ' || trunc((nTimeEnd - nTimeStart)/100, 2));
end;
Bulk 1: 340000 - 56,06
The time is similar to row-by-row processing. So BULK COLLECT won't make it faster? Can we do something about it?
You may have noticed that the queries in this procedure are quite easy to combine. In this simple example, the opportunity is immediately obvious. In a real system, however, where the queries are more complex and the code is split across functions and procedures, you will have to look closely to spot it.
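To illustrate, here is a minimal sketch of how such a query can hide in a real code base (the function and its name are hypothetical, not part of the tests): the single-row lookup is buried behind an innocent-looking function call, and only reading the body reveals a query that could be merged into the caller's cursor.

create or replace function get_salary(p_id in emp_bulk_test.new_employee_id%type)
return emp_bulk_test.salary%type
is
    nSal emp_bulk_test.salary%type;
begin
    -- the single-row query hides here; a caller looping over employees
    -- and calling get_salary for each id never sees it directly
    select salary
      into nSal
      from emp_bulk_test
     where new_employee_id = p_id;
    return nSal;
end;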
In the third test, we will combine all queries into one.
Test 3:
set serveroutput on
declare
    nNewSalary emp_bulk_test.new_salary%type;
    nTimeStart number;
    nTimeEnd   number;
    nCnt       number := 0;
begin
    nTimeStart := dbms_utility.get_time;
    -- one combined query instead of three nested ones
    for z in (select new_employee_id, salary, commission_pct
                from emp_bulk_test
               where department_id in (select department_id
                                         from departments d, locations loc
                                        where d.location_id = loc.location_id
                                          and country_id = 'US')) loop
        nNewSalary := z.salary + nvl(z.commission_pct, 0.01) * z.salary;
        update emp_bulk_test
           set new_salary = nNewSalary
         where new_employee_id = z.new_employee_id;
        commit;
        nCnt := nCnt + 1;
    end loop;
    nTimeEnd := dbms_utility.get_time;
    dbms_output.put_line('Row by row 2: ' || nCnt || ' - ' || trunc((nTimeEnd - nTimeStart)/100, 2));
end;
Row by row 2: 340000 - 35,57
Almost twice as fast! That is a prize worth fighting for.
Finally, let's rewrite the procedure for true bulk processing: BULK COLLECT + FORALL. Will we shave off a few more seconds?
Test 4:
set serveroutput on
declare
cursor cur is
select new_employee_id, salary, commission_pct from emp_bulk_test
where department_id in ( select department_id from departments d, locations loc where d.location_id = loc.location_id and country_id = 'US');
type cur_tab is table of cur%rowtype;
cur_t cur_tab;
type tabnum is table of number index by pls_integer;
tNewSalary tabnum;
nLimit integer := 1000;
----------------------------------------
nTimeStart number;
nTimeEnd number;
nCnt number := 0;
begin
nTimeStart := dbms_utility.get_time;
open cur;
loop
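-- fetch up to nLimit rows at a time into the collection cur_t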
fetch cur bulk collect into cur_t
limit nLimit;
exit when cur_t.count = 0;
for i in 1..cur_t.count loop
    tNewSalary(i) := cur_t(i).salary + nvl(cur_t(i).commission_pct, 0.01) * cur_t(i).salary;
end loop;
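-- one bulk UPDATE per batch instead of cur_t.count single-row updates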
forall i in 1..cur_t.count
update emp_bulk_test set New_Salary = tNewSalary(i) where new_employee_id = cur_t(i).new_employee_id;
commit;
nCnt := nCnt + cur_t.count;
end loop;
nTimeEnd := dbms_utility.get_time;
dbms_output.put_line('Bulk 2: ' || nCnt || ' - ' || trunc((nTimeEnd - nTimeStart)/100, 2));
end;
Bulk 2: 340000 - 2,91
Not just a few seconds: almost twenty times faster than classic row-by-row processing!
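A side note on the design choice in Test 4: the LIMIT value of 1000 is a compromise between the memory consumed by the collections and the number of fetch round trips. It is worth experimenting with this value in your own environment.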
Summary of processing times:
--------------------------------------------
T1: row by row:            55,42 sec
T2: bulk + row by row:     56,06 sec
T3: row by row 2:          35,57 sec
T4: bulk collect + forall:  2,91 sec
Well, the results leave no room for doubt. Merely using BULK COLLECT in a process that is not bulk-ready brings no benefit. However, the correct use of BULK COLLECT together with FORALL can bring significant performance gains.
Nice comparison.
Test 1 and Test 2 are practically identical, since the BULK COLLECT select in Test 2 is just "bulk" collecting a single row (a primary key lookup), so there isn't any real "bulk" gain in Test 2 versus Test 1.
Both Test 1 and Test 2 benefit from the implicit bulk collect that a cursor FOR loop does anyway. The biggest issues are running a select inside a cursor loop inside another cursor loop, and then doing single-row UPDATEs.
Test 3 improves a lot by avoiding the nested loops/selects and doing joins instead, but it still executes single-row UPDATEs, which is expensive.
Test 4 really shows that it is *combining* BULK COLLECT and FORALL that is powerful. That is good.
But even that only applies when you have situations where you really *need* to do some calculations in PL/SQL. I know this is just a simple test case, but if a real case does nothing more than this, it can still be better to use plain SQL:
update emp_bulk_test
   set new_salary = salary + nvl(commission_pct, 0.01) * salary
 where department_id in (select department_id
                           from departments d, locations loc
                          where d.location_id = loc.location_id
                            and country_id = 'US');
I totally agree - a simple update is (almost) always better. But in a complex back-office system there are a lot of calculations and other things going on. What I see all the time is one SQL after another - and all of them identical or very similar, each retrieving one or two records. What I wanted to emphasize here is that using BULK COLLECT by itself is not enough: if the bulk fetch takes only one row (as in the anecdote, which is, sadly, a true story), it is not a bit faster. You have to use bulk collection (or any other technique) CORRECTLY to get its advantages.