Loading data - SQL: is there any error? [GTT, tab]



Basic information SQL loading can be found in the article SQL - loading data. There is discussed the LOG ERRORS clause enabling logging of incorrect records to the log table without losing the valid records.

It is important to know if the process ran correctly or if any errors were logged.

The first, easiest way is to check the log table to see if there are records for our tag.

select  count(*)  from ERR$_myEmp 
where ora_err_tag$ = 'INS:20022020 194749';

  COUNT(*)
----------
         4

The solution is very simple but if the log table is large, searching through it may be inefficient. This will have a direct impact on the performance of the data loading process. In this situation, we should look for another solution.

First, let's try to write information about the error to the table. If we need information only at the moment - it can be a temporary table, the records will be deleted after the transaction is commited. If we need permanent information - we can create a regular table where we will store statistical information from data loading processes.

In the example I will present both methods at the same time, but of course it is enough to write one of them :)

I will use the standard HR schema.

First, I will create the tables:
myEmp 
test table, to which we will load data. I will change the length of last_name column to cause an error

myEmpCnt_gtt 
global temporary table for temporary storage of information about the number of records. The table will contain one cnt column - information about the number of records that should be loaded

myEmpCnt_tab 
standard table for storing statistical information for loading processes. The table will contain the                       TAG column - identification of the specific data loading process,
            cntAll - the number of all records that should be loaded,
            cntInserted - the number of records loaded
We will also put an index on the TAG column to efficiently retrieve information about loading.

create global temporary table myEmpCnt_gtt (cnt number) ON COMMIT DELETE ROWS;

create table myEmpCnt_tab (tag varchar2(100), cntAll number, cntInserted number);
create index  myEmpCnt_tab_idx on myEmpCnt_tab(tag);

create table myEmp as select * from employees where 1 =2 ;
alter table myEmp modify (last_name varchar2(9));

EXEC DBMS_STATS.gather_table_stats('HR', 'myEmpCnt_tab');
EXEC DBMS_STATS.gather_table_stats('HR', 'myEmp');

Create log table:

call dbms_errlog.create_error_log('myEmp');

I will use the INSERT ALL clause to insert information about the number of records to the myEmpCnt_tab and myEmpCnt_gtt tables. Thanks to this, I can insert not only the main table, but also additional tables, such as our statistical tables.

In order not to make additional inserts for all inserted records (performance!) - I will make additional inserts only for the first row (rownum = 1).

I will calculate the information on the number of records downloaded for loading in the select using the analytical function count (*) over.

I have 107 records in the source table, of which I expect 4 errors (too long value in the last_name column). Let's look at an example:

set serveroutput on;

declare 
 ncnt_ins number;
 nAll_gtt number;
 nAll_tab number;
 vTag varchar2(100);
begin


vTag := 'INS'||to_char(sysdate, 'ddmmyyyy hh24miss');

  insert all 
    when  rownum = 1 then into myEmpCnt_gtt (cnt) values (cnt) 
    when  rownum = 1 then into myEmpCnt_tab (tag, cntAll) values (vTag, cnt) 
    when  rownum > 0 then 
    into myEmp( employee_id,
        first_name,
        last_name,
        email,
        phone_number,
        hire_date,
        job_id,
        salary,
        commission_pct,
        manager_id,
        department_id) values ( employee_id,
        first_name,
        last_name,
        email,
        phone_number,
        hire_date,
        job_id,
        salary,
        commission_pct,
        manager_id,
        department_id)
        log errors (vTag)  reject limit unlimited

   select rownum ,  count(*) over () cnt ,  employee_id,
            first_name,
            last_name,
            email,
            phone_number,
            hire_date,
            job_id,
            salary,
            commission_pct,
            manager_id,
            department_id
    from employees;
    
    
    --- inserted rows
    ncnt_ins := sql%rowcount;
    
    
    -- select from gtt  how many rows should be inserted
    select cnt into nAll_gtt from myEmpCnt_gtt;
    
    dbms_output.put_line ('GTT all rows: '||nAll_gtt ||'  , Inserted: ' ||ncnt_ins||', Errors: '||to_char(nAll_gtt - ncnt_ins)); 
    
    -- update myEmpCnt_tab and return how many rows should be inserted
    update myEmpCnt_tab
    set cntInserted = ncnt_ins
    where tag = vTag
    returning cntAll into nAll_tab;
    
    dbms_output.put_line ('Tab all rows: '||nAll_tab ||'  , Inserted: ' ||ncnt_ins||', Errors: '||to_char(nAll_tab - ncnt_ins)); 
end;
/

So we have these values:
ncnt_ins - the number of actually inserted rows, i.e. SQL% ROWCOUNT 
nAll_gtt -  all rows number, selected from the myEmpCnt_gtt table
nAll_tab - all rows number, selected from the myEmpCnt_tab table

Based on this, we can calculate the number of errors:
for the myEmpCnt_gtt table : nAll_gtt - nct_ins 
for  myEmpCnt_tab: nAll_tab - nct_ins

Output from procedure:
GTT all rows: 107  , Inserted: 104, Errors: 3
Tab all rows: 107  , Inserted: 104, Errors: 3

In this solution, to retrieve information about rows number, we also need to select the table. However, both the GTT and the standard table, especially indexed, are much smaller and more efficient than the log table.

However, we must remember that if we insert data into more than one table - we will not be able to calculate the number of errors for each table separately. We will be able to calculate only the total number of inserted and incorrect records. However, if you do not need such details, the solution is simple and easy to apply.

Komentarze