Loading data - SQL.


Loading large volumes of data into a database is one of the common challenges for Oracle developers. Loading processes written in PL/SQL are easy to manage, and any errors are easy to analyze. However - even well-written ones are not as fast as pure DML.

DML, on the other hand, is fast. However, it has its own issues. If we load thousands of records and a single one causes an error, by default everything will be rolled back, even the correct records. What's worse - we will get information about the error that terminated our process, but we will not know which record caused it. And with data volumes reaching hundreds of thousands or even millions of records, finding the offending one is next to impossible.

Not to mention the performance - perhaps we waited a few hours for the data to load. If an error occurred, we must wait hours more for the rollback!

It is often recommended to load data using a PL/SQL function, where we can handle each record as we want.
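A minimal sketch of that approach might look like the block below. It uses the employees source and myEmp target tables from the test later in this article; the point is the per-record exception handler, which catches and reports every bad row individually.

-- row-by-row PL/SQL loader sketch: slow, but every bad record is caught
begin
  for rec in (select * from employees) loop
    begin
      insert into myEmp values rec;   -- record-based insert
    exception
      when others then
        dbms_output.put_line('bad record ' || rec.employee_id || ': ' || sqlerrm);
    end;
  end loop;
  commit;
end;
/

A loader like this is easy to reason about, but it pays a context switch per row - which is exactly why it cannot match pure DML.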

But let's look at an example.

I use the standard HR schema.

For the purposes of the test, I create an empty myEmp table as a copy of the Employees table.

Next, I will reduce the last_name column length to 9 characters (there are 4 records in the employees table with a last_name longer than 9 characters) to force an error when loading data.
-- create test empty table
create table myEmp as select * from employees where 1 = 2;
 
-- change column length to induce error (max length of the last_name column is 11)
alter table myEmp modify (last_name varchar2(9));

We have 107 records in the source table (in my example), of which we expect 4 errors.

select case when length(last_name ) <= 9 then ' ok rows' else ' errors' end typ,  count(*) 
from employees 
group by case when length(last_name ) <= 9 then ' ok rows' else ' errors' end;

TYP        COUNT(*)
-------- ----------
 errors           4
 ok rows        103


Let's load the data.

-- load employee data   - we expect some errors
insert into myEmp  select * from employees;
Error starting at line : 11 in command -
insert into myEmp  select * from employees
Error report -
ORA-12899: value too large for column "HR"."MYEMP"."LAST_NAME" (actual: 10, maximum: 9)
select * from myEmp;

no rows selected

Loading failed. Not a single record has been inserted into the myEmp table.
We do, of course, have the error information:
ORA-12899: value too large for column "HR"."MYEMP"."LAST_NAME" (actual: 10, maximum: 9)
but we don't know which record raised this error! Searching for it, especially in a large table, is like looking for a needle in a haystack.

In that case, are we really doomed to slow PL/SQL?

Fortunately not!

DML has had a mechanism for exactly this since Oracle 10g Release 2: LOG ERRORS.

Thanks to this functionality, not only will we get full information about every error, but the remaining records will still be inserted!

Let's look at the syntax:
LOG ERRORS [INTO [schema.]table] [('simple_expression')] [REJECT LIMIT integer|UNLIMITED]
TABLE
optional log table name. If we do not specify this parameter, errors will be logged to a table named after the DML target table with the prefix ERR$_.

SIMPLE_EXPRESSION
a tag that can be used to mark a particular DML process, for example the date of loading. Remember that the tag is of type varchar2(2000).

REJECT LIMIT
the maximum number of errors that may be logged before the statement is terminated (the default is 0). UNLIMITED means no limit.
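For illustration, a fully specified statement might look like this - the error table name, the tag text, and the limit of 50 are arbitrary choices here, and the error table itself must already exist (creating it is covered next):

-- log errors to a named table, tag the batch, give up after 50 errors
insert into myEmp
select * from employees
LOG ERRORS INTO err$_myEmp ('nightly load') REJECT LIMIT 50;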

To be able to use LOG ERRORS, we first need to create a table to which the errors will be written. You can create it with the procedure below.

-- create table for errors
-- syntax
DBMS_ERRLOG.CREATE_ERROR_LOG (
   dml_table_name            IN VARCHAR2,           -- original table name
   err_log_table_name        IN VARCHAR2 := NULL,   -- error log table name, default ERR$_<original_table_name>
   err_log_table_owner       IN VARCHAR2 := NULL,   -- error log table owner, default original table schema owner
   err_log_table_space       IN VARCHAR2 := NULL,   -- error log tablespace, default original table tablespace
   skip_unsupported          IN BOOLEAN  := FALSE); -- TRUE: unsupported column types (LONG, CLOB, BLOB, BFILE, ADT) are skipped; FALSE: an unsupported column type causes the procedure to terminate

Most often we simply call the procedure with just the name of the table for which we want to create the error log.

-- Create the error logging table.
BEGIN
  DBMS_ERRLOG.create_error_log (dml_table_name => 'myEmp');
END;
/

Our log table created in this way will be named ERR$_myEmp. Let's load data into the myEmp table using LOG ERRORS.

insert into myEmp 
select * from employees
LOG ERRORS ('INS:' || to_char(sysdate,'ddmmyyyy hh24miss')) REJECT LIMIT UNLIMITED;
103 rows inserted.

We have 103 records!
But what about the four errors? They have been logged into the ERR$_myEmp error table, along with the tag, the error details, and the entire rejected record.
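The log table contains a few ORA_ERR_* control columns (error number, message, rowid, operation type, and our tag), followed by the columns of the base table stored as varchar2. A quick query shows exactly which records were rejected and why:

-- inspect the rejected records
select ora_err_number$, ora_err_tag$, ora_err_mesg$, last_name
from ERR$_MYEMP;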
To sum up:
If we need to load data into the system and process it along the way, the best approach is a loading function written in PL/SQL using BULK COLLECT. However, if we load data with little or no transformation (at least at this stage), it is worth considering plain DML with LOG ERRORS. It will save us a lot of time, and we will not lose detailed information about any errors.
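For completeness, a minimal sketch of that PL/SQL route - BULK COLLECT combined with FORALL ... SAVE EXCEPTIONS, which, like LOG ERRORS, keeps the good rows and reports the bad ones (a sketch under the same myEmp test setup, not a tuned loader):

-- bulk loader sketch: good rows are kept, bad rows are reported afterwards
declare
  type emp_tab is table of employees%rowtype;
  l_emps       emp_tab;
  bulk_errors  exception;
  pragma exception_init(bulk_errors, -24381);  -- ORA-24381: error(s) in array DML
begin
  select * bulk collect into l_emps from employees;
  forall i in 1 .. l_emps.count save exceptions
    insert into myEmp values l_emps(i);
  commit;
exception
  when bulk_errors then
    for i in 1 .. sql%bulk_exceptions.count loop
      dbms_output.put_line('row ' || sql%bulk_exceptions(i).error_index
                           || ': ' || sqlerrm(-sql%bulk_exceptions(i).error_code));
    end loop;
    commit;  -- keep the rows that made it
end;
/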
