👿 NULLs: the sneaky little devil. Do you know them WELL? 👿



Schemat: HR
Version: Oracle Database 23ai Free Release 23.0.0.0.0 - Develop, Learn, and Run for Free

Some time ago, I published a quiz on LinkedIn with the question:

👿 NULLs: the sneaky little devil. Do you know them WELL? 👿

You create a B-tree index on a nullable column. Will rows with NULL values be indexed?


Answers:

You create a B-tree index on a nullable column. Will rows with NULL values be indexed?
0%
83%
17%


And what is the correct answer?
Well — ALL the answers are (in some sense) correct. As one person commented under the quiz — IT DEPENDS 🙂

Let’s start with the most popular answer: NULL doesn’t count.
In an Oracle database, NULL is treated in a special way — more like “I don’t know” than “nothing”.
Because of this, rows where the column indexed with a B-tree index contains NULL will not appear in that index.
To find such rows, the database will have to perform a FULL TABLE SCAN, even if the column has high selectivity.

Let’s look at the example below:

create table emp as
select  * from employees;

select employee_id, manager_id
from emp where manager_id is null;

EMPLOYEE_ID MANAGER_ID
----------- ----------
        100                

The query returns only a single row.
So it seems to be a good candidate for creating a B-tree index.
create index i on emp(manager_id);

select employee_id, manager_id 
from emp where manager_id = 124;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     6 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |     1 |     6 |     3   (0)| 00:00:01 |
-----------------------------

select employee_id, manager_id
from emp where manager_id is null;
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     2 |    12 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |     2 |    12 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------
 

We created an index on the MANAGER_ID column.
A query for MANAGER_ID = 124 indeed uses the index without any problem.
However, a query for MANAGER_ID IS NULL has to perform a full table scan.
The row with a NULL value didn’t make it into the index.

But can NULLs somehow sneak into a B-tree index after all?
Of course — these are clever little devils, and there are a few ways to make it happen.
If we have a situation where the column is selective and we want to be able to search for NULL rows using an index, there are a few approaches:

Composite index

We can use a composite index.
As the second column, we choose a NOT NULL column or — PIEROGI (or any other literal).

drop index i;
create index i on emp(manager_id, 'Pierogi');

select employee_id, manager_id 
from emp where manager_id is null;

--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |     2 |    12 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| EMP  |     2 |    12 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | I    |     2 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------
 


If the set of columns in a composite index is not entirely NULL — meaning if at least one column has a value — that row will appear in our index.
But what’s with the pierogi? Here, as the second column, we indexed a literal, something that always has a value. This forces Oracle to include that row in the index.

 
select 'Pierogi', manager_id
from emp;

PIEROGI MANAGER_ID
------- ----------
Pierogi           
Pierogi        100
Pierogi        100
Pierogi        102
Pierogi        103
Pierogi        103


FBI

Function Based Index

Another solution is to use a function-based B-tree index, that is, an index created on a function.
We can use any function that handles NULL — for example, NVL.

 
drop index i;

select nvl(manager_id, 0) , manager_id 
from emp 
where manager_id is null;

NVL(MANAGER_ID,0) MANAGER_ID
----------------- ----------
                0           

create index i on emp(nvl(manager_id, 0));

select employee_id, manager_id
from emp 
where manager_id is null;
--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |     1 |     8 |     3   (0)| 00:00:01 |
|*  1 |  TABLE ACCESS FULL| EMP  |     1 |     8 |     3   (0)| 00:00:01 |
--------------------------------------------------------------------------
 
Predicate Information (identified by operation id):
---------------------------------------------------

select employee_id, manager_id
from emp 
where nvl(manager_id, 0) = 0 ;
--------------------------------------------------------------------------------------------
| Id  | Operation                           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                    |      |     1 |    21 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID BATCHED| EMP  |     1 |    21 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN                  | I    |     1 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

However, it’s important to note that in the WHERE clause we must use the exact expression used in the index.
If we search with WHERE MANAGER_ID IS NULL, Oracle will still perform a full table scan. It’s good to keep this in mind.

MSSQL

Observant readers probably noticed that I didn’t specify which database the question referred to. Most assumed (correctly) that it was about Oracle.
But what if a similar question came up in the context of MSSQL Server?
You might say “a database is a database,” but the details do differ.
In MSSQL, B-tree indexes do include rows with NULL values!

https://sqlfiddle.com/sql-server/online-compiler?id=0ff73921-82e1-42c1-89b3-3ab4eb53effc
Create table emp 
(emp_id numeric,
 last_name varchar(100),
 manager_id numeric);
 
GO
INSERT INTO emp (emp_id, last_name, manager_id)
SELECT TOP (100000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) + 5,
       'Test',
       2
FROM sys.objects a
CROSS JOIN sys.objects b;
GO

-- create index
CREATE INDEX IX_emp_manager_id ON emp(manager_id);
GO

UPDATE STATISTICS emp;
GO

SET SHOWPLAN_ALL ON; 
GO

-- query null
SELECT * FROM emp   WHERE manager_id IS NULL ; 
GO

SET SHOWPLAN_ALL OFF;
GO

Execution plan:
	--Nes	ed Loops(Inner Join, OUTER REFERENCES:([Bmk1000])) 1 2 1 Nested Loops Inner Join OUTER REFERENCES:([Bmk1000]) NULL 1.0 0.0 4.1799999E-6 79 6.5703802E-3 [db_44b4usptx_44bbwzjnr].[dbo].[emp].[emp_id], [db_44b4usptx_44bbwzjnr].[dbo].[emp].[last_name], [db_44b4usptx_44bbwzjnr].[dbo].[emp].[manager_id] NULL PLAN_ROW 0 1.0
|	-Index Seek(OBJECT:([db_44b4usptx_44bbwzjnr].[dbo].[emp].[IX_emp_manager_id]), SEEK:([db_44b4usptx_44bbwzjnr].[dbo].[emp].[manager_id]=NULL) ORDERED FORWARD) 1 3 2 Index Seek Index Seek OBJECT:([db_44b4usptx_44bbwzjnr].[dbo].[emp].[IX_emp_manager_id]), SEEK:([db_44b4usptx_44bbwzjnr].[dbo].[emp].[manager_id]=NULL) ORDERED FORWARD [Bmk1000], [db_44b4usptx_44bbwzjnr].[dbo].[emp].[manager_id] 1.0 0.003125 0.0001581 24 0.0032831 [Bmk1000], [db_44b4usptx_44bbwzjnr].[dbo].[emp].[manager_id] NULL PLAN_ROW 0 1.0
|	-RID Lookup(OBJECT:([db_44b4usptx_44bbwzjnr].[dbo].[emp]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD) 1 5 2 RID Lookup RID Lookup OBJECT:([db_44b4usptx_44bbwzjnr].[dbo].[emp]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD [db_44b4usptx_44bbwzjnr].[dbo].[emp].[emp_id], [db_44b4usptx_44bbwzjnr].[dbo].[emp].[last_name] 1.0 0.003125 0.0001581 70 0.0032831 [db_44b4usptx_44bbwzjnr].[dbo].[emp].[emp_id], [db_44b4usptx_44bbwzjnr].[dbo].[emp].[last_name] NULL PLAN_ROW 0 1.0

As you can see — in MSSQL, you don’t have to do anything to make a row with a NULL value get indexed. It just works here :)

Komentarze