Consecutive and Overlapping date ranges


On my blog, I have already dedicated two posts to the topic of verifying DATES fields for continuity: How to check if a record is a "continuation" of the previous one? Part 1 and How to check if a record is a continuation? Part 2.

Continuation of a row is not the only problem we encounter in the real world. Just as often, we ask the question: do the date ranges in the rows overlap? Today, I will try to answer this question using a query and the MATCH_RECOGNIZE syntax.

Examples run on the HR schema in Oracle Database 21c Express Edition Release 21.0.0.0.0 - Production Schema HR.

Let's create  DATES_TEST table
CREATE TABLE dates_test (
  id         NUMBER,
  start_date DATE,
  end_date   DATE,
  text varchar2(100)
);

Test case 1 -   consecutive rows
We will want to find rows where the end date of the current row is one day less than the start date of the next row. Or, looking at it the other way: the start date of the current row is one day greater than the end date of the previous row. Let's prepare the test data: we will add 3 rows with consecutive date ranges and, for verification, 2 rows with different date ranges.

begin
    delete from dates_test;
     -- consecutive rows
    INSERT INTO dates_test VALUES (1, TO_DATE('20120101','YYYYMMDD'), TO_DATE('20120105','YYYYMMDD'), 'consecutive');
    INSERT INTO dates_test VALUES (2, TO_DATE('20120106','YYYYMMDD'), TO_DATE('20120108','YYYYMMDD'), 'consecutive');
    INSERT INTO dates_test VALUES (3, TO_DATE('20120109','YYYYMMDD'), TO_DATE('20120110','YYYYMMDD'), 'consecutive');
    
    -- normal rows
    INSERT INTO dates_test VALUES (90, TO_DATE('20120401','YYYYMMDD'), TO_DATE('20120404','YYYYMMDD'), 'normal');
    INSERT INTO dates_test VALUES (91, TO_DATE('20120406','YYYYMMDD'), TO_DATE('20120407','YYYYMMDD'), 'normal');

COMMIT;
end;
/

Let's look at the data:
   ID START_DATE END_DATE   TEXT           
----- ---------- ---------- ---------------
    1 01-01-2012 05-01-2012 consecutive    
    2 06-01-2012 08-01-2012 consecutive    
    3 09-01-2012 10-01-2012 consecutive    
   90 01-04-2012 04-04-2012 normal         
   91 06-04-2012 07-04-2012 normal            

We can see that the start date of the rows is 1 day greater than the end date of the previous row.

How can we find such rows in the table and check if the date ranges of the rows overlap? In a previous post, I showed a method using MATCH_RECOGNIZE and the ALWAYS TRUE variable. This time, we will try a different method – in the pattern definition, I will compare not only the start date to the end date of the previous row as before but also the end date to the start date of the NEXT row. In this version, we do not need to use the ALWAYS TRUE variable.

The most important thing here is to sort the records correctly – I sort by the start date and then by the end date. In the pattern definition, we specify the verification as described:

  • end date of the current row + 1 = start date of the next row
  • start date of the current row – 1 = end date of the previous row

NOTE! For clarity, queries here do not take into account end dates with null values, which are often found in date ranges. If you have such a case, simply add the NVL function to the END_DATE column and set a distant date, e.g., 12.12.2999.


SELECT  * 
  FROM dates_test   
   MATCH_RECOGNIZE (  
      ORDER BY start_date , end_Date nulls last
      MEASURES  
        final first(id) as fid,
        first(start_date ) as sd,
        final last(end_date ) as ed,
        max(end_date) as med,
        next(start_Date) nsd,
        MATCH_NUMBER() AS mno,  
        CLASSIFIER() AS cls  
  ALL ROWS PER MATCH   with unmatched rows
  PATTERN (consecutive+)    
  DEFINE  consecutive as  consecutive.end_date   + 1 = next(consecutive.start_date)
                      or  consecutive.start_date - 1 = prev(consecutive.end_date))
order by start_date, end_date;

Thera are the results 😊

START_DATE END_DATE          FID SD         ED         MED        NSD               MNO CLS                ID TEXT           
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- --------------- ----- ---------------
01-01-2012 05-01-2012          1 01-01-2012 10-01-2012 05-01-2012 06-01-2012          1 CONSECUTIVE         1 consecutive    
06-01-2012 08-01-2012          1 01-01-2012 10-01-2012 08-01-2012 09-01-2012          1 CONSECUTIVE         2 consecutive    
09-01-2012 10-01-2012          1 01-01-2012 10-01-2012 10-01-2012 01-04-2012          1 CONSECUTIVE         3 consecutive    
01-04-2012 04-04-2012                                                                                      90 normal         
06-04-2012 07-04-2012                                                                                      91 normal         

We see that rows with consecutive date ranges have been classified under one pattern (column MNO = 1 for all records).

The two control rows with IDs 90 and 91 were not classified under the pattern.

We can also merge consecutive records into a single row. 

SELECT  * 
  FROM dates_test   
   MATCH_RECOGNIZE (  
      ORDER BY start_date , end_Date nulls last
      MEASURES  
        final first(id) as fid,
        first(start_date ) as sd,
        final last(end_date ) as ed,
        max(end_date) as med,
        next(start_Date) nsd,
        MATCH_NUMBER() AS mno,  
        CLASSIFIER() AS cls  
  ONE ROW PER MATCH
  PATTERN (consecutive+)    
  DEFINE  consecutive as  consecutive.end_date   + 1 = next(consecutive.start_date)
                      or  consecutive.start_date - 1 = prev(consecutive.end_date))
;


Results:
       FID SD         ED         MED        NSD               MNO CLS            
---------- ---------- ---------- ---------- ---------- ---------- ---------------
         1 01-01-2012 10-01-2012 10-01-2012 01-04-2012          1 CONSECUTIVE    

The start date of our set of test rows is 01.01.2012 and the end date is 10.01.2012. Everything matches!

The first case is resolved. 😊 

Test case 2 – overlapping date ranges

The second case is equally interesting. We will want to search the table and find rows where the date ranges overlap. For consecutive rows, the end date of the current row must be one day less than the start date of the next row. In this case, we need to find rows that meet the following condition:

  • The start date of the current row must be less than or equal to the end date of the next row.

Again, sorting is crucial here. We sort the rows by the start date first and then by the end date.

Let's load the appropriate data where the date ranges overlap. Additionally, we will insert two rows with different date ranges, just to be sure.


begin
    delete from dates_test;
   -- overlapping rows
    INSERT INTO dates_test VALUES (4, TO_DATE('20120201','YYYYMMDD'), TO_DATE('20120207','YYYYMMDD'), 'overlapping');
    INSERT INTO dates_test VALUES (5, TO_DATE('20120206','YYYYMMDD'), TO_DATE('20120209','YYYYMMDD'), 'overlapping');
    INSERT INTO dates_test VALUES (6, TO_DATE('20120208','YYYYMMDD'), TO_DATE('20120210','YYYYMMDD'), 'overlapping');
    
    -- normal rows
    INSERT INTO dates_test VALUES (90, TO_DATE('20120401','YYYYMMDD'), TO_DATE('20120404','YYYYMMDD'), 'normal');
    INSERT INTO dates_test VALUES (91, TO_DATE('20120406','YYYYMMDD'), TO_DATE('20120407','YYYYMMDD'), 'normal');

COMMIT;
end;
/

Data:

   ID START_DATE END_DATE   TEXT           
----- ---------- ---------- ---------------
    4 01-02-2012 07-02-2012 overlapping    
    5 06-02-2012 09-02-2012 overlapping    
    6 08-02-2012 10-02-2012 overlapping    
   90 01-04-2012 04-04-2012 normal         
   91 06-04-2012 07-04-2012 normal        

For row with ID 4, we have overlapping date ranges: the end date of the row with ID = 4 is 07.02.2012, while the start date of the next row, row with ID = 5, is earlier than the end date of the previous row. The date ranges overlap.

Let's try to find rows with overlapping date ranges using a query.


 SELECT  * 
  FROM dates_test   
   MATCH_RECOGNIZE (  
      ORDER BY start_date , end_Date nulls last
      MEASURES  
        final first(id) as fid,
        first(start_date ) as sd,
        final last(end_date ) as ed,
        max(end_date) as med,
        next(start_Date) nsd,
        MATCH_NUMBER() AS mno,  
        CLASSIFIER() AS cls  
  ALL ROWS PER MATCH   with unmatched rows
  PATTERN (str overlap+)    
   DEFINE  overlap as  overlap.start_date  <= prev(overlap.end_date) )
;

In the pattern definition, we declared a condition that we are looking for rows where the start date of the current row is less than or equal to the end date of the previous row. In the pattern, we use the ALWAYS TRUE variable (str) – a "virtual" reference row for the first row in the set.

Let's check the results

START_DATE END_DATE          FID SD         ED         MED        NSD               MNO CLS                ID TEXT           
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- --------------- ----- ---------------
01-02-2012 07-02-2012          4 01-02-2012 10-02-2012 07-02-2012 06-02-2012          1 STR                 4 overlapping    
06-02-2012 09-02-2012          4 01-02-2012 10-02-2012 09-02-2012 08-02-2012          1 OVERLAP             5 overlapping    
08-02-2012 10-02-2012          4 01-02-2012 10-02-2012 10-02-2012 01-04-2012          1 OVERLAP             6 overlapping    
01-04-2012 04-04-2012                                                                                      90 normal         
06-04-2012 07-04-2012                                                                                      91 normal         

Looks good. The start date (SD) and end date (ED) for the entire pattern are the same. All rows have been classified under one pattern MNO = 1 for rows with IDs 4, 5, 6.

Let's collapse this into one row.

SELECT  * 
  FROM dates_test   
   MATCH_RECOGNIZE (  
      ORDER BY start_date , end_Date nulls last
      MEASURES  
        final first(id) as fid,
        first(start_date ) as sd,
        final last(end_date ) as ed,
        max(end_date) as med,
        next(start_Date) nsd,
        MATCH_NUMBER() AS mno,  
        CLASSIFIER() AS cls  
  one row per match
  PATTERN (str overlap+ )    
   DEFINE  overlap as   overlap.start_date  <= prev(overlap.end_date)
                  )
;

Let's see if it works.
      FID SD         ED         MED        NSD               MNO CLS            
---------- ---------- ---------- ---------- ---------- ---------- ---------------
         4 01-02-2012 10-02-2012 10-02-2012 01-04-2012          1 OVERLAP        

The start date (SD) is 01.02.2012 and the end date (ED) is 10.02.2012. This is exactly what we expected.

Test Case 3 – consecutive and overlapping date ranges 

What if we combine the test data from case 1 and case 2, and we want to find rows that have consecutive date ranges as well as rows with overlapping date ranges? Will our queries still work correctly?

Let's prepare the test data:

  • 3 rows with consecutive date ranges
  • 3 rows with overlapping date ranges
  • 2 rows with different date ranges for comparison

begin
   delete from dates_test;
     -- consecutive rows
    INSERT INTO dates_test VALUES (1, TO_DATE('20120101','YYYYMMDD'), TO_DATE('20120105','YYYYMMDD'), 'consecutive');
    INSERT INTO dates_test VALUES (2, TO_DATE('20120106','YYYYMMDD'), TO_DATE('20120108','YYYYMMDD'), 'consecutive');
    INSERT INTO dates_test VALUES (3, TO_DATE('20120109','YYYYMMDD'), TO_DATE('20120110','YYYYMMDD'), 'consecutive');
    
    -- overlapping rows
    INSERT INTO dates_test VALUES (4, TO_DATE('20120201','YYYYMMDD'), TO_DATE('20120207','YYYYMMDD'), 'overlapping');
    INSERT INTO dates_test VALUES (5, TO_DATE('20120206','YYYYMMDD'), TO_DATE('20120209','YYYYMMDD'), 'overlapping');
    INSERT INTO dates_test VALUES (6, TO_DATE('20120208','YYYYMMDD'), TO_DATE('20120210','YYYYMMDD'), 'overlapping');

    -- normal rows
    INSERT INTO dates_test VALUES (90, TO_DATE('20120401','YYYYMMDD'), TO_DATE('20120404','YYYYMMDD'), 'normal');
    INSERT INTO dates_test VALUES (91, TO_DATE('20120406','YYYYMMDD'), TO_DATE('20120407','YYYYMMDD'), 'normal');


    INSERT INTO dates_test VALUES (7, TO_DATE('20120501','YYYYMMDD'), TO_DATE('20120505','YYYYMMDD'), 'consecutive');
    INSERT INTO dates_test VALUES (8, TO_DATE('20120506','YYYYMMDD'), TO_DATE('20120508','YYYYMMDD'), 'consecutive');
    INSERT INTO dates_test VALUES (9, TO_DATE('20120509','YYYYMMDD'), TO_DATE('20120510','YYYYMMDD'), 'consecutive');
    
    -- overlapping rows
    INSERT INTO dates_test VALUES (10, TO_DATE('20120301','YYYYMMDD'), TO_DATE('20120307','YYYYMMDD'), 'overlapping');
    INSERT INTO dates_test VALUES (11, TO_DATE('20120306','YYYYMMDD'), TO_DATE('20120309','YYYYMMDD'), 'overlapping');
    INSERT INTO dates_test VALUES (12, TO_DATE('20120308','YYYYMMDD'), TO_DATE('20120310','YYYYMMDD'), 'overlapping');
COMMIT;
end;
/

We will combine the queries from tests 1 and 2.
SELECT  * 
  FROM dates_test   
   MATCH_RECOGNIZE (  
      ORDER BY start_date , end_Date nulls last
      MEASURES  
        final first(id) as fid,
        first(start_date ) as sd,
        final last(end_date ) as ed,
        max(end_date) as med,
        next(start_Date) nsd,
        MATCH_NUMBER() AS mno,  
        CLASSIFIER() AS cls  
  ALL ROWS PER MATCH   with unmatched rows
  PATTERN ((str overlap+ )| consecutive+ )    
  DEFINE    
    consecutive as  consecutive.end_date   + 1 = next(consecutive.start_date)
                or  consecutive.start_date - 1 = prev(consecutive.end_date),
    overlap as   overlap.start_date  <= prev(overlap.end_date)
            )    
order by mno, id;

Now we have defined two patterns: CONSECUTIVE and OVERLAP.

In the PATTERN clause, we start by checking the OVERLAP pattern along with the ALWAYS TRUE variable. Then, we combine this pattern using the OR operator | with the CONSECUTIVE pattern.

Let's check the results.

START_DATE END_DATE          FID SD         ED         MED        NSD               MNO CLS                ID TEXT           
---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- --------------- ----- ---------------
01-01-2012 05-01-2012          1 01-01-2012 10-01-2012 05-01-2012 06-01-2012          1 CONSECUTIVE         1 consecutive    
06-01-2012 08-01-2012          1 01-01-2012 10-01-2012 08-01-2012 09-01-2012          1 CONSECUTIVE         2 consecutive    
09-01-2012 10-01-2012          1 01-01-2012 10-01-2012 10-01-2012 01-02-2012          1 CONSECUTIVE         3 consecutive    
01-02-2012 07-02-2012          4 01-02-2012 10-02-2012 07-02-2012 06-02-2012          2 STR                 4 overlapping    
06-02-2012 09-02-2012          4 01-02-2012 10-02-2012 09-02-2012 08-02-2012          2 OVERLAP             5 overlapping    
08-02-2012 10-02-2012          4 01-02-2012 10-02-2012 10-02-2012 01-03-2012          2 OVERLAP             6 overlapping    
01-03-2012 07-03-2012         10 01-03-2012 10-03-2012 07-03-2012 06-03-2012          3 STR                10 overlapping    
06-03-2012 09-03-2012         10 01-03-2012 10-03-2012 09-03-2012 08-03-2012          3 OVERLAP            11 overlapping    
08-03-2012 10-03-2012         10 01-03-2012 10-03-2012 10-03-2012 01-04-2012          3 OVERLAP            12 overlapping    
01-05-2012 05-05-2012          7 01-05-2012 10-05-2012 05-05-2012 06-05-2012          4 CONSECUTIVE         7 consecutive    
06-05-2012 08-05-2012          7 01-05-2012 10-05-2012 08-05-2012 09-05-2012          4 CONSECUTIVE         8 consecutive    
09-05-2012 10-05-2012          7 01-05-2012 10-05-2012 10-05-2012                     4 CONSECUTIVE         9 consecutive    
01-04-2012 04-04-2012                                                                                      90 normal         
06-04-2012 07-04-2012                                                                                      91 normal         

Rows with IDs 1-3 and 7-8 have been correctly classified under the CONSECUTIVE pattern, rows 4-6 and 10-12 have been correctly classified under the OVERLAP pattern, while the comparison rows, 90 and 91, have not been classified under any pattern.

Let's condense the results to one row per pattern.

SELECT  * 
  FROM dates_test   
   MATCH_RECOGNIZE (  
      ORDER BY start_date , end_Date nulls last
      MEASURES  
        final first(id) as fid,
        first(start_date ) as sd,
        final last(end_date ) as ed,
        max(end_date) as med,
        next(start_Date) nsd,
        MATCH_NUMBER() AS mno,  
        CLASSIFIER() AS cls  
  one row per match
  PATTERN ((str overlap+ )|consecutive+)    
  DEFINE    
    consecutive as  consecutive.end_date   + 1 = next(consecutive.start_date)
                or  consecutive.start_date - 1 = prev(consecutive.end_date),
    overlap as  overlap.start_date  <= prev(overlap.end_date)
            ) 
order by sd;

Let's check the results.
       FID SD         ED         MED        NSD               MNO CLS            
---------- ---------- ---------- ---------- ---------- ---------- ---------------
         1 01-01-2012 10-01-2012 10-01-2012 01-02-2012          1 CONSECUTIVE    
         4 01-02-2012 10-02-2012 10-02-2012 01-03-2012          2 OVERLAP        
        10 01-03-2012 10-03-2012 10-03-2012 01-04-2012          3 OVERLAP        
         7 01-05-2012 10-05-2012 10-05-2012                     4 CONSECUTIVE    

We see that we have one range of consecutive dates for records classified under the CONSECUTIVE pattern, and one row for records with overlapping date ranges.

Overall, studying ranges due to the variety of possible cases is quite complicated. If you come across any case that doesn't fit into the above queries, please let me know in the comments. I'd be happy to investigate the topic further!

Komentarze