CREATE TABLE dates_test ( id NUMBER, start_date DATE, end_date DATE, text varchar2(100) );
begin delete from dates_test; -- consecutive rows INSERT INTO dates_test VALUES (1, TO_DATE('20120101','YYYYMMDD'), TO_DATE('20120105','YYYYMMDD'), 'consecutive'); INSERT INTO dates_test VALUES (2, TO_DATE('20120106','YYYYMMDD'), TO_DATE('20120108','YYYYMMDD'), 'consecutive'); INSERT INTO dates_test VALUES (3, TO_DATE('20120109','YYYYMMDD'), TO_DATE('20120110','YYYYMMDD'), 'consecutive'); -- normal rows INSERT INTO dates_test VALUES (90, TO_DATE('20120401','YYYYMMDD'), TO_DATE('20120404','YYYYMMDD'), 'normal'); INSERT INTO dates_test VALUES (91, TO_DATE('20120406','YYYYMMDD'), TO_DATE('20120407','YYYYMMDD'), 'normal'); COMMIT; end; /
ID START_DATE END_DATE TEXT ----- ---------- ---------- --------------- 1 01-01-2012 05-01-2012 consecutive 2 06-01-2012 08-01-2012 consecutive 3 09-01-2012 10-01-2012 consecutive 90 01-04-2012 04-04-2012 normal 91 06-04-2012 07-04-2012 normal
We can see that the start date of the rows is 1 day greater than the end date of the previous row.
How can we find such rows in the table and check if the date ranges of the rows overlap? In a previous post, I showed a method using MATCH_RECOGNIZE and the ALWAYS TRUE variable. This time, we will try a different method – in the pattern definition, I will compare not only the start date to the end date of the previous row as before but also the end date to the start date of the NEXT row. In this version, we do not need to use the ALWAYS TRUE variable.
The most important thing here is to sort the records correctly – I sort by the start date and then by the end date. In the pattern definition, we specify the verification as described:
- end date of the current row + 1 = start date of the next row
- start date of the current row – 1 = end date of the previous row
NOTE! For clarity, queries here do not take into account end dates with null values, which are often found in date ranges. If you have such a case, simply add the NVL function to the END_DATE column and set a distant date, e.g., 12.12.2999.
SELECT * FROM dates_test MATCH_RECOGNIZE ( ORDER BY start_date , end_Date nulls last MEASURES final first(id) as fid, first(start_date ) as sd, final last(end_date ) as ed, max(end_date) as med, next(start_Date) nsd, MATCH_NUMBER() AS mno, CLASSIFIER() AS cls ALL ROWS PER MATCH with unmatched rows PATTERN (consecutive+) DEFINE consecutive as consecutive.end_date + 1 = next(consecutive.start_date) or consecutive.start_date - 1 = prev(consecutive.end_date)) order by start_date, end_date;
START_DATE END_DATE FID SD ED MED NSD MNO CLS ID TEXT ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- --------------- ----- --------------- 01-01-2012 05-01-2012 1 01-01-2012 10-01-2012 05-01-2012 06-01-2012 1 CONSECUTIVE 1 consecutive 06-01-2012 08-01-2012 1 01-01-2012 10-01-2012 08-01-2012 09-01-2012 1 CONSECUTIVE 2 consecutive 09-01-2012 10-01-2012 1 01-01-2012 10-01-2012 10-01-2012 01-04-2012 1 CONSECUTIVE 3 consecutive 01-04-2012 04-04-2012 90 normal 06-04-2012 07-04-2012 91 normal
We see that rows with consecutive date ranges have been classified under one pattern (column MNO = 1 for all records).
The two control rows with IDs 90 and 91 were not classified under the pattern.
We can also merge consecutive records into a single row.SELECT * FROM dates_test MATCH_RECOGNIZE ( ORDER BY start_date , end_Date nulls last MEASURES final first(id) as fid, first(start_date ) as sd, final last(end_date ) as ed, max(end_date) as med, next(start_Date) nsd, MATCH_NUMBER() AS mno, CLASSIFIER() AS cls ONE ROW PER MATCH PATTERN (consecutive+) DEFINE consecutive as consecutive.end_date + 1 = next(consecutive.start_date) or consecutive.start_date - 1 = prev(consecutive.end_date)) ;
FID SD ED MED NSD MNO CLS ---------- ---------- ---------- ---------- ---------- ---------- --------------- 1 01-01-2012 10-01-2012 10-01-2012 01-04-2012 1 CONSECUTIVE
The start date of our set of test rows is 01.01.2012 and the end date is 10.01.2012. Everything matches!
The second case is equally interesting. We will want to search the table and find rows where the date ranges overlap. For consecutive rows, the end date of the current row must be one day less than the start date of the next row. In this case, we need to find rows that meet the following condition:
- The start date of the current row must be less than or equal to the end date of the next row.
Again, sorting is crucial here. We sort the rows by the start date first and then by the end date.
Let's load the appropriate data where the date ranges overlap. Additionally, we will insert two rows with different date ranges, just to be sure.
begin delete from dates_test; -- overlapping rows INSERT INTO dates_test VALUES (4, TO_DATE('20120201','YYYYMMDD'), TO_DATE('20120207','YYYYMMDD'), 'overlapping'); INSERT INTO dates_test VALUES (5, TO_DATE('20120206','YYYYMMDD'), TO_DATE('20120209','YYYYMMDD'), 'overlapping'); INSERT INTO dates_test VALUES (6, TO_DATE('20120208','YYYYMMDD'), TO_DATE('20120210','YYYYMMDD'), 'overlapping'); -- normal rows INSERT INTO dates_test VALUES (90, TO_DATE('20120401','YYYYMMDD'), TO_DATE('20120404','YYYYMMDD'), 'normal'); INSERT INTO dates_test VALUES (91, TO_DATE('20120406','YYYYMMDD'), TO_DATE('20120407','YYYYMMDD'), 'normal'); COMMIT; end; /
ID START_DATE END_DATE TEXT ----- ---------- ---------- --------------- 4 01-02-2012 07-02-2012 overlapping 5 06-02-2012 09-02-2012 overlapping 6 08-02-2012 10-02-2012 overlapping 90 01-04-2012 04-04-2012 normal 91 06-04-2012 07-04-2012 normal
For row with ID 4, we have overlapping date ranges: the end date of the row with ID = 4 is 07.02.2012, while the start date of the next row, row with ID = 5, is earlier than the end date of the previous row. The date ranges overlap.
Let's try to find rows with overlapping date ranges using a query.
SELECT * FROM dates_test MATCH_RECOGNIZE ( ORDER BY start_date , end_Date nulls last MEASURES final first(id) as fid, first(start_date ) as sd, final last(end_date ) as ed, max(end_date) as med, next(start_Date) nsd, MATCH_NUMBER() AS mno, CLASSIFIER() AS cls ALL ROWS PER MATCH with unmatched rows PATTERN (str overlap+) DEFINE overlap as overlap.start_date <= prev(overlap.end_date) ) ;
In the pattern definition, we declared a condition that we are looking for rows where the start date of the current row is less than or equal to the end date of the previous row. In the pattern, we use the ALWAYS TRUE variable (str) – a "virtual" reference row for the first row in the set.
Let's check the results
START_DATE END_DATE FID SD ED MED NSD MNO CLS ID TEXT ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- --------------- ----- --------------- 01-02-2012 07-02-2012 4 01-02-2012 10-02-2012 07-02-2012 06-02-2012 1 STR 4 overlapping 06-02-2012 09-02-2012 4 01-02-2012 10-02-2012 09-02-2012 08-02-2012 1 OVERLAP 5 overlapping 08-02-2012 10-02-2012 4 01-02-2012 10-02-2012 10-02-2012 01-04-2012 1 OVERLAP 6 overlapping 01-04-2012 04-04-2012 90 normal 06-04-2012 07-04-2012 91 normal
Looks good. The start date (SD) and end date (ED) for the entire pattern are the same. All rows have been classified under one pattern MNO = 1 for rows with IDs 4, 5, 6.
Let's collapse this into one row.
SELECT * FROM dates_test MATCH_RECOGNIZE ( ORDER BY start_date , end_Date nulls last MEASURES final first(id) as fid, first(start_date ) as sd, final last(end_date ) as ed, max(end_date) as med, next(start_Date) nsd, MATCH_NUMBER() AS mno, CLASSIFIER() AS cls one row per match PATTERN (str overlap+ ) DEFINE overlap as overlap.start_date <= prev(overlap.end_date) ) ;
FID SD ED MED NSD MNO CLS ---------- ---------- ---------- ---------- ---------- ---------- --------------- 4 01-02-2012 10-02-2012 10-02-2012 01-04-2012 1 OVERLAP
What if we combine the test data from case 1 and case 2, and we want to find rows that have consecutive date ranges as well as rows with overlapping date ranges? Will our queries still work correctly?
Let's prepare the test data:
- 3 rows with consecutive date ranges
- 3 rows with overlapping date ranges
- 2 rows with different date ranges for comparison
begin delete from dates_test; -- consecutive rows INSERT INTO dates_test VALUES (1, TO_DATE('20120101','YYYYMMDD'), TO_DATE('20120105','YYYYMMDD'), 'consecutive'); INSERT INTO dates_test VALUES (2, TO_DATE('20120106','YYYYMMDD'), TO_DATE('20120108','YYYYMMDD'), 'consecutive'); INSERT INTO dates_test VALUES (3, TO_DATE('20120109','YYYYMMDD'), TO_DATE('20120110','YYYYMMDD'), 'consecutive'); -- overlapping rows INSERT INTO dates_test VALUES (4, TO_DATE('20120201','YYYYMMDD'), TO_DATE('20120207','YYYYMMDD'), 'overlapping'); INSERT INTO dates_test VALUES (5, TO_DATE('20120206','YYYYMMDD'), TO_DATE('20120209','YYYYMMDD'), 'overlapping'); INSERT INTO dates_test VALUES (6, TO_DATE('20120208','YYYYMMDD'), TO_DATE('20120210','YYYYMMDD'), 'overlapping'); -- normal rows INSERT INTO dates_test VALUES (90, TO_DATE('20120401','YYYYMMDD'), TO_DATE('20120404','YYYYMMDD'), 'normal'); INSERT INTO dates_test VALUES (91, TO_DATE('20120406','YYYYMMDD'), TO_DATE('20120407','YYYYMMDD'), 'normal'); INSERT INTO dates_test VALUES (7, TO_DATE('20120501','YYYYMMDD'), TO_DATE('20120505','YYYYMMDD'), 'consecutive'); INSERT INTO dates_test VALUES (8, TO_DATE('20120506','YYYYMMDD'), TO_DATE('20120508','YYYYMMDD'), 'consecutive'); INSERT INTO dates_test VALUES (9, TO_DATE('20120509','YYYYMMDD'), TO_DATE('20120510','YYYYMMDD'), 'consecutive'); -- overlapping rows INSERT INTO dates_test VALUES (10, TO_DATE('20120301','YYYYMMDD'), TO_DATE('20120307','YYYYMMDD'), 'overlapping'); INSERT INTO dates_test VALUES (11, TO_DATE('20120306','YYYYMMDD'), TO_DATE('20120309','YYYYMMDD'), 'overlapping'); INSERT INTO dates_test VALUES (12, TO_DATE('20120308','YYYYMMDD'), TO_DATE('20120310','YYYYMMDD'), 'overlapping'); COMMIT; end; /
SELECT * FROM dates_test MATCH_RECOGNIZE ( ORDER BY start_date , end_Date nulls last MEASURES final first(id) as fid, first(start_date ) as sd, final last(end_date ) as ed, max(end_date) as med, next(start_Date) nsd, MATCH_NUMBER() AS mno, CLASSIFIER() AS cls ALL ROWS PER MATCH with unmatched rows PATTERN ((str overlap+ )| consecutive+ ) DEFINE consecutive as consecutive.end_date + 1 = next(consecutive.start_date) or consecutive.start_date - 1 = prev(consecutive.end_date), overlap as overlap.start_date <= prev(overlap.end_date) ) order by mno, id;
Now we have defined two patterns: CONSECUTIVE and OVERLAP.
In the PATTERN clause, we start by checking the OVERLAP pattern along with the ALWAYS TRUE variable. Then, we combine this pattern using the OR operator | with the CONSECUTIVE pattern.
Let's check the results.
START_DATE END_DATE FID SD ED MED NSD MNO CLS ID TEXT ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- --------------- ----- --------------- 01-01-2012 05-01-2012 1 01-01-2012 10-01-2012 05-01-2012 06-01-2012 1 CONSECUTIVE 1 consecutive 06-01-2012 08-01-2012 1 01-01-2012 10-01-2012 08-01-2012 09-01-2012 1 CONSECUTIVE 2 consecutive 09-01-2012 10-01-2012 1 01-01-2012 10-01-2012 10-01-2012 01-02-2012 1 CONSECUTIVE 3 consecutive 01-02-2012 07-02-2012 4 01-02-2012 10-02-2012 07-02-2012 06-02-2012 2 STR 4 overlapping 06-02-2012 09-02-2012 4 01-02-2012 10-02-2012 09-02-2012 08-02-2012 2 OVERLAP 5 overlapping 08-02-2012 10-02-2012 4 01-02-2012 10-02-2012 10-02-2012 01-03-2012 2 OVERLAP 6 overlapping 01-03-2012 07-03-2012 10 01-03-2012 10-03-2012 07-03-2012 06-03-2012 3 STR 10 overlapping 06-03-2012 09-03-2012 10 01-03-2012 10-03-2012 09-03-2012 08-03-2012 3 OVERLAP 11 overlapping 08-03-2012 10-03-2012 10 01-03-2012 10-03-2012 10-03-2012 01-04-2012 3 OVERLAP 12 overlapping 01-05-2012 05-05-2012 7 01-05-2012 10-05-2012 05-05-2012 06-05-2012 4 CONSECUTIVE 7 consecutive 06-05-2012 08-05-2012 7 01-05-2012 10-05-2012 08-05-2012 09-05-2012 4 CONSECUTIVE 8 consecutive 09-05-2012 10-05-2012 7 01-05-2012 10-05-2012 10-05-2012 4 CONSECUTIVE 9 consecutive 01-04-2012 04-04-2012 90 normal 06-04-2012 07-04-2012 91 normal
Rows with IDs 1-3 and 7-8 have been correctly classified under the CONSECUTIVE pattern, rows 4-6 and 10-12 have been correctly classified under the OVERLAP pattern, while the comparison rows, 90 and 91, have not been classified under any pattern.
Let's condense the results to one row per pattern.
SELECT * FROM dates_test MATCH_RECOGNIZE ( ORDER BY start_date , end_Date nulls last MEASURES final first(id) as fid, first(start_date ) as sd, final last(end_date ) as ed, max(end_date) as med, next(start_Date) nsd, MATCH_NUMBER() AS mno, CLASSIFIER() AS cls one row per match PATTERN ((str overlap+ )|consecutive+) DEFINE consecutive as consecutive.end_date + 1 = next(consecutive.start_date) or consecutive.start_date - 1 = prev(consecutive.end_date), overlap as overlap.start_date <= prev(overlap.end_date) ) order by sd;
FID SD ED MED NSD MNO CLS ---------- ---------- ---------- ---------- ---------- ---------- --------------- 1 01-01-2012 10-01-2012 10-01-2012 01-02-2012 1 CONSECUTIVE 4 01-02-2012 10-02-2012 10-02-2012 01-03-2012 2 OVERLAP 10 01-03-2012 10-03-2012 10-03-2012 01-04-2012 3 OVERLAP 7 01-05-2012 10-05-2012 10-05-2012 4 CONSECUTIVE
We see that we have one range of consecutive dates for records classified under the CONSECUTIVE pattern, and one row for records with overlapping date ranges.
Overall, studying ranges due to the variety of possible cases is quite complicated. If you come across any case that doesn't fit into the above queries, please let me know in the comments. I'd be happy to investigate the topic further!
Komentarze
Prześlij komentarz