How to check if a row is a continuation of the previous one - part 1



In database systems (although probably not only in databases) we often specify the beginning and end of something - record validity, employment start and end dates, invoice validity period. Examples could be multiplied. However, even though we specify the start_date and end_date of a single record, it may turn out that one record is a continuation of another.

For example, an employed employee worked in one team for 3 years and then, without a break in employment, he change the team. Despite the change of the team, he did not stop being an employee of the company.

An example of such data can be found in the HR schema in the HR.JOB_HISTORY table. For the sake of clarity of the example, I will display records for only two employees - 101 and 200.
 SELECT * FROM hr.job_history where employee_id in (101, 200) ;  
(I've added a third record for employee 101 with an employment date of 19-DEC-19 to make the example clearer)


INSERT INTO job_history 
VALUES (101, to_date('16-03-2005', 'dd-mm-yyyy', SYSDATE, 'AC_ACCOUNT', 110);
)

EMPLOYEE_IDSTART_DATEEND_DATEJOB_IDDEPARTMENT_ID
10121-SEP-9727-OCT-01AC_ACCOUNT110
10128-OCT-0115-MAR-05AC_MGR110
10116-MAR-0519-DEC-19AC_ACCOUNT110
20017-SEP-9517-JUN-01AD_ASST90
20001-JUL-0231-DEC-06AC_ACCOUNT90

We can see that employee 101 ended his job as AC_ACCOUNT, but without a break in employment, he started working in the same department as AC_MGR.

Employee 200, on the other hand, had a break in employment in the company for almost a year between the change of position.

We would like to obtain a result such that an employee who has not had a break in employment is displayed only once, and the start_date and end_date dates should be the dates of starting work in the department and ending work in the department, regardless of the position held.

So the expected result would look like this:

EMPLOYEE_IDSTART_DATEEND_DATE
10121-SEP-9715-MAR-05
20017-SEP-9517-JUN-01
20001-JUL-0231-DEC-06

To obtain such a result, we can use the analytical functions LAG and LEAD.

In the first step of building a query, I want to obtain information whether successive records are a continuation of each other:

I check if the start_date of the next record (LEAD) is equal to the end_date of the current record OR the end_date of the previous record (LAG) is equal to the start_date of the current record.

Why am I checking both dates?
Because the first record for an employee will not have a previous record and the last record will not have a next one. And ignoring these boundary records would yield wrong results. So our continuation query might look like this:

 select employee_id, start_date, end_date,  
   lead(start_date) over(partition by employee_id order by start_date) cont_start_date,   
   lag(end_date) over(partition by employee_id order by end_date) cont_end_date,  
   case when lead(start_date) over(partition by employee_id order by start_date) = end_date + 1   
      OR lag(end_date) over(partition by employee_id order by end_date) + 1 = start_date   
      then 0   
      else rownum  
   end if_continuation
 from hr.job_history where employee_id in (101, 200)  
 order by employee_id, start_date;   

If I verify that the record is a continuation, I put the value 0 in the if_continuation column, if there is no continuation - I put the value rownum. Rownum because I will be grouping after this field and I want the field to have the same value for records with a continuation, in my example 0, and for records without a continuation - each record should have different values, different from 0.

EMPLOYEE_IDSTART_DATEEND_DATEcont_START_DATEcont_END_DATEif_continuation 
10121-SEP-9727-OCT-0128-OCT-01- 0
10128-OCT-0115-MAR-0516-MAR-0527-OCT-010
10116-MAR-0519-DEC-19- 15-MAR-050
20017-SEP-9517-JUN-0101-JUL-02- 3
20001-JUL-0231-DEC-06- 17-JUN-014

Finally, it is enough to group by employee_id and whether the record is a continuation:

 select employee_id, min(start_date) start_date , max(end_date) end_date, count(*)  
 from (  
   select employee_id, start_date, end_date,  
     case when lead(start_date) over(partition by employee_id order by start_date) = end_date + 1   
      OR lag(end_date) over(partition by employee_id order by end_date) + 1 = start_date   
      then 0  
      else rownum  
     end continuity  
   from job_history where employee_id in ( 101,200))  
 group by employee_id, continuity;  

EMPLOYEE_IDSTART_DATEEND_DATECOUNT(*)
10121-SEP-9719-DEC-193
20017-SEP-9517-JUN-011
20001-JUL-0231-DEC-061

And done :)

Komentarze