You are viewing boneist_oracle

 
 
13 April 2011 @ 10:38 pm
Oracle Tips and Tricks - Part 3: Tabibitosan  
This blog has moved - you can find this entry copied over here

Tabibitosan is a technique that I picked up from, guess where? Yup, the OTN Forums! It was Aketi Jyuuzou who first introduced me to the idea, and it's a very useful technique for a certain set of problems. Aketi's forum post has lots of information on it, but I just wanted to demonstrate a simple example here, to give you a taste of what you can use it for.

So, suppose we have a set of data, which is unique for id and dt:

        ID DT
---------- ----------
         1 01/01/2011
         1 02/01/2011
         1 03/01/2011
         1 04/01/2011
         1 05/01/2011
         1 10/01/2011
         1 11/01/2011
         1 12/01/2011
         1 14/01/2011
         1 15/01/2011
         1 16/01/2011
         2 01/01/2011
         2 02/01/2011
         2 03/01/2011
         2 04/01/2011
         2 06/01/2011
         2 07/01/2011
         2 11/01/2011
         2 12/01/2011
         2 13/01/2011
         2 28/01/2011
         2 30/01/2011
         2 31/01/2011
         2 01/02/2011


and you want to group the results to find the start and end dates of consecutive rows, eg:

        ID MIN_DT     MAX_DT    
---------- ---------- ----------
         1 01/01/2011 05/01/2011
         1 10/01/2011 12/01/2011
         1 14/01/2011 16/01/2011
         2 01/01/2011 04/01/2011
         2 06/01/2011 07/01/2011
         2 11/01/2011 13/01/2011
         2 28/01/2011 28/01/2011
         2 30/01/2011 01/02/2011


Where do we start? Well, the tabibitosan works by assigning a number to each of the rows in the resultset, either over the whole set or over the partitioned result sets - in our case, we'll be doing this per each id. Then it assigns a different, consecutive number to the rows.

Because we're using dates, we'll label each row by converting those to a number in Julian format. We'll also use the row_number() analytic function to label each row with a consecutive number for each id in ascending dt order. Finally, we'll subtract one from the other:

with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('28/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('30/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('31/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/02/2011', 'dd/mm/yyyy') dt from dual)
select id,
       dt,
       to_number(to_char(dt, 'j')) main_rn,
       row_number() over (partition by id order by dt) partitioned_rn,
       to_number(to_char(dt, 'j'))
         - row_number() over (partition by id order by dt) grp
from   sample_data;

        ID TO_CHAR(DT,'DD/MM/YYYY')    MAIN_RN PARTITIONED_RN        GRP
---------- ------------------------ ---------- -------------- ----------
         1 01/01/2011                  2455563              1    2455562
         1 02/01/2011                  2455564              2    2455562
         1 03/01/2011                  2455565              3    2455562
         1 04/01/2011                  2455566              4    2455562
         1 05/01/2011                  2455567              5    2455562
         1 10/01/2011                  2455572              6    2455566
         1 11/01/2011                  2455573              7    2455566
         1 12/01/2011                  2455574              8    2455566
         1 14/01/2011                  2455576              9    2455567
         1 15/01/2011                  2455577             10    2455567
         1 16/01/2011                  2455578             11    2455567
         2 01/01/2011                  2455563              1    2455562
         2 02/01/2011                  2455564              2    2455562
         2 03/01/2011                  2455565              3    2455562
         2 04/01/2011                  2455566              4    2455562
         2 06/01/2011                  2455568              5    2455563
         2 07/01/2011                  2455569              6    2455563
         2 11/01/2011                  2455573              7    2455566
         2 12/01/2011                  2455574              8    2455566
         2 13/01/2011                  2455575              9    2455566
         2 28/01/2011                  2455590             10    2455580
         2 30/01/2011                  2455592             11    2455581
         2 31/01/2011                  2455593             12    2455581
         2 01/02/2011                  2455594             13    2455581


You can see that we now have the same number (the grp column) for rows with consecutive dt's. Each group has a different grp value per each id. This is because as each dt increases by 1, so does the row number. The difference between the two values of that row and the previous row remains the same. As soon as the dt jumps by more than 1, the difference becomes greater, as the row number will only ever increase by 1.

Having identified the numbers that separate the rows into consecutive chunks, we can group the result set using them, to find the min and max dt's for each set:

with sample_data as (select 1 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('05/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('10/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('14/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('15/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 1 id, to_date('16/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('02/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('03/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('04/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('06/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('07/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('11/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('12/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('13/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('28/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('30/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('31/01/2011', 'dd/mm/yyyy') dt from dual union all
                     select 2 id, to_date('01/02/2011', 'dd/mm/yyyy') dt from dual),
     tabibitosan as (select id,
                            dt,
                            to_number(to_char(dt, j))
                              - row_number() over (partition by id order by dt) grp
                     from   sample_data)
select id,
       min(dt) min_dt,
       max(dt) max_dt
from   tabibitosan
group by id, grp
order by id, grp;

        ID MIN_DT     MAX_DT    
---------- ---------- ----------
         1 01/01/2011 05/01/2011
         1 10/01/2011 12/01/2011
         1 14/01/2011 16/01/2011
         2 01/01/2011 04/01/2011
         2 06/01/2011 07/01/2011
         2 11/01/2011 13/01/2011
         2 28/01/2011 28/01/2011
         2 30/01/2011 01/02/2011


Neat, huh?

Whenever I see this kind of "grouping consecutive rows together" problem, the tabibitosan method immediately leaps to mind. Thanks Aketi!

Also, note how I've used both analytic functions and subquery factoring in this - both powerful tools in their own right, you can use them as building blocks to come up with queries that may look fairly complicated, but are easy to break down to see what each constituent part is doing. Pack a few tools in your SQL toolkit and you can easily combine them to build complex queries.