Grouping dates by the difference between rows

Time: 2019-02-28 21:31:22

Tags: sql oracle

I am trying to group some data with a single query, based on the dates in neighbouring rows. Let me give an example:

Data

IDE     DATE
------  ----------
AA1111  23-05-2016
AA1111  25-05-2016
AA1111  25-05-2016
AA1111  13-09-2016
AA1111  02-11-2016
AA1111  23-11-2016
AA1111  06-02-2017
AA1111  06-06-2017
AA1111  01-09-2017
AA1111  12-10-2017
AA1111  17-04-2018
AA1111  25-05-2018
AA1111  05-06-2018

I want to group dates that are less than 16 days apart. I have already calculated the difference between each date and the next one with:

SELECT  T.IDE, 
        T.DATE, 
        MAX(T.DATE) OVER (ORDER BY DATE ROWS BETWEEN CURRENT ROW AND 1 FOLLOWING ) - T.DATE AS DIF 

    FROM TESTPAT1 T ;

Output 1

IDE     DATE        DIF
------  ----------  ---
AA1111  23-05-2016  2
AA1111  25-05-2016  0
AA1111  25-05-2016  111
AA1111  13-09-2016  50
AA1111  02-11-2016  21
AA1111  23-11-2016  75
AA1111  06-02-2017  120
AA1111  06-06-2017  87
AA1111  01-09-2017  41
AA1111  12-10-2017  187
AA1111  17-04-2018  38
AA1111  25-05-2018  11
AA1111  05-06-2018  0

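Here I can use the difference between rows, but the 16-day window is my problem: every date in a group must fall within 16 days of the first date in that group, not just within 16 days of the previous row.

Some notes: the dates are sorted in ascending order, and my expected output is:

Expected output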

IDE     DATE        GROUP
------  ----------  -----
AA1111  23-05-2016  1
AA1111  25-05-2016  1
AA1111  25-05-2016  1
AA1111  13-09-2016  2
AA1111  02-11-2016  3
AA1111  23-11-2016  4
AA1111  06-02-2017  5
AA1111  06-06-2017  6
AA1111  01-09-2017  7
AA1111  12-10-2017  8
AA1111  17-04-2018  9
AA1111  25-05-2018  10
AA1111  05-06-2018  10

Note: these are not the actual variable names.

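2 Answers:

Answer 0 (score: 1)

Look at the previous row and check whether the date difference is greater than or equal to 16 days. If it is, the current row starts a new group. The group identifier is then the running sum of these "starts a new group" flags.

In SQL: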

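select t.*,
       -- running total of the "starts a new group" flags (1 = new group, 0 = same group)
       sum(case when prev_date > "DATE" - interval '16' day then 0 else 1 end) over (partition by ide order by "DATE") as grp
from (select t.*, 
             -- DATE is a reserved word in Oracle, so the column name is quoted
             lag("DATE") over (partition by ide order by "DATE") as prev_date
      from TESTPAT1 T
     ) t;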

Note: this assumes you actually want separate groups for each ide. If that is not the case, remove the partition by clauses.
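To see the flag and the running total side by side, here is a minimal sketch on made-up data. The sample_dates CTE, its dt column, and the BB2222 rows are hypothetical and not part of the question. It also illustrates a property of this approach: each row is compared only with the row before it, so dates can chain. With consecutive gaps of 10 days, starts_group is 1, 0, 0 and all three rows get grp = 1, even though the last date is 20 days after the first. If every date must lie within 16 days of the first date of its group, as the question describes, this data would need two groups.

```sql
-- Hypothetical sample: gaps of 10 days between neighbours,
-- but 20 days between the first and the last date.
with sample_dates (ide, dt) as (
  select 'BB2222', date '2019-01-01' from dual union all
  select 'BB2222', date '2019-01-11' from dual union all
  select 'BB2222', date '2019-01-21' from dual
)
select f.ide, f.dt, f.starts_group,
       -- running sum of the flags is the group number
       sum(f.starts_group) over (partition by f.ide order by f.dt) as grp
from (select p.ide, p.dt,
             -- 1 when the previous date is missing or at least 16 days earlier
             case when p.prev_dt > p.dt - interval '16' day then 0 else 1 end as starts_group
      from (select d.ide, d.dt,
                   lag(d.dt) over (partition by d.ide order by d.dt) as prev_dt
            from sample_dates d
           ) p
     ) f
order by f.ide, f.dt;
```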

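Answer 1 (score: 1)

This is what is known as a "bin fitting" problem. In your case, you are trying to fit your data into groups that each hold at most 16 days' worth of data.

There are several well-known ways to solve bin fitting problems in SQL. MATCH_RECOGNIZE (available from Oracle 12c) is as good as any of them: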

with test_data (IDE,     "DATE") AS (
SELECT 'AA1111',  TO_DATE('23-05-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('25-05-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('25-05-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('13-09-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('02-11-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('23-11-2016','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('06-02-2017','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('06-06-2017','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('01-09-2017','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('12-10-2017','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('17-04-2018','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('25-05-2018','DD-MM-YYYY') FROM DUAL UNION ALL
SELECT 'AA1111',  TO_DATE('05-06-2018','DD-MM-YYYY') FROM DUAL )
SELECT ide, "DATE", mno as "GROUP"
FROM test_data
match_recognize (
  partition by ide
  order by "DATE"
  measures 
    match_number() as mno,
    "DATE" - FIRST(GRP."DATE") as dif
    all rows per match
    pattern (  grp* )
    define 
      grp AS "DATE" - FIRST("DATE") < 16
  );

Results

+--------+-----------+-------+
|  IDE   |   DATE    | GROUP |
+--------+-----------+-------+
| AA1111 | 23-MAY-16 |     1 |
| AA1111 | 25-MAY-16 |     1 |
| AA1111 | 25-MAY-16 |     1 |
| AA1111 | 13-SEP-16 |     2 |
| AA1111 | 02-NOV-16 |     3 |
| AA1111 | 23-NOV-16 |     4 |
| AA1111 | 06-FEB-17 |     5 |
| AA1111 | 06-JUN-17 |     6 |
| AA1111 | 01-SEP-17 |     7 |
| AA1111 | 12-OCT-17 |     8 |
| AA1111 | 17-APR-18 |     9 |
| AA1111 | 25-MAY-18 |    10 |
| AA1111 | 05-JUN-18 |    10 |
+--------+-----------+-------+
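The anchoring behaviour is easier to see on a tiny made-up data set, so here is a sketch under stated assumptions: the sample_dates CTE, the dt column, the BB2222 rows, and the pattern variable name w are all hypothetical. Neighbouring dates are 10 days apart, but the third date is 20 days after the first. Because the DEFINE condition compares each row with the first row of the current match rather than with the previous row, the first two rows should land in group 1 and the third row should open group 2.

```sql
-- Hypothetical sample: 10-day gaps, 20 days from first to last date.
with sample_dates (ide, dt) as (
  select 'BB2222', date '2019-01-01' from dual union all
  select 'BB2222', date '2019-01-11' from dual union all
  select 'BB2222', date '2019-01-21' from dual
)
select ide, dt, mno as grp
from sample_dates
match_recognize (
  partition by ide
  order by dt
  measures match_number() as mno   -- sequential number of the match = group number
  all rows per match
  pattern (w+)
  define
    -- a row joins the current match only while it is within 16 days
    -- of the first row of that match
    w as w.dt - first(w.dt) < 16
);
```

After each match ends, matching resumes at the next row (the default AFTER MATCH SKIP PAST LAST ROW), so that row becomes the first date of the next group.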

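Update for 11g users, using the MODEL clause

This query should work on 11g to solve your bin fitting problem. It gives the same results as above, just by a different method. It reads from test_data, so the test_data definition from the query above has to come first in the same WITH clause.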

with 
  -- First, sort the input data because we need to be able to refer
  -- to the prior row and `lag` doesn't really work in `MODEL`, afaik.
sorted_inputs ( ide, sort_order, "DATE", first_date_in_group, grp, diff) as
( SELECT ide, 
         row_number() over ( partition by ide order by "DATE" ) sort_order, 
         "DATE", 
         -- These columns are place holders for the MODEL clause to update
         CAST(NULL AS DATE) first_date_in_group, 
         0 grp, 
         0 diff 
  FROM   test_data )
SELECT  ide, "DATE", grp "GROUP"
from    sorted_inputs
model 
partition by (ide)
dimension by (sort_order)
measures ( "DATE", grp, first_date_in_group, diff )
rules update automatic order
( grp[1] = 1,
  first_date_in_group[1] = "DATE"[1],
  diff[ANY] = "DATE"[CV()] - first_date_in_group[CV()-1],
  -- a row that is 16 or more days after the first date of the current group starts a new group
  grp[sort_order>1] = grp[cv()-1] + CASE WHEN diff[CV()] >= 16 THEN 1 ELSE 0 END,
  first_date_in_group[sort_order>1] = CASE WHEN diff[CV()] >= 16 THEN "DATE"[CV()] ELSE first_date_in_group[CV()-1] END
)