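Calculate the average time difference between stages

Time: 2019-02-04 18:20:40

Tags: oracle

How do I calculate the average time difference between stages?

The challenge with the real dataset is that not every ID goes through every stage. Some skip stages, and the dates are not consecutive for every ID, as shown below.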

id    date        status
1     1/1/18      requirement
1     1/8/18      analysis
1     ?           design
1     1/30/18     closed
2     2/1/18      requirement
2     2/18/18     closed
3     1/2/18      requirement
3     1/29/18     analysis
3     ?           accepted 
3     2/5/18      closed

? - we are also missing some dates

Expected output

id    date        status        time_spent
1     1/1/18      requirement   0
1     1/8/18      analysis      7
1     ?           design
1     1/30/18     closed        22
2     2/1/18      requirement   0
2     2/18/18     closed        17
3     1/2/18      requirement   0
3     1/29/18     analysis      27
3     ?           accepted
3     2/5/18      closed        24

status         avg(time_spent)
requirement     0
analysis        17
design    
closed          21

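3 Answers:

Answer 0 (score: 0)

You can use the window function LAG (or LEAD) to get the data from the previous (or next) status for each ID. That lets you calculate the time spent in each stage. Then you calculate the average time spent per stage.

Here is an example of how to do this: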

with input_data (id, dte, status) as (
SELECT 1, TO_DATE('1/1/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 1, TO_DATE('1/8/18','MM/DD/YY'), 'analysis' FROM DUAL UNION ALL
SELECT 1, NULL, 'design' FROM DUAL UNION ALL
SELECT 1, TO_DATE('1/30/18','MM/DD/YY'), 'closed' FROM DUAL UNION ALL
SELECT 2, TO_DATE('2/1/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 2, TO_DATE('2/18/18','MM/DD/YY'), 'closed' FROM DUAL UNION ALL
SELECT 3, TO_DATE('1/2/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 3, TO_DATE('1/29/18','MM/DD/YY'), 'analysis' FROM DUAL UNION ALL
SELECT 3, NULL, 'accepted' FROM DUAL UNION ALL
SELECT 3, TO_DATE('2/5/18','MM/DD/YY'), 'closed' FROM DUAL ),
----- Solution begins here
data_with_elapsed_days as (
SELECT id.*, dte-nvl(lag(dte ignore nulls) over ( partition by id order by dte ), dte) elapsed
from input_data id)
SELECT status, avg(elapsed)
FROM data_with_elapsed_days d
group by status
order by decode(status,'requirement',1,'analysis',2,'design',3,'accepted',4,'closed',5,99);


+-------------+-------------------------------------------+
|   STATUS    |               AVG(ELAPSED)                |
+-------------+-------------------------------------------+
| requirement |                                         0 |
| analysis    |                                        17 |
| design      |                                           |
| accepted    |                                           |
| closed      | 15.33333333333333333333333333333333333333 |
+-------------+-------------------------------------------+
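To check the arithmetic behind this output, here is a minimal Python sketch (not Oracle code) that mimics `dte - NVL(LAG(dte IGNORE NULLS) OVER (PARTITION BY id ORDER BY dte), dte)` followed by `AVG(elapsed) ... GROUP BY status`. It assumes Oracle's default ascending ordering, in which NULL dates sort last, and that `AVG` skips NULLs. The names `ROWS`, `sort_key`, `elapsed_since_previous` and `average_by_status` are illustrative only. It also shows why closed averages 15.33 here rather than the 21 in the question: for id 3, 2/5/18 minus 1/29/18 is 7 days, not 24, so the closed values are 22, 17 and 7.

```python
from collections import defaultdict
from datetime import date

# Sample data from the question; None marks a missing date.
ROWS = [
    (1, date(2018, 1, 1), "requirement"),
    (1, date(2018, 1, 8), "analysis"),
    (1, None, "design"),
    (1, date(2018, 1, 30), "closed"),
    (2, date(2018, 2, 1), "requirement"),
    (2, date(2018, 2, 18), "closed"),
    (3, date(2018, 1, 2), "requirement"),
    (3, date(2018, 1, 29), "analysis"),
    (3, None, "accepted"),
    (3, date(2018, 2, 5), "closed"),
]


def sort_key(item):
    """ORDER BY dte ascending with Oracle's default NULLS LAST."""
    dte = item[0]
    return (dte is None, dte if dte is not None else date.min)


def elapsed_since_previous(rows):
    """Mimic dte - NVL(LAG(dte IGNORE NULLS) OVER (PARTITION BY id ORDER BY dte), dte)."""
    by_id = defaultdict(list)
    for row_id, dte, status in rows:
        by_id[row_id].append((dte, status))
    result = []
    for row_id, items in by_id.items():
        previous = None  # latest non-NULL date seen so far (IGNORE NULLS)
        for dte, status in sorted(items, key=sort_key):
            if dte is None:
                elapsed = None  # a NULL date minus anything is NULL
            else:
                # NVL(..., dte): the first dated row of each id gets 0 days.
                start = previous if previous is not None else dte
                elapsed = (dte - start).days
                previous = dte
            result.append((row_id, status, elapsed))
    return result


def average_by_status(per_row):
    """AVG(elapsed) GROUP BY status: NULLs are skipped, an all-NULL group gives None."""
    groups = defaultdict(list)
    for _, status, elapsed in per_row:
        groups[status].append(elapsed)
    averages = {}
    for status, values in groups.items():
        known = [v for v in values if v is not None]
        averages[status] = sum(known) / len(known) if known else None
    return averages


if __name__ == "__main__":
    for status, avg in average_by_status(elapsed_since_previous(ROWS)).items():
        print(status, avg)
```

Like the SQL, design and accepted come out empty because their only rows have no date.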

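As I said in the comments, this logic counts the elapsed days as the time from the previous status to the given status. Because "requirement" has no prior status, this logic will always show zero days for requirement. It might be better to calculate the time from a given status to the next status. For "closed", there will be no next status. You can either leave it blank or use SYSDATE as the date of the next status. Here is an example: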

with input_data (id, dte, status) as (
SELECT 1, TO_DATE('1/1/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 1, TO_DATE('1/8/18','MM/DD/YY'), 'analysis' FROM DUAL UNION ALL
SELECT 1, NULL, 'design' FROM DUAL UNION ALL
SELECT 1, TO_DATE('1/30/18','MM/DD/YY'), 'closed' FROM DUAL UNION ALL
SELECT 2, TO_DATE('2/1/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 2, TO_DATE('2/18/18','MM/DD/YY'), 'closed' FROM DUAL UNION ALL
SELECT 3, TO_DATE('1/2/18','MM/DD/YY'), 'requirement' FROM DUAL UNION ALL
SELECT 3, TO_DATE('1/29/18','MM/DD/YY'), 'analysis' FROM DUAL UNION ALL
SELECT 3, NULL, 'accepted' FROM DUAL UNION ALL
SELECT 3, TO_DATE('2/5/18','MM/DD/YY'), 'closed' FROM DUAL ),
----- Solution begins here
data_with_elapsed_days as (
SELECT id.*, nvl(lead(dte ignore nulls) over ( partition by id order by dte ), trunc(sysdate))-dte elapsed
from input_data id)
SELECT status, avg(elapsed)
FROM data_with_elapsed_days d
group by status
order by decode(status,'requirement',1,'analysis',2,'design',3,'accepted',4,'closed',5,99);



+-------------+------------------------------------------+
|   STATUS    |               AVG(ELAPSED)               |
+-------------+------------------------------------------+
| requirement |                                       17 |
| analysis    |                                     14.5 |
| design      |                                          |
| accepted    |                                          |
| closed      | 361.666666666666666666666666666666666667 |
+-------------+------------------------------------------+
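To check the arithmetic, here is a minimal Python sketch (not Oracle code) that mimics `NVL(LEAD(dte IGNORE NULLS) OVER (PARTITION BY id ORDER BY dte), TRUNC(SYSDATE)) - dte` and then averages per status. Because `SYSDATE` changes every day, the sketch takes a fixed `as_of` date instead; using 2019-02-04, the date the question was posted, is an assumption, and with it the closed average comes out to the 361.67 days shown above. Passing `as_of=None` models the "leave it blank" option. The names `ROWS`, `sort_key`, `elapsed_until_next` and `average_by_status` are illustrative only.

```python
from collections import defaultdict
from datetime import date

# Sample data from the question; None marks a missing date.
ROWS = [
    (1, date(2018, 1, 1), "requirement"),
    (1, date(2018, 1, 8), "analysis"),
    (1, None, "design"),
    (1, date(2018, 1, 30), "closed"),
    (2, date(2018, 2, 1), "requirement"),
    (2, date(2018, 2, 18), "closed"),
    (3, date(2018, 1, 2), "requirement"),
    (3, date(2018, 1, 29), "analysis"),
    (3, None, "accepted"),
    (3, date(2018, 2, 5), "closed"),
]


def sort_key(item):
    """ORDER BY dte ascending with Oracle's default NULLS LAST."""
    dte = item[0]
    return (dte is None, dte if dte is not None else date.min)


def elapsed_until_next(rows, as_of):
    """Mimic NVL(LEAD(dte IGNORE NULLS) OVER (PARTITION BY id ORDER BY dte), as_of) - dte.

    as_of stands in for TRUNC(SYSDATE); None leaves the last dated stage NULL."""
    by_id = defaultdict(list)
    for row_id, dte, status in rows:
        by_id[row_id].append((dte, status))
    result = []
    for row_id, items in by_id.items():
        per_id = []
        following = None  # nearest later non-NULL date (IGNORE NULLS)
        # Walk the ordered rows backwards so each row already knows its successor.
        for dte, status in reversed(sorted(items, key=sort_key)):
            if dte is None:
                elapsed = None
            else:
                end = following if following is not None else as_of
                elapsed = (end - dte).days if end is not None else None
                following = dte
            per_id.append((row_id, status, elapsed))
        result.extend(reversed(per_id))
    return result


def average_by_status(per_row):
    """AVG(elapsed) GROUP BY status: NULLs are skipped, an all-NULL group gives None."""
    groups = defaultdict(list)
    for _, status, elapsed in per_row:
        groups[status].append(elapsed)
    averages = {}
    for status, values in groups.items():
        known = [v for v in values if v is not None]
        averages[status] = sum(known) / len(known) if known else None
    return averages


if __name__ == "__main__":
    # Assumed as-of date: the day the question was posted.
    per_row = elapsed_until_next(ROWS, as_of=date(2019, 2, 4))
    for status, avg in average_by_status(per_row).items():
        print(status, avg)
```

Note that IGNORE NULLS makes the undated design stage disappear from the timeline: for id 1, analysis is measured straight to closed (22 days).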

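Answer 1 (score: 0)

I agree with @MatthewMcPeak. Your requirements seem a bit strange: you spend zero days in the requirement stage, but an average of 21 days in closed?

This solution treats the date shown as the start date of the stage and calculates the difference between that date and the start date of the next stage.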

with cte as (
    select status
           , lead(dd ignore nulls) over (partition by id order by dd) - dd as dt_diff
    from your_table)
select status, avg(dt_diff) as avg_ela
from cte
group by status
/

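Answer 2 (score: 0)

If you want to include every stage for each d and estimate the time spent in each stage (using linear interpolation), you can create a subquery with all the statuses, join it to the data with a partitioned outer join (PARTITION BY ... RIGHT OUTER JOIN), and then use LAG and LEAD to find the range of dates a status falls between and interpolate within it:

Oracle Setup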

CREATE TABLE data ( d, dt, status ) AS
SELECT 1, TO_DATE( '1/1/18', 'MM/DD/YY' ),  'requirement' FROM DUAL UNION ALL
SELECT 1, TO_DATE( '1/8/18', 'MM/DD/YY' ),  'analysis'    FROM DUAL UNION ALL
SELECT 1, NULL,                             'design'      FROM DUAL UNION ALL
SELECT 1, TO_DATE( '1/30/18', 'MM/DD/YY' ), 'closed'      FROM DUAL UNION ALL
SELECT 2, TO_DATE( '2/1/18', 'MM/DD/YY' ),  'requirement' FROM DUAL UNION ALL
SELECT 2, TO_DATE( '2/18/18', 'MM/DD/YY' ), 'closed'      FROM DUAL UNION ALL
SELECT 3, TO_DATE( '1/2/18', 'MM/DD/YY' ),  'requirement' FROM DUAL UNION ALL
SELECT 3, TO_DATE( '1/29/18', 'MM/DD/YY' ), 'analysis'    FROM DUAL UNION ALL
SELECT 3, NULL,                             'accepted'    FROM DUAL UNION ALL
SELECT 3, TO_DATE( '2/5/18', 'MM/DD/YY' ),  'closed'      FROM DUAL;

Query

WITH statuses ( status, id ) AS (
  SELECT 'requirement', 1 FROM DUAL UNION ALL
  SELECT 'analysis',    2 FROM DUAL UNION ALL
  SELECT 'design',      3 FROM DUAL UNION ALL
  SELECT 'accepted',    4 FROM DUAL UNION ALL
  SELECT 'closed',      5 FROM DUAL
),
ranges ( d, dt, status, id, recent_dt, recent_id, next_dt, next_id ) AS (
  SELECT d.d,
         d.dt,
         s.status,
         s.id,
         NVL(
           d.dt,
           LAG( d.dt, 1 )
             IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
         ),
         NVL2(
           d.dt,
           s.id,
           LAG( CASE WHEN d.dt IS NOT NULL THEN s.id END, 1 )
             IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
         ),
         LEAD( d.dt, 1, d.dt )
           IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id ),
         LEAD( CASE WHEN d.dt IS NOT NULL THEN s.id END, 1, s.id + 1 )
           IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
  FROM   data d
         PARTITION BY ( d )
         RIGHT OUTER JOIN statuses s
         ON ( d.status = s.status )
)
SELECT d,
       dt,
       status,
       ( next_dt - recent_dt ) / (next_id - recent_id ) AS estimated_duration
FROM   ranges;

Output

 D | DT        | STATUS      |                       ESTIMATED_DURATION
-: | :-------- | :---------- | ---------------------------------------:
 1 | 01-JAN-18 | requirement |                                        7
 1 | 08-JAN-18 | analysis    | 7.33333333333333333333333333333333333333
 1 | null      | design      | 7.33333333333333333333333333333333333333
 1 | null      | accepted    | 7.33333333333333333333333333333333333333
 1 | 30-JAN-18 | closed      |                                        0
 2 | 01-FEB-18 | requirement |                                     4.25
 2 | null      | analysis    |                                     4.25
 2 | null      | design      |                                     4.25
 2 | null      | accepted    |                                     4.25
 2 | 18-FEB-18 | closed      |                                        0
 3 | 02-JAN-18 | requirement |                                       27
 3 | 29-JAN-18 | analysis    | 2.33333333333333333333333333333333333333
 3 | null      | design      | 2.33333333333333333333333333333333333333
 3 | null      | accepted    | 2.33333333333333333333333333333333333333
 3 | 05-FEB-18 | closed      |                                        0
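To make the meaning of the columns concrete, here is a minimal Python sketch (not the answer's SQL) of the same interpolation. For each d it lists all five statuses in the order of the `statuses` CTE (what the partitioned outer join achieves), then for each status takes the most recent known date at or before it (`recent_dt`, `recent_id`) and the next known date after it (`next_dt`, `next_id`), and splits the gap evenly: `(next_dt - recent_dt) / (next_id - recent_id)`. When no later date exists it falls back to the same defaults as the `LEAD` calls: the row's own date and `id + 1`. The names `STATUS_ORDER`, `KNOWN_DATES` and `estimated_durations` are illustrative only.

```python
from datetime import date

# Stage order from the answer's statuses CTE (ids 1..5).
STATUS_ORDER = ["requirement", "analysis", "design", "accepted", "closed"]

# Known (non-NULL) dates per d; undated or absent stages are simply missing.
KNOWN_DATES = {
    1: {"requirement": date(2018, 1, 1), "analysis": date(2018, 1, 8), "closed": date(2018, 1, 30)},
    2: {"requirement": date(2018, 2, 1), "closed": date(2018, 2, 18)},
    3: {"requirement": date(2018, 1, 2), "analysis": date(2018, 1, 29), "closed": date(2018, 2, 5)},
}


def estimated_durations(known):
    """Return (status, estimated_days) for every status of one d."""
    # One row per status, like the partitioned outer join: (id, status, date or None).
    dense = [(sid, status, known.get(status)) for sid, status in enumerate(STATUS_ORDER, start=1)]
    result = []
    for pos, (sid, status, dt) in enumerate(dense):
        # recent_id / recent_dt: this row if dated, else the latest earlier dated row.
        recent = next(((i, day) for i, _, day in reversed(dense[: pos + 1]) if day is not None), None)
        # next_id / next_dt: first later dated row, else the LEAD defaults (id + 1, own date).
        following = next(((i, day) for i, _, day in dense[pos + 1 :] if day is not None), (sid + 1, dt))
        if recent is None or following[1] is None:
            estimate = None  # NULL propagates through the arithmetic, as in the SQL
        else:
            (recent_id, recent_dt), (next_id, next_dt) = recent, following
            estimate = (next_dt - recent_dt).days / (next_id - recent_id)
        result.append((status, estimate))
    return result


if __name__ == "__main__":
    for d, known in KNOWN_DATES.items():
        for status, estimate in estimated_durations(known):
            print(d, status, estimate)
```

Averaging each status's estimate over the three values of d gives the per-status figures that the GROUP BY version reports, for example (7 + 4.25 + 27) / 3 = 12.75 for requirement.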

Query 2

You can then easily change it to give the average for each status:

WITH statuses ( status, id ) AS (
  SELECT 'requirement', 1 FROM DUAL UNION ALL
  SELECT 'analysis',    2 FROM DUAL UNION ALL
  SELECT 'design',      3 FROM DUAL UNION ALL
  SELECT 'accepted',    4 FROM DUAL UNION ALL
  SELECT 'closed',      5 FROM DUAL
),
ranges ( d, dt, status, id, recent_dt, recent_id, next_dt, next_id ) AS (
  SELECT d.d,
         d.dt,
         s.status,
         s.id,
         NVL(
           d.dt,
           LAG( d.dt, 1 )
             IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
         ),
         NVL2(
           d.dt,
           s.id,
           LAG( CASE WHEN d.dt IS NOT NULL THEN s.id END, 1 )
             IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
         ),
         LEAD( d.dt, 1, d.dt )
           IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id ),
         LEAD( CASE WHEN d.dt IS NOT NULL THEN s.id END, 1, s.id + 1 )
           IGNORE NULLS OVER ( PARTITION BY d.d ORDER BY s.id )
  FROM   data d
         PARTITION BY ( d )
         RIGHT OUTER JOIN statuses s
         ON ( d.status = s.status )
)
SELECT status,
       AVG( ( next_dt - recent_dt ) / (next_id - recent_id ) ) AS estimated_duration
FROM   ranges
GROUP BY status, id
ORDER BY id;

Results

STATUS      |                       ESTIMATED_DURATION
:---------- | ---------------------------------------:
requirement |                                    12.75
analysis    | 4.63888888888888888888888888888888888889
design      | 4.63888888888888888888888888888888888889
accepted    | 4.63888888888888888888888888888888888889
closed      |                                        0
