Time series "status" calculation

Asked: 2018-03-19 13:51:15

Tags: sql amazon-redshift

Sample data:

rep_signup_date rep_id client_registration_date client_id 
1/2/2018        1      1/5/2018                 1          
1/2/2018        1      1/9/2018                 2
1/2/2018        1      2/15/2018                3
1/4/2018        2      2/3/2018                 4
1/4/2018        2      3/9/2018                 5
2/1/2018        3      2/2/2018                 6

We classify a rep's "status" by the number of clients: 1 client - status 1, 2 clients - status 2, 3 clients - status 3. So as of the current date we know the following:

select rep_signup_date, rep_id,  
case when count(client_id) over (partition by rep_id) >=3 then '3'
     when count(client_id)  over (partition by rep_id) =2 then '2'
     when count(client_id)  over (partition by rep_id) =1 then '1'
     end status
from reps r
left join clients c on c.rep_id=r.id

rep_signup_date rep_id  status
1/2/2018        1       3     
1/4/2018        2       2
2/1/2018        3       1

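However, these statuses are as of the current date. I tried adding date_trunc('month', client_registration_date)::date as a month column, but the query still treats the data as a current snapshot based on the max date rather than as a static point in time.

What I'd like to be able to do is get the status at the end of each month - e.g., rep id 1 is status 2 at the end of January.

Expected output: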

rep_signup_date rep_id month    status
1/2/2018        1      1/1/2018 2
1/2/2018        1      2/1/2018 3     
1/4/2018        2      2/1/2018 1
1/4/2018        2      3/1/2018 2
2/1/2018        3      2/1/2018 1

How can I get there? Thanks.

1 answer:

Answer 0 (score: 2)

Add order by to the window so that the count becomes cumulative:

select rep_signup_date, rep_id,  
       (case when count(client_id) over (partition by rep_id order by client_registration_date rows between unbounded preceding and current row) >= 3 then '3'
             when count(client_id) over (partition by rep_id order by client_registration_date rows between unbounded preceding and current row) = 2 then '2'
             when count(client_id) over (partition by rep_id order by client_registration_date rows between unbounded preceding and current row) = 1 then '1'
        end) as status
from reps r left join
     clients c
     on c.rep_id = r.id;

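There seems to be one row per client/rep, so it is simpler to use row_number() than a cumulative count. (One difference: because of the left join, a rep with no clients still produces a row, which row_number() numbers 1, whereas count(client_id) returns 0 for it.)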

select rep_signup_date, rep_id,  
       (case when row_number() over (partition by rep_id order by client_registration_date) >= 3 then '3'
             when row_number() over (partition by rep_id order by client_registration_date) = 2 then '2'
             when row_number() over (partition by rep_id order by client_registration_date) = 1 then '1'
        end) as status
from reps r left join
     clients c
     on c.rep_id = r.id;

This can be further simplified to:

select rep_signup_date, rep_id,  
       (case row_number() over (partition by rep_id order by client_registration_date)
             when 1 then '1'
             when 2 then '2'
             else '3'
        end) as status
from reps r left join
     clients c
     on c.rep_id = r.id;

Or even, if a numeric status is acceptable:

select rep_signup_date, rep_id,  
       least(row_number() over (partition by rep_id order by client_registration_date), 3) as status
from reps r left join
     clients c
     on c.rep_id = r.id;
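
The queries above return one row per client rather than one row per rep per month. To get closer to the expected output in the question, one possible sketch is to count new clients per rep per month first and then take a running sum of those counts. This assumes the tables are reps(id, rep_signup_date) and clients(client_id, rep_id, client_registration_date), as implied by the join in the question; the CTE name monthly and the column names month_start and new_clients are made up for illustration.

```sql
-- Sketch only: table layouts are assumed from the question's join,
-- and monthly / month_start / new_clients are illustrative names.
with monthly as (
    -- Step 1: number of clients each rep gained in each calendar month.
    select r.rep_signup_date,
           r.id as rep_id,
           date_trunc('month', c.client_registration_date)::date as month_start,
           count(c.client_id) as new_clients
    from reps r
    join clients c
      on c.rep_id = r.id
    group by r.rep_signup_date,
             r.id,
             date_trunc('month', c.client_registration_date)::date
)
-- Step 2: running total of clients up to and including each month,
-- capped at 3 because status 3 is the highest level.
select rep_signup_date,
       rep_id,
       month_start,
       least(sum(new_clients) over (partition by rep_id
                                    order by month_start
                                    rows between unbounded preceding and current row),
             3) as status
from monthly
order by rep_id, month_start;
```

Only months in which a rep gained at least one client produce a row, which matches the expected output shown in the question (rep 1 has no March row, for example). If a row is needed for every month regardless of activity, the monthly counts would have to be joined to a calendar of months before taking the running sum.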