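I have a list of transactions in a PostgreSQL table, and I need to group them into cohorts based on when they occurred and on whether the running total of the transactions has crossed a threshold.
Here a "cohort" is identified by the last day of a month and by whether the $100 threshold has been reached.
Example: when a batch of transactions adds up to >= $100, that batch becomes a cohort on the last day of the month in which the threshold was reached, and the running total then starts over for the next cohort.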
Sample data:
TRANS_DATE | AMOUNT
2018-01-01 | $10
2018-01-15 | $10
2018-01-30 | $50
2018-02-27 | $80
2018-03-05 | $101
2018-04-05 | $1
2018-05-15 | $80
2018-06-05 | $1
2018-07-26 | $18
Given this data, I would like the aggregate query to return:
DATE | AMOUNT | COHORT
2018-02-28 | $150 | 1
2018-03-31 | $101 | 2
2018-07-31 | $100 | 3
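To pin down the rule, here is a minimal procedural sketch in Python (not part of the original question). It totals the amounts per month, keeps a running total, closes a cohort when the total reaches $100, and then restarts from zero. The names `month_end`, `monthly_cohorts` and `SAMPLE`, and the use of whole-dollar integers, are assumptions for illustration.

```python
import calendar
from datetime import date


def month_end(d: date) -> date:
    """Return the last calendar day of d's month."""
    return date(d.year, d.month, calendar.monthrange(d.year, d.month)[1])


def monthly_cohorts(transactions, threshold=100):
    """Hypothetical reference implementation of the cohort rule.

    transactions: iterable of (date, amount) pairs, amounts in whole dollars.
    Returns (month_end, cohort_total, cohort_number) for every closed cohort.
    """
    # Total the amounts per month-end date; sorting the input keeps the
    # (insertion-ordered) dict in chronological order.
    monthly = {}
    for trans_date, amount in sorted(transactions):
        key = month_end(trans_date)
        monthly[key] = monthly.get(key, 0) + amount

    # Walk the months, closing a cohort once the running total reaches the
    # threshold and then starting again from zero.
    cohorts = []
    running = 0
    for month, amount in monthly.items():
        running += amount
        if running >= threshold:
            cohorts.append((month, running, len(cohorts) + 1))
            running = 0
    return cohorts


SAMPLE = [
    (date(2018, 1, 1), 10), (date(2018, 1, 15), 10), (date(2018, 1, 30), 50),
    (date(2018, 2, 27), 80), (date(2018, 3, 5), 101), (date(2018, 4, 5), 1),
    (date(2018, 5, 15), 80), (date(2018, 6, 5), 1), (date(2018, 7, 26), 18),
]

for row in monthly_cohorts(SAMPLE):
    print(row)
```

On the sample data this yields the three expected rows: (2018-02-28, 150, 1), (2018-03-31, 101, 2) and (2018-07-31, 100, 3). This sketch silently drops a trailing cohort that has not yet reached the threshold; whether such a cohort should be reported is a choice the question leaves open.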
I keep thinking I need some kind of loop to solve this, which I don't think is possible in SQL.
I've been trying something like this:
with st as
(
  select distinct(date_trunc('month', "date") + interval '1 month' - interval '1 day') as date,
         sum(amount) over (order by date_trunc('month', date) + interval '1 month' - interval '1 day') as total
  from a1
  order by 1
)
select st.*
     , case when lag(total) over (order by date) <= 100 then 1 end as cohort1
     , floor(total/100)
from st
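A quick check in Python (illustrative only, not from the thread) shows why a running sum that never resets, bucketed with `floor(total/100)` as in the attempt above, cannot reproduce the expected cohorts: the overshoot of each closed cohort carries into the next one. The monthly totals below are computed by hand from the sample data; the names `MONTHLY`, `floor_buckets` and `bucket_starts` are hypothetical.

```python
# Monthly totals from the sample data, keyed by month-end date.
MONTHLY = [
    ("2018-01-31", 70), ("2018-02-28", 80), ("2018-03-31", 101),
    ("2018-04-30", 1), ("2018-05-31", 80), ("2018-06-30", 1),
    ("2018-07-31", 18),
]


def floor_buckets(monthly_totals, threshold=100):
    """Mimic sum(...) over (order by month) followed by floor(total / 100)."""
    running = 0
    rows = []
    for month, amount in monthly_totals:
        running += amount  # never reset, like a plain window sum
        rows.append((month, running, running // threshold))
    return rows


def bucket_starts(rows):
    """Months in which floor(total / threshold) increases."""
    starts = []
    previous = 0
    for month, _total, bucket in rows:
        if bucket > previous:
            starts.append(month)
        previous = bucket
    return starts


print(bucket_starts(floor_buckets(MONTHLY)))
```

The buckets change in February, March and May rather than February, March and July: the $50 left over from February's cohort plus the $1 left over from March's, combined with April and May, already pushes the unreset total past $300 in May, even though only $81 was added after the second cohort closed.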
Answer (score: 1):
This is complicated. I'm pretty sure you need a recursive CTE, because the running total hits the threshold and then starts over.
Try this:
with recursive tt as (
      -- one row per month: the month's total and its position in time
      select date_trunc('month', trans_date) as mon,
             sum(amount) as amount,
             row_number() over (order by min(trans_date)) as seqnum
      from t
      group by 1
     ),
     cte as (
      -- anchor: the first month opens cohort 1
      select mon, amount, seqnum, 1 as cohort, (amount >= 100) as is_new_cohort
      from tt
      where seqnum = 1
      union all
      -- recursive step: if the previous month closed a cohort, restart the
      -- running total and bump the cohort number; otherwise keep accumulating
      select tt.mon,
             (case when is_new_cohort then tt.amount else cte.amount + tt.amount end) as amount,
             tt.seqnum,
             (case when is_new_cohort then cohort + 1 else cohort end) as cohort,
             ((case when is_new_cohort then tt.amount else cte.amount + tt.amount end) >= 100) as is_new_cohort
      from cte join
           tt
           on tt.seqnum = cte.seqnum + 1
     )
-- within a cohort the running total only grows (assuming non-negative amounts),
-- so max(amount) is the cohort total and max(mon) is its last month
select cohort, max(amount) as amount,
       max(cte.mon + interval '1 month' - interval '1 day')::date as mon
from cte
group by 1
order by 1;
Here is a db<>fiddle.
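If PostgreSQL isn't at hand, the same recursive-CTE logic can be exercised with Python's built-in `sqlite3` module. This is a port, not the answer's exact query: SQLite has no `date_trunc` or `interval`, so months are keyed with `strftime('%Y-%m-01', ...)` and the month end is computed with `date(..., '+1 month', '-1 day')`; the sequence numbering is split into a separate `monthly` step. It assumes dates stored as `'YYYY-MM-DD'` text, whole-dollar amounts, a table named `t`, and SQLite 3.25 or later (for `ROW_NUMBER()`); `run_cohorts` and `SAMPLE` are hypothetical names.

```python
import sqlite3

# SQLite port of the recursive CTE (assumptions: 'YYYY-MM-DD' text dates,
# whole-dollar amounts, table named t, SQLite >= 3.25 for window functions).
QUERY = """
WITH RECURSIVE
monthly AS (
    -- one row per month, keyed by the first day of the month
    SELECT strftime('%Y-%m-01', trans_date) AS mon, SUM(amount) AS amount
    FROM t
    GROUP BY strftime('%Y-%m-01', trans_date)
),
tt AS (
    SELECT mon, amount, ROW_NUMBER() OVER (ORDER BY mon) AS seqnum
    FROM monthly
),
cte AS (
    -- anchor: the first month opens cohort 1
    SELECT mon, amount, seqnum, 1 AS cohort, amount >= 100 AS is_new_cohort
    FROM tt
    WHERE seqnum = 1
    UNION ALL
    -- step: restart the total after a cohort closes, otherwise keep adding
    SELECT tt.mon,
           CASE WHEN cte.is_new_cohort THEN tt.amount ELSE cte.amount + tt.amount END,
           tt.seqnum,
           CASE WHEN cte.is_new_cohort THEN cte.cohort + 1 ELSE cte.cohort END,
           (CASE WHEN cte.is_new_cohort THEN tt.amount ELSE cte.amount + tt.amount END) >= 100
    FROM cte JOIN tt ON tt.seqnum = cte.seqnum + 1
)
SELECT date(MAX(mon), '+1 month', '-1 day') AS month_end,
       MAX(amount) AS amount,
       cohort
FROM cte
GROUP BY cohort
ORDER BY cohort
"""


def run_cohorts(rows):
    """Load (trans_date, amount) rows into an in-memory table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (trans_date TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


SAMPLE = [
    ("2018-01-01", 10), ("2018-01-15", 10), ("2018-01-30", 50),
    ("2018-02-27", 80), ("2018-03-05", 101), ("2018-04-05", 1),
    ("2018-05-15", 80), ("2018-06-05", 1), ("2018-07-26", 18),
]

for row in run_cohorts(SAMPLE):
    print(row)
```

On the sample data this returns `('2018-02-28', 150, 1)`, `('2018-03-31', 101, 2)` and `('2018-07-31', 100, 3)`. Unlike a plain loop that only emits closed cohorts, this query also reports a trailing cohort that has not reached $100 yet: adding `("2018-08-10", 5)` produces a fourth row `('2018-08-31', 5, 4)`.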