Calculating customer activity over a specific period (e.g., 7 days)

Time: 2017-05-20 16:14:06

Tags: sql amazon-redshift churn

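Using CTEs, I have managed to calculate whether a customer was active in one monthly period and not active in the next (churn). So far this has been fairly straightforward. My code for this is below, for anyone else wondering how to do it. My dwh.marts.fact_customer_kpi table has records indicating that a customer was ACTIVE, meaning he or she spent some money using the service.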

with monthly_usage as (
  select
    userid as who_identifier,
    datediff(month, '1970-01-01', date) as time_period,
    date_part(mon,date) as month,
    date_part(yr,date) as year,
    CAST(
      CAST(date_part(yr,date) AS VARCHAR(4)) +
      RIGHT('0' + CAST(date_part(mon,date) AS VARCHAR(2)), 2) +
      RIGHT('0' + CAST(1 AS VARCHAR(2)), 2)
   AS DATETIME)as day
  from dwh.marts.fact_customer_kpi as k
          inner join dwh.marts.dim_user as u on u.user_id = k.userid
  where kpi = 'ACTIVE'
    and (datediff(month, CURRENT_DATE, registration_date)*-1) > 1
  group by 1,2,3,4,5
  order by 1,2,3,4,5)
,

lag_lead as (
  select who_identifier,
  time_period,
  year,
  month,
  day,
    lag(time_period,1) over (partition by who_identifier order by who_identifier, time_period),
    lead(time_period,1) over (partition by who_identifier order by who_identifier, time_period)
  from monthly_usage)

,

lag_lead_with_diffs as (
  select who_identifier,
    year,
    month,
    day,
    time_period,
    lag,
    lead,
    time_period-lag lag_size,
    lead-time_period lead_size
  from lag_lead)
,

calculated as (
select time_period,
  year,
  month,
  day,
  case when lag is null then 'NEW ACTIVE'
     when lag_size = 1 then 'ACTIVE'
     when lag_size > 1 then 'REACTIVATED'
  end as this_month_value,
  case when (lead_size > 1 OR lead_size IS NULL) then 'CHURN'
     else NULL
  end as next_month_churn,
  who_identifier,
  count(who_identifier) as countIdentifier
   from lag_lead_with_diffs group by 1,2,3,4,5,6,7)

select time_period,
    day,
  this_month_value,
  who_identifier,
  next_month_churn,
  sum(countIdentifier) as countIdentifier
  from calculated  group by 1,2,3,4,5
union
  select time_period+1,
  dateadd(month,1,day),
  'CHURN',
  who_identifier,
  next_month_churn,
  countIdentifier
  from calculated where next_month_churn is not null
order by 1;

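Now, however, I would like to know whether there is an efficient way in Redshift to calculate the periods from specific dates. For example, I want to compute the same values as above over 7-day periods counted from the customer's registration date, rather than by calendar month.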

The changes are needed in the monthly_usage part of my query. I tried using - interval '7 days', but so far without success, or I am missing something.

Can anyone point out what I am missing (ideally with an example), or what changes need to be made?

I am using Amazon Redshift.

1 answer:

Answer 0: (score: 1)

Are you missing the date_trunc function? It feels like that is what you need.

You can replace this expression:

    CAST(
      CAST(date_part(yr,date) AS VARCHAR(4)) +
      RIGHT('0' + CAST(date_part(mon,date) AS VARCHAR(2)), 2) +
      RIGHT('0' + CAST(1 AS VARCHAR(2)), 2)
   AS DATETIME)as day

You could use this instead:

date_trunc('month', date)
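
As a rough sketch of what swapping the datepart looks like, the query below truncates the same date column to both the month and the week. It reuses the table and column names from the question; the aliases month_start and week_start are made up for illustration. Note that date_trunc('week', ...) gives calendar weeks (starting on Monday in Redshift), which is not the same thing as 7-day windows counted from a registration date.

```sql
-- Sketch only: change the datepart to change the period granularity.
-- month_start and week_start are illustrative aliases, not from the question.
select
  userid as who_identifier,
  date_trunc('month', date) as month_start,  -- first day of the calendar month
  date_trunc('week', date)  as week_start    -- Monday of the calendar week
from dwh.marts.fact_customer_kpi
where kpi = 'ACTIVE'
group by 1, 2, 3;
```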

I would then parameterize this in some convenient language so that other dateparts can be swapped in easily. I would probably also swap out datediff(month, '1970-01-01', date) for EXTRACT.
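
For the 7-day case asked about in the question, date_trunc alone may not be enough, because the periods are anchored on each customer's registration date rather than on the calendar. One possible approach, offered as an untested sketch and not as the answerer's method, is to number the periods with datediff in days and integer division. It assumes registration_date and date are DATE or TIMESTAMP columns reachable through the same join as in the question; the CTE name weekly_usage is hypothetical.

```sql
-- Sketch only: 7-day periods counted from each user's registration date.
-- Assumes the same tables, join, and column names as the question's query.
with weekly_usage as (
  select
    userid as who_identifier,
    -- 0 for the first 7 days after registration, 1 for the next 7, and so on
    -- (integer division truncates)
    datediff(day, registration_date, date) / 7 as time_period,
    -- first day of the 7-day period that the activity falls into
    dateadd(
      day,
      cast((datediff(day, registration_date, date) / 7) * 7 as integer),
      registration_date
    ) as day
  from dwh.marts.fact_customer_kpi as k
    inner join dwh.marts.dim_user as u on u.user_id = k.userid
  where kpi = 'ACTIVE'
    and date >= registration_date  -- ignore activity dated before registration
  group by 1, 2, 3
)
select who_identifier, time_period, day
from weekly_usage
order by 1, 2;
```

If this replaces monthly_usage, the lag and lead logic on time_period should carry over unchanged, since consecutive periods still differ by 1. The year and month columns would have to be dropped from the later CTEs or added back here, and the final union would need dateadd(day, 7, day) in place of dateadd(month, 1, day). The question's filter on registrations older than one month is left out of this sketch.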