Getting a daily rolling sum of MAU with PostgreSQL

Time: 2017-08-25 17:54:35

Tags: postgresql analytics

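I have daily log data stored in a Postgres database, structured with an id and a timestamp. A user can have multiple rows in the table if they logged in more than once.

To visualize: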

| id   | timestamp           |
|------|---------------------|
| 0099 | 2004-10-19 10:23:54 |
| 1029 | 2004-10-01 10:23:54 |
| 2353 | 2004-10-20 08:23:54 |

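Assume MAU ("monthly active users") is defined as the number of unique ids that logged in during a given calendar month. I would like to get a rolling sum of MAU for each day in a month, i.e. how MAU grows at different points in time within that month. For example, if we look at October 2014: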

| date       | MAU   |
|------------|-------|
| 2014-10-01 | 10000 |
| 2014-10-02 | 12948 |
| 2014-10-03 | 13465 |

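And so on through the end of the month. I have heard that window functions might be one way to solve this. Any ideas on how to use them to get a rolling MAU sum?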

2 Answers:

Answer 0: (score: 1)

After reading the documentation for Postgres window functions, here is a solution that gets the rolling MAU sum for the current month:

-- First, get the id and day of each login within the current month
WITH raw_data AS (
  SELECT id, date_trunc('day', timestamp) AS timestamp
  FROM user_logs
  WHERE date_trunc('month', timestamp) = date_trunc('month', current_timestamp)
),

-- Each user should be counted only once per month, so keep only
-- the earliest login day for each id
month_data AS (
  SELECT id, MIN(timestamp) AS timestamp_day FROM raw_data GROUP BY id
)

-- Postgres doesn't support DISTINCT for window functions, so compute the
-- running count on every row, then collapse to one row per day
SELECT timestamp_day AS date, MAX(count) AS MAU
  FROM (SELECT timestamp_day, COUNT(id) OVER (ORDER BY timestamp_day) FROM month_data) foo
  GROUP BY timestamp_day
  ORDER BY timestamp_day;
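
To check the logic of this query on a small data set, here is a minimal sketch that runs the same shape of query against an in-memory SQLite database from Python. Several things here are assumptions and not part of the answer: the sample rows are made up, SQLite's `date()` stands in for Postgres's `date_trunc('day', ...)`, the month is fixed to October 2014 instead of using `current_timestamp`, and window functions require SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample logins (id, timestamp); not taken from the question.
rows = [
    ("0099", "2014-10-01 10:23:54"),
    ("1029", "2014-10-01 11:00:00"),
    ("0099", "2014-10-02 09:00:00"),  # repeat login, must not be counted again
    ("2353", "2014-10-02 08:23:54"),
    ("4410", "2014-10-04 12:00:00"),
    ("1029", "2014-10-04 13:00:00"),  # repeat login
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_logs (id TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO user_logs VALUES (?, ?)", rows)

query = """
WITH month_data AS (
    -- earliest login day per id within the (hard-coded) month
    SELECT id, MIN(date(timestamp)) AS first_day
    FROM user_logs
    WHERE timestamp >= '2014-10-01' AND timestamp < '2014-11-01'
    GROUP BY id
)
SELECT first_day, MAX(running) AS mau
FROM (
    -- running count of ids ordered by their first login day
    SELECT first_day, COUNT(id) OVER (ORDER BY first_day) AS running
    FROM month_data
) AS counted
GROUP BY first_day
ORDER BY first_day
"""

result = conn.execute(query).fetchall()
for day, mau in result:
    print(day, mau)
```

With these rows the running total is 2, 3, 4. Note that 2014-10-03 does not appear in the output: a day on which no user logs in for the first time that month produces no row, so the result has gaps unless it is joined against a calendar of days.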

Answer 1: (score: 0)

For a given month, you can calculate this by counting each user on the first day they are seen in that month:

select date_trunc('day', mints), count(*) as usersOnDay,
       sum(count(*)) over (order by date_trunc('day', mints)) as cume_users
from (select id, min(timestamp) as mints
      from log
      where timestamp >= '2004-10-01'::date and timestamp < '2004-11-01'::date
      group by id
     ) l
group by date_trunc('day', mints);

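Note: this answers the question for a single month. It can be extended to more calendar months: for each month, count the unique users on the first day they appear, then add up the daily increments.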
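
One way the multi-month extension could look is to take the first login day per user per calendar month and restart the running sum at each month boundary with `PARTITION BY`. The sketch below is an illustration under assumptions, not the answerer's code: it runs in SQLite from Python (3.25 or newer for window functions), the sample rows are hypothetical, and `strftime('%Y-%m', ...)` and `date()` stand in for Postgres's `date_trunc('month', ...)` and `date_trunc('day', ...)`.

```python
import sqlite3

# Hypothetical sample logins spanning two calendar months.
rows = [
    ("0099", "2014-10-01 10:23:54"),
    ("1029", "2014-10-02 11:00:00"),
    ("0099", "2014-10-03 09:00:00"),  # repeat within October, ignored
    ("0099", "2014-11-01 08:00:00"),  # counted again: new calendar month
    ("2353", "2014-11-01 08:23:54"),
    ("1029", "2014-11-05 13:00:00"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE log (id TEXT, timestamp TEXT)")
conn.executemany("INSERT INTO log VALUES (?, ?)", rows)

query = """
WITH first_seen AS (
    -- first login day of each id within each calendar month
    SELECT id,
           strftime('%Y-%m', timestamp) AS month,
           MIN(date(timestamp)) AS first_day
    FROM log
    GROUP BY id, strftime('%Y-%m', timestamp)
),
per_day AS (
    -- number of users first seen on each day
    SELECT month, first_day, COUNT(*) AS users_on_day
    FROM first_seen
    GROUP BY month, first_day
)
SELECT first_day,
       users_on_day,
       -- the running sum restarts at the start of every month
       SUM(users_on_day) OVER (PARTITION BY month ORDER BY first_day) AS cume_users
FROM per_day
ORDER BY first_day
"""

result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

In this sample, user 0099 contributes to both October and November because each calendar month is counted independently, which matches the MAU definition in the question.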

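If your question is about a cumulative period that crosses month boundaries, please ask another question and explain what a month means in that case.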
