BigQuery: How to do a rolling timestamp window group count that produces a row for each day

Time: 2016-11-07 23:53:43

Tags: sql google-bigquery

This is an extension of a question I asked, and got answered, on StackOverflow here.

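I'm new to BigQuery and SQL, and I want to build a Standard SQL query that groups and counts events over a rolling time window of X days. My data table looks like this: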

event_id | url    | timestamp
---------+--------+------------------------
xx       | a.html | 2016-10-18 15:55:16 UTC
xx       | a.html | 2016-10-19 16:68:55 UTC
xx       | a.html | 2016-10-25 20:55:57 UTC
yy       | b.html | 2016-10-18 15:58:09 UTC
yy       | a.html | 2016-10-18 08:32:43 UTC
zz       | a.html | 2016-10-20 04:44:22 UTC
zz       | c.html | 2016-10-21 02:12:34 UTC

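I'm tracking events that occur on URLs, and I want to know how many times each event occurred within a rolling X-day period. When I asked the earlier question, I got a good answer: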

WITH dailyAggregations AS (
  SELECT 
    DATE(ts) AS day, 
    url, 
    event_id, 
    UNIX_SECONDS(TIMESTAMP(DATE(ts))) AS sec, 
    COUNT(1) AS events 
  FROM yourTable
  GROUP BY day, url, event_id, sec
)
SELECT 
  url, event_id, day, events, 
  SUM(events) 
    OVER(PARTITION BY url, event_id ORDER BY sec 
      RANGE BETWEEN 259200 PRECEDING AND CURRENT ROW
  ) AS rolling4daysEvents
FROM dailyAggregations

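where 259200 is 3 days in seconds (3 x 24 x 3600); the 3 preceding days plus the current day make up the 4-day window. As I understand it, the query first builds an intermediate table that groups and counts events by day. It also converts each day (the timestamp truncated to midnight UTC) into its Unix-seconds equivalent. It then sums the events over a window measured in seconds.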
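To make that behavior concrete, here is a rough Python model of what the query computes on the sample rows above (only the date part of each timestamp matters once events are grouped by day). It is a sketch under stated assumptions, not BigQuery itself: sparse_rolling and unix_seconds are hypothetical helper names, and the RANGE BETWEEN 259200 PRECEDING AND CURRENT ROW frame is emulated by summing every row in the same (event_id, url) partition whose sec lies in [sec - 259200, sec].

```python
from collections import defaultdict
from datetime import date, datetime, timezone

# Sample rows from the question's table; only the calendar date is kept,
# because the query groups by DATE(ts) before windowing.
ROWS = [
    ("xx", "a.html", date(2016, 10, 18)),
    ("xx", "a.html", date(2016, 10, 19)),
    ("xx", "a.html", date(2016, 10, 25)),
    ("yy", "b.html", date(2016, 10, 18)),
    ("yy", "a.html", date(2016, 10, 18)),
    ("zz", "a.html", date(2016, 10, 20)),
    ("zz", "c.html", date(2016, 10, 21)),
]
WINDOW_SECONDS = 3 * 24 * 3600  # 259200: 3 preceding days (+ current day = 4 days)


def unix_seconds(day):
    """Mimics UNIX_SECONDS(TIMESTAMP(day)): seconds since epoch at midnight UTC."""
    return int(datetime(day.year, day.month, day.day, tzinfo=timezone.utc).timestamp())


def sparse_rolling(rows):
    """Hypothetical model of the dailyAggregations + RANGE window query."""
    # dailyAggregations: COUNT(1) grouped by (event_id, url, day)
    daily = defaultdict(int)
    for event_id, url, day in rows:
        daily[(event_id, url, day)] += 1

    # PARTITION BY (event_id, url), ORDER BY sec
    partitions = defaultdict(list)
    for (event_id, url, day), events in daily.items():
        partitions[(event_id, url)].append((unix_seconds(day), day, events))

    result = {}
    for key, items in partitions.items():
        items.sort()
        for sec, day, _ in items:
            # RANGE BETWEEN 259200 PRECEDING AND CURRENT ROW
            result[key + (day,)] = sum(
                e for s, _, e in items if sec - WINDOW_SECONDS <= s <= sec
            )
    return result


for (event_id, url, day), total in sorted(sparse_rolling(ROWS).items()):
    print(event_id, url, day, total)
```

For xx on a.html this gives 1 on 2016-10-18, 2 on 2016-10-19 and 1 on 2016-10-25, and no entry at all for 2016-10-20 through 2016-10-24: rows only exist for days on which something happened, which is the gap the question is about.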

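This produces a table with the correct running totals, but it does not guarantee a row for every combination of date, URL and event. In other words, if a given event did not occur on a given URL on some date, that date is simply missing from the result. So my question is: can I modify the query above (or write a different one) so that it produces a correct rolling4daysEvents value for every date in an interval? For example, an interval defined as: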

SELECT *
  FROM UNNEST (GENERATE_DATE_ARRAY('2016-08-28', '2016-11-06')) AS day
  ORDER BY day ASC
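For reference, GENERATE_DATE_ARRAY with its default one-day step returns every date from the start to the end, both endpoints included. A minimal Python stand-in (generate_date_array is a hypothetical name, not a BigQuery API) shows how many calendar rows that interval yields:

```python
from datetime import date, timedelta


def generate_date_array(start, end, step_days=1):
    """Hypothetical stand-in for GENERATE_DATE_ARRAY(start, end):
    every date from start to end inclusive, one step_days apart."""
    days = []
    current = start
    while current <= end:
        days.append(current)
        current += timedelta(days=step_days)
    return days


calendar = generate_date_array(date(2016, 8, 28), date(2016, 11, 6))
print(len(calendar), calendar[0], calendar[-1])  # 71 2016-08-28 2016-11-06
```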

Thanks!

1 answer:

Answer 0 (score: 0)

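WITH dailyAggregations AS (
  SELECT 
    DATE(ts) AS day, 
    url, 
    event_id, 
    COUNT(1) AS events 
  FROM yourTable
  GROUP BY day, url, event_id
),
calendar AS (
  SELECT day
  FROM UNNEST(GENERATE_DATE_ARRAY('2016-08-28', '2016-11-06')) AS day
),
-- One row per calendar day for every (url, event_id) pair in the data.
-- Joining the calendar on day alone would leave missing days with NULL url/event_id.
grid AS (
  SELECT 
    c.day, 
    p.url, 
    p.event_id, 
    UNIX_SECONDS(TIMESTAMP(c.day)) AS sec
  FROM calendar AS c
  CROSS JOIN (SELECT DISTINCT url, event_id FROM dailyAggregations) AS p
)
SELECT 
  g.day, g.url, g.event_id, 
  IFNULL(a.events, 0) AS events, 
  SUM(IFNULL(a.events, 0)) 
    OVER(PARTITION BY g.url, g.event_id ORDER BY g.sec 
      RANGE BETWEEN 259200 PRECEDING AND CURRENT ROW
  ) AS rolling4daysEvents
FROM grid AS g
LEFT JOIN dailyAggregations AS a
ON a.day = g.day AND a.url = g.url AND a.event_id = g.event_id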
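The densify-then-window idea (build a calendar, cross join it with every (url, event_id) pair, LEFT JOIN the daily counts, treat missing counts as 0, then take the rolling sum) can be tried locally. The sketch below is only an approximation of the BigQuery query, run through Python's sqlite3 module on the question's sample rows (dates only). Assumptions worth flagging: SQLite has no GENERATE_DATE_ARRAY, so a recursive CTE builds the calendar; it uses ROWS BETWEEN 3 PRECEDING AND CURRENT ROW instead of a RANGE in seconds, which is equivalent only because every partition now has exactly one row per day; window functions need SQLite 3.25 or later; and SAMPLE, QUERY and rolling_counts are hypothetical names.

```python
import sqlite3

# Sample rows from the question (event_id, url, ts); only the date part is kept.
SAMPLE = [
    ("xx", "a.html", "2016-10-18"),
    ("xx", "a.html", "2016-10-19"),
    ("xx", "a.html", "2016-10-25"),
    ("yy", "b.html", "2016-10-18"),
    ("yy", "a.html", "2016-10-18"),
    ("zz", "a.html", "2016-10-20"),
    ("zz", "c.html", "2016-10-21"),
]

QUERY = """
WITH RECURSIVE
calendar(day) AS (              -- stand-in for GENERATE_DATE_ARRAY
  SELECT date(:start_day)
  UNION ALL
  SELECT date(day, '+1 day') FROM calendar WHERE day < date(:end_day)
),
dailyAggregations AS (
  SELECT date(ts) AS day, url, event_id, COUNT(1) AS events
  FROM yourTable
  GROUP BY day, url, event_id
),
grid AS (                       -- one row per day per (url, event_id)
  SELECT c.day, p.url, p.event_id
  FROM calendar AS c
  CROSS JOIN (SELECT DISTINCT url, event_id FROM dailyAggregations) AS p
)
SELECT g.day, g.url, g.event_id,
       COALESCE(a.events, 0) AS events,
       SUM(COALESCE(a.events, 0)) OVER (
         PARTITION BY g.url, g.event_id ORDER BY g.day
         ROWS BETWEEN 3 PRECEDING AND CURRENT ROW  -- 3 previous days + today
       ) AS rolling4daysEvents
FROM grid AS g
LEFT JOIN dailyAggregations AS a
  ON a.day = g.day AND a.url = g.url AND a.event_id = g.event_id
ORDER BY g.url, g.event_id, g.day
"""


def rolling_counts(start_day, end_day):
    """Runs the densified rolling count on an in-memory copy of the sample."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE yourTable (event_id TEXT, url TEXT, ts TEXT)")
    con.executemany("INSERT INTO yourTable VALUES (?, ?, ?)", SAMPLE)
    rows = con.execute(QUERY, {"start_day": start_day, "end_day": end_day}).fetchall()
    con.close()
    return rows


for row in rolling_counts("2016-10-18", "2016-10-25"):
    print(row)  # (day, url, event_id, events, rolling4daysEvents)
```

With this sample every one of the five (url, event_id) pairs gets a row for each of the 8 days from 2016-10-18 to 2016-10-25; for xx on a.html the rolling totals come out as 1, 2, 2, 2, 1, 0, 0, 1, so the quiet days now appear with a count instead of being dropped.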