What is the best way to create empty/default rows for missing aggregates?

Time: 2020-03-23 18:00:56

Tags: sql

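I have a table that I want to group at two levels. In the output I need every combination of the grouping values, so that combinations that never occur show up with a count of zero. For example, say I have this table: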

+------+------+
| user | page |
+------+------+
| a    |    1 |
| a    |    1 |
| a    |    2 |
| b    |    2 |
| b    |    3 |
+------+------+

I am after output like this:

+------+------+--------+
| user | page | visits |
+------+------+--------+
| a    |    1 |      2 |
| a    |    2 |      1 |
| a    |    3 |      0 |
| b    |    1 |      0 |
| b    |    2 |      1 |
| b    |    3 |      1 |
+------+------+--------+

I can achieve this with the following query, but it seems rather laborious:

WITH
    users AS (SELECT DISTINCT user FROM sometable),
    pages AS (SELECT DISTINCT page FROM sometable),
    users_pages_empty AS (SELECT * FROM users CROSS JOIN pages),
    users_pages_full AS (
        SELECT user, page, count(*) AS visits
        FROM sometable
        GROUP BY user, page
    )
SELECT e.user, e.page, coalesce(f.visits, 0) AS visits
FROM users_pages_empty e
LEFT JOIN users_pages_full f ON e.user = f.user AND e.page = f.page

I happen to be using AWS Athena, but I think this is more of a general SQL question than an Athena question.

The query performs fine; it is the readability/complexity that I am not happy with.

1 Answer:

Answer 0: (score: 2)

Use a cross join to generate all the rows, then a left join to bring in the existing rows, and aggregate:

select u.user, p.page, count(s.user) as visits
from (select distinct user from sometable) u cross join
     (select distinct page from sometable) p left join
     sometable s
     on s.user = u.user and s.page = p.page
group by u.user, p.page
order by u.user, p.page;
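
To check the approach against the sample data from the question, here is a minimal sketch that runs the same query in an in-memory SQLite database through Python's `sqlite3` module. SQLite stands in for Athena purely as an assumption for local testing, and the `visits` alias is added so the result matches the desired output. Note that the query counts `s.user` rather than `*`: for a combination with no match, the left join yields a single row whose `s.user` is NULL, so `count(s.user)` gives 0 where `count(*)` would give 1.

```python
import sqlite3

# In-memory SQLite database used as a local stand-in (assumption: the
# question targets Athena, but the query only uses standard SQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sometable (user TEXT, page INTEGER)")
conn.executemany(
    "INSERT INTO sometable (user, page) VALUES (?, ?)",
    [("a", 1), ("a", 1), ("a", 2), ("b", 2), ("b", 3)],  # sample rows from the question
)

QUERY = """
SELECT u.user, p.page, COUNT(s.user) AS visits
FROM (SELECT DISTINCT user FROM sometable) u
CROSS JOIN (SELECT DISTINCT page FROM sometable) p
LEFT JOIN sometable s
  ON s.user = u.user AND s.page = p.page
GROUP BY u.user, p.page
ORDER BY u.user, p.page
"""

# Each row is (user, page, visits); combinations with no visits get 0.
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

With the five sample rows, this should produce all six user/page combinations, with `('a', 3, 0)` and `('b', 1, 0)` filling the gaps.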