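I have a table that I want to group on two levels. In the output I need every combination of the grouping values, so that combinations that do not occur in the data appear with a count of zero. For example, say I have this table: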
+------+------+
| user | page |
+------+------+
| a    | 1    |
| a    | 1    |
| a    | 2    |
| b    | 2    |
| b    | 3    |
+------+------+
I want output like this:
+------+------+--------+
| user | page | visits |
+------+------+--------+
| a    | 1    | 2      |
| a    | 2    | 1      |
| a    | 3    | 0      |
| b    | 1    | 0      |
| b    | 2    | 1      |
| b    | 3    | 1      |
+------+------+--------+
I can achieve this with the following query, but it seems more laborious than it should be:
WITH
users AS (SELECT distinct(user) FROM sometable),
pages AS (SELECT distinct(page) FROM sometable),
users_pages_empty AS (SELECT * FROM users CROSS JOIN pages),
users_pages_full AS (SELECT user, page, count(*) as visits FROM sometable GROUP BY user, page)
SELECT e.user, e.page, coalesce(f.visits, 0) as visits
FROM users_pages_empty e
LEFT JOIN users_pages_full f ON e.user=f.user AND e.page=f.page
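I happen to be using AWS Athena, but I think this is more of a general SQL question than an Athena question.
The performance of this query is fine; it is the readability/complexity that I am not happy with.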
Answer 0 (score: 2)
Use a cross join to generate the rows, then use a left join to bring in the existing rows and aggregate:
select u.user, p.page, count(s.user) as visits
from (select distinct user from sometable) u cross join
(select distinct page from sometable) p left join
sometable s
on s.user = u.user and s.page = p.page
group by u.user, p.page
order by u.user, p.page;
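To check the approach against the sample data from the question, here is a minimal sketch that runs the same cross join / left join query in an in-memory SQLite database from Python. SQLite is only a stand-in for Athena here (an assumption for illustration), and the column name user is double-quoted because it is a reserved word in some engines.

```python
import sqlite3

# In-memory SQLite stands in for Athena; this is an illustrative assumption.
# The table name `sometable` and its rows are taken from the question.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE sometable ("user" TEXT, page INTEGER)')
conn.executemany(
    "INSERT INTO sometable VALUES (?, ?)",
    [("a", 1), ("a", 1), ("a", 2), ("b", 2), ("b", 3)],
)

# Cross join the distinct users and pages to get every combination,
# then left join the real rows and count only the matched ones.
query = """
SELECT u."user", p.page, COUNT(s."user") AS visits
FROM (SELECT DISTINCT "user" FROM sometable) u
CROSS JOIN (SELECT DISTINCT page FROM sometable) p
LEFT JOIN sometable s
  ON s."user" = u."user" AND s.page = p.page
GROUP BY u."user", p.page
ORDER BY u."user", p.page
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Note that the count is taken over s.user rather than count(*): for a combination with no matching row the left join still yields one row with NULLs on the s side, so count(*) would report 1, while counting a column from s skips the NULL and gives 0.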