My table has many columns, and I want to count how often each distinct value occurs in every column. I know I can do this:
SELECT sho_01, COUNT(*) from sho GROUP BY sho_01
UNION ALL
SELECT sho_02, COUNT(*) from sho GROUP BY sho_02
UNION ALL
....
Here sho is the table and sho_01, ... are the individual columns. This is BigQuery, by the way, which is why the query uses UNION ALL.
Next, I want to do the same thing for a subset of sho, say SELECT * FROM sho WHERE id IN (1,2,3). Is there a way to create a subtable first and then query that subtable? Something like this:
SELECT * FROM (SELECT * FROM sho WHERE id IN (1,2,3)) AS t1;
SELECT sho_01, COUNT(*) from t1 GROUP BY sho_01
UNION ALL
SELECT sho_02, COUNT(*) from t1 GROUP BY sho_02
UNION ALL
....
Thanks.
Answer 0 (score: 2)
Presumably, the columns all have the same type. If so, you can simplify this using an array:
select el.which, el.val, count(*)
from (select t.*,
             array[struct('sho_01' as which, sho_01 as val),
                   struct('sho_02', sho_02),
                   . . .
                  ] as ar
      from t
     ) t cross join
     unnest(ar) el
group by el.which, el.val;
You can then easily filter down to whatever you need by adding a where clause before the group by.
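For the subset in the question, the filter could look roughly like the sketch below. The inline sho data is made up purely so the query runs on its own, only sho_01 and sho_02 are listed, and both columns are assumed to be strings. The cnt alias is also an addition for illustration.

```sql
-- Hypothetical sample rows standing in for the real table sho.
WITH sho AS (
  SELECT 1 AS id, 'a' AS sho_01, 'x' AS sho_02 UNION ALL
  SELECT 2, 'a', 'y' UNION ALL
  SELECT 3, 'b', 'y' UNION ALL
  SELECT 4, 'c', 'z'
)
SELECT el.which, el.val, COUNT(*) AS cnt
FROM (
  SELECT
    sho.*,
    -- One struct per column to count; extend this list for the remaining columns.
    ARRAY[
      STRUCT('sho_01' AS which, sho_01 AS val),
      STRUCT('sho_02' AS which, sho_02 AS val)
    ] AS ar
  FROM sho
) AS t
CROSS JOIN UNNEST(t.ar) AS el
-- The filter goes here, before the GROUP BY, so only ids 1, 2 and 3 are counted.
WHERE t.id IN (1, 2, 3)
GROUP BY el.which, el.val;
```

Because the inner query keeps every original column through sho.*, any column of sho can be used in the WHERE clause without touching the array definition.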
Answer 1 (score: 1)
The following works in BigQuery Standard SQL and lets you avoid typing the column names by hand, or even knowing them in advance:
#standardSQL
SELECT
TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') column,
SPLIT(kv, ':')[OFFSET(1)] value,
COUNT(1) cnt
FROM `project.dataset.table` t,
UNNEST(SPLIT(TRIM(TO_JSON_STRING(t), '{}'))) kv
GROUP BY column, value
-- ORDER BY column, value
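To restrict this to the subset from the question, a WHERE clause on the row alias should be enough. The sketch below uses made-up inline data in place of `project.dataset.table`, and the aliases col_name and col_value are chosen here for illustration. One caveat, since the row is split on ',' and ':': this is likely to miscount when a value itself contains a comma or a colon.

```sql
#standardSQL
-- Hypothetical sample rows standing in for the real table.
WITH sho AS (
  SELECT 1 AS id, 'a' AS sho_01, 'x' AS sho_02 UNION ALL
  SELECT 2, 'a', 'y' UNION ALL
  SELECT 3, 'b', 'y' UNION ALL
  SELECT 4, 'c', 'z'
)
SELECT
  -- Each kv looks like "sho_01":"a"; strip the quotes from the key.
  TRIM(SPLIT(kv, ':')[OFFSET(0)], '"') AS col_name,
  SPLIT(kv, ':')[OFFSET(1)] AS col_value,
  COUNT(1) AS cnt
FROM sho AS t,
  UNNEST(SPLIT(TRIM(TO_JSON_STRING(t), '{}'))) AS kv
-- Only rows with these ids are serialized and counted.
WHERE t.id IN (1, 2, 3)
GROUP BY col_name, col_value;
```

Note that every column of the row is serialized, so id itself also shows up in the result, and string values keep their surrounding double quotes in col_value.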