BigQuery running totals

Time: 2015-05-27 08:45:49

Tags: google-bigquery window-functions running-total

I'm having trouble getting a running total to work in BigQuery.

I found an example that works here: BigQuery SQL running totals

SELECT word, word_count, SUM(word_count) OVER(ORDER BY word DESC)
FROM [publicdata:samples.shakespeare]
WHERE corpus  = 'hamlet'
AND word > 'a' LIMIT 30

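But what I really want is to count how many of the most popular words cover 80% of the total word_count. So I tried computing the running total ordered by word_count instead: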

SELECT word, word_count, SUM(word_count) OVER(ORDER BY word_count DESC)
FROM [publicdata:samples.shakespeare]
WHERE corpus  = 'hamlet'
AND word > 'a' LIMIT 30

But I get:

Row word    word_count  f0_  
1   o'er    18          18   
2   answer  13          31   
3   meet    8           39   
4   told    5           44   
5   treason 4           **52**   
6   quality 4           **52**   
7   brave   3           55  

The running total does not increase from row 5 to row 6, probably because word_count is 4 in both rows.
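One plausible explanation, sketched here in Python rather than SQL: if the window has an ORDER BY but no explicit frame, standard SQL treats rows that tie on the sort key as peers and gives them the same cumulative value (a RANGE-style frame), whereas a ROWS-style frame advances on every row. The sketch assumes BigQuery behaves this way here; the function names are made up for illustration, and the rows are the seven shown above.

```python
from itertools import groupby

# (word, word_count) rows from the output above, already sorted by word_count DESC.
rows = [("o'er", 18), ("answer", 13), ("meet", 8), ("told", 5),
        ("treason", 4), ("quality", 4), ("brave", 3)]

def running_total_range(sorted_rows):
    """Running total where rows tied on the sort key share one total (RANGE-style)."""
    totals, acc = [], 0
    for _, peers in groupby(sorted_rows, key=lambda r: r[1]):
        peers = list(peers)
        acc += sum(count for _, count in peers)
        totals.extend([acc] * len(peers))  # every peer sees the same total
    return totals

def running_total_rows(sorted_rows):
    """Running total that advances on every physical row (ROWS-style)."""
    totals, acc = [], 0
    for _, count in sorted_rows:
        acc += count
        totals.append(acc)
    return totals

range_totals = running_total_range(rows)  # [18, 31, 39, 44, 52, 52, 55]
rows_totals = running_total_rows(rows)    # [18, 31, 39, 44, 48, 52, 55]
```

The RANGE-style result reproduces the repeated 52 in rows 5 and 6; the ROWS-style result is what a strictly increasing running total would look like.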

What am I doing wrong?

Maybe there is a better way? My plan is to compute the running total, divide it by SUM(word_count) OVER(), keep only the rows below 80%, and then count those rows.
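The plan can be sketched outside SQL to make the steps concrete. This is a minimal Python illustration, not a BigQuery query: the counts are hypothetical, and `count_words_below_share` is a name invented for this example.

```python
def count_words_below_share(word_counts, threshold=0.8):
    """Count the most frequent words whose cumulative share stays below threshold."""
    total = sum(word_counts)                      # SUM(word_count) OVER()
    acc, kept = 0, 0
    for count in sorted(word_counts, reverse=True):  # ORDER BY word_count DESC
        acc += count                              # running total
        if acc / total < threshold:               # keep rows below the threshold
            kept += 1
    return kept

# Hypothetical counts, for illustration only (total = 100).
sample_counts = [50, 20, 15, 10, 5]
n_words = count_words_below_share(sample_counts)
```

With these counts the cumulative shares are 0.5, 0.7, 0.85, 0.95 and 1.0, so two words stay below 80%. Whether a row that lands exactly on the threshold should be kept (`<` versus `<=`) is a choice the plan leaves open.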

1 answer:

Answer 0 (score: 3)

First, remove the "LIMIT 30"; it interferes with the OVER() clause.

Do you want a ratio? Try RATIO_TO_REPORT:

SELECT word, word_count, RATIO_TO_REPORT(word_count) OVER(ORDER BY word_count DESC)
FROM [publicdata:samples.shakespeare]
WHERE corpus  = 'hamlet'
AND word > 'a' 

Do you want consecutive rows with the same value to keep increasing? Add a secondary sort key to fix the order of those rows:

SELECT word, word_count, RATIO_TO_REPORT(word_count) OVER(ORDER BY word_count DESC, word)
FROM [publicdata:samples.shakespeare]
WHERE corpus  = 'hamlet'
AND word > 'a' 
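Why a secondary sort key helps a running total can be shown with a small Python sketch. It assumes, as a model of the window behavior, that rows with equal sort keys share one cumulative value; `running_total` is a hypothetical helper, and the four rows are taken from the question's output.

```python
from itertools import groupby

rows = [("treason", 4), ("quality", 4), ("told", 5), ("brave", 3)]

def running_total(rows, key):
    """Running total in `key` order; rows with equal keys share one total."""
    ordered = sorted(rows, key=key)
    result, acc = {}, 0
    for _, peers in groupby(ordered, key=key):
        peers = list(peers)
        acc += sum(count for _, count in peers)
        for word, _ in peers:
            result[word] = acc
    return result

# ORDER BY word_count DESC: "treason" and "quality" tie and share a total.
by_count = running_total(rows, key=lambda r: -r[1])

# ORDER BY word_count DESC, word: every key is unique, so each row advances.
by_count_then_word = running_total(rows, key=lambda r: (-r[1], r[0]))
```

With the composite key no two rows compare equal, so the cumulative value increases on every row.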

Do you want the most popular words that cover 80%? Take those ratios, sum them, and filter out the rest:

SELECT word, word_count, sum_ratio
FROM (
 SELECT word, word_count, SUM(ratio) OVER(ORDER BY ratio, word) sum_ratio
 FROM (
    SELECT word, word_count, RATIO_TO_REPORT(word_count) OVER(ORDER BY word_count DESC, word) ratio
    FROM [publicdata:samples.shakespeare]
    WHERE corpus  = 'hamlet'
    AND word > 'a' 
 )
)
WHERE sum_ratio>0.8

Row word    word_count  sum_ratio    
1   is      313         0.8125175752219499   
2   it      361         0.827019644076648    
3   in      400         0.8430884184308841   
4   my      441         0.8608042421564295   
5   you     499         0.8808500381633391   
6   of      630         0.906158357771261    
7   to      635         0.9316675370586108   
8   and     706         0.9600289237938375   
9   the     995         0.9999999999999999  
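One thing worth checking in the result above: the inner SUM accumulates ratios in ascending order, so the rows with sum_ratio > 0.8 are the most frequent words that make up roughly the final 20% of the total, not the words that together cover 80%. The Python sketch below contrasts the two directions on hypothetical counts; the helper name is invented for illustration.

```python
# Hypothetical word counts, for illustration only (total = 100).
counts = {"the": 50, "and": 20, "to": 15, "of": 10, "my": 5}
total = sum(counts.values())

def cumulative_share(items):
    """Return (word, cumulative share) pairs in the given order."""
    acc, out = 0, []
    for word, count in items:
        acc += count
        out.append((word, acc / total))
    return out

# Ascending by count, keep cumulative share > 0.8 (the shape of the nested query).
ascending = sorted(counts.items(), key=lambda kv: (kv[1], kv[0]))
tail = [word for word, share in cumulative_share(ascending) if share > 0.8]

# Descending by count, keep cumulative share <= 0.8.
descending = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
head = [word for word, share in cumulative_share(descending) if share <= 0.8]
```

Here `tail` contains only "the", while `head` contains "the" and "and". If the goal is the set of top words covering 80% of the total, accumulating in descending order and keeping the rows up to the threshold is the variant that matches it.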