I have a table:
key product_code cost
1 UK 20
1 US 10
1 EU 5
2 UK 3
2 EU 6
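For each "key" group I want to find the sum of the costs of all its products and append that sum to every row. For example, for key = 1, find the total cost of all products (20 + 10 + 5 = 35), then append the result to all rows that correspond to key = 1. The final result: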
key product_code cost total_costs
1 UK 20 35
1 US 10 35
1 EU 5 35
2 UK 3 9
2 EU 6 9
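I would prefer to do this without a sub-join, since that would be inefficient. My best idea is to combine the over function with the sum function, but I can't get it to work. My best attempt: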
SELECT key, product_code, sum(costs) over(PARTITION BY key)
FROM test
GROUP BY key, product_code;
I have looked at the docs, but they are so cryptic that I have no idea how to work this out. I am using Hive v0.12.0, HDP v2.0.6, the HortonWorks Hadoop distribution.
Answer 0 (score: 9)
Similar to @VB_'s answer, use the BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING clause.
So the HiveQL query is:
SELECT key, product_code, cost,
SUM(cost) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total_costs
FROM test;
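As a way to try the explicit-frame query without a Hive cluster, here is a minimal sketch that runs it against the question's data in SQLite through Python's sqlite3 module. SQLite (3.25 or later, for window functions) is only a stand-in and an assumption here; the helper names load_test_table and totals_per_key are made up for illustration, and the column is taken to be cost, as in the question's table.

```python
import sqlite3


def load_test_table():
    """Build an in-memory copy of the question's table (SQLite stands in for Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, cost INTEGER)")
    conn.executemany(
        "INSERT INTO test VALUES (?, ?, ?)",
        [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)],
    )
    return conn


def totals_per_key(conn):
    """Attach the per-key cost total to every row using an explicit full frame."""
    query = """
        SELECT key, product_code, cost,
               SUM(cost) OVER (
                   PARTITION BY key
                   ORDER BY key
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               ) AS total_costs
        FROM test
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # Row order is not guaranteed, since the outer query has no ORDER BY.
    for row in totals_per_key(load_test_table()):
        print(row)
```

Every row of a given key should carry the same total, because the frame spans the whole partition regardless of where the current row sits.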
Answer 1 (score: 4)
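Without a self-join, you can use BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW to achieve this. Note that this frame sums from the start of the partition up to the current row, so only the last row of each group carries the full group total.
The code is as follows: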
SELECT a, SUM(b) OVER (PARTITION BY c ORDER BY d ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM T;
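To see how the frame's end bound changes the result, the following sketch computes both variants side by side on the question's table. It assumes SQLite (3.25 or later) as a stand-in for Hive, orders each partition by cost purely to make the running total deterministic, and uses a hypothetical helper name, compare_frames.

```python
import sqlite3


def compare_frames():
    """Return {(key, product_code): (running_total, full_total)} for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, cost INTEGER)")
    conn.executemany(
        "INSERT INTO test VALUES (?, ?, ?)",
        [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)],
    )
    query = """
        SELECT key, product_code,
               -- Frame ends at the current row: a running total.
               SUM(cost) OVER (
                   PARTITION BY key ORDER BY cost
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
               ) AS running_total,
               -- Frame ends at the last row of the partition: the group total.
               SUM(cost) OVER (
                   PARTITION BY key ORDER BY cost
                   ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
               ) AS full_total
        FROM test
    """
    return {(k, p): (running, full) for k, p, running, full in conn.execute(query)}


if __name__ == "__main__":
    for group, sums in sorted(compare_frames().items()):
        print(group, sums)
```

With the rows of key = 1 ordered by cost (5, 10, 20), the running total grows 5, 15, 35 while the full-frame total stays at 35 on every row.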
Answer 2 (score: 2)
The analytic function sum gives a cumulative sum. For example, if you run:
select key, product_code, cost, sum(cost) over (partition by key) as total_costs from test
then you will get:
key product_code cost total_costs
1 UK 20 20
1 US 10 30
1 EU 5 35
2 UK 3 3
2 EU 6 9
which does not seem to be what you want.
Instead, you should use the aggregate function sum combined with a self-join to achieve this:
select test.key, test.product_code, test.cost, agg.total_cost
from (
select key, sum(cost) as total_cost
from test
group by key
) agg
join test
on agg.key = test.key;
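One way to compare the self-join above with a plain windowed sum is to run both on the same data. The sketch below does that in SQLite (3.25 or later) through Python's sqlite3 module; SQLite is an assumption standing in for Hive, and the function names are hypothetical. In SQLite, a window with only PARTITION BY and no ORDER BY covers the whole partition, so both queries return per-group totals rather than a running sum. Whether a particular Hive version behaves the same way is worth checking directly.

```python
import sqlite3


def make_conn():
    """In-memory copy of the question's table (SQLite stands in for Hive)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, cost INTEGER)")
    conn.executemany(
        "INSERT INTO test VALUES (?, ?, ?)",
        [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)],
    )
    return conn


def self_join_totals(conn):
    """Per-key totals via GROUP BY plus a join back to the base table."""
    query = """
        SELECT test.key, test.product_code, test.cost, agg.total_cost
        FROM (SELECT key, SUM(cost) AS total_cost FROM test GROUP BY key) agg
        JOIN test ON agg.key = test.key
    """
    return sorted(conn.execute(query).fetchall())


def window_totals(conn):
    """Per-key totals via a window with PARTITION BY only (no ORDER BY, no frame)."""
    query = """
        SELECT key, product_code, cost,
               SUM(cost) OVER (PARTITION BY key) AS total_costs
        FROM test
    """
    return sorted(conn.execute(query).fetchall())


if __name__ == "__main__":
    conn = make_conn()
    print(self_join_totals(conn) == window_totals(conn))
```

If the two result sets match on the target engine as well, the window form avoids the extra join.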
Answer 3 (score: 1)
The table above appears to be
key product_code cost
1 UK 20
1 US 10
1 EU 5
2 UK 3
2 EU 6
The user wants a table that includes the total cost, as shown below
key product_code cost total_costs
1 UK 20 35
1 US 10 35
1 EU 5 35
2 UK 3 9
2 EU 6 9
So we used the following query
SELECT key, product_code, cost,
SUM(cost) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS total_costs
FROM test;
So far so good. I want one more column that counts the number of occurrences of each country
key product_code cost total_costs occurences
1 UK 20 35 2
1 US 10 35 1
1 EU 5 35 2
2 UK 3 9 2
2 EU 6 9 2
So I used the following query
SELECT key, product_code,
SUM(costs) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as total_costs
COUNT(product code) OVER (PARTITION BY key ORDER BY key ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) as occurences
FROM test;
Sadly, this does not work. I get a cryptic error. To rule out a mistake in my query, I would like to ask whether I am doing something wrong. Thanks
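Without the actual error message it is hard to be sure, but two things in the query above look like plausible culprits: there is no comma after "as total_costs", and "product code" is written with a space instead of as product_code. Separately, counting occurrences per country suggests partitioning the count by product_code rather than by key. The sketch below applies those three guesses and runs the result in SQLite (3.25 or later) via Python's sqlite3 module. SQLite is an assumption standing in for Hive, the column is taken to be cost as in the table, and totals_and_occurrences is a hypothetical helper name.

```python
import sqlite3


def totals_and_occurrences():
    """Return rows of (key, product_code, cost, total_costs, occurrences)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (key INTEGER, product_code TEXT, cost INTEGER)")
    conn.executemany(
        "INSERT INTO test VALUES (?, ?, ?)",
        [(1, "UK", 20), (1, "US", 10), (1, "EU", 5), (2, "UK", 3), (2, "EU", 6)],
    )
    query = """
        SELECT key, product_code, cost,
               -- Total cost within each key group.
               SUM(cost) OVER (PARTITION BY key) AS total_costs,
               -- Number of rows sharing the same product_code across the table.
               COUNT(product_code) OVER (PARTITION BY product_code) AS occurrences
        FROM test
    """
    return sorted(conn.execute(query).fetchall())


if __name__ == "__main__":
    for row in totals_and_occurrences():
        print(row)
```

On the sample data this gives 2 for UK and EU and 1 for US, which matches the occurrence column in the desired table.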
Answer 4 (score: 1)
A similar answer (using the Oracle emp table):
select deptno, ename, sal, sum(sal) over(partition by deptno) from emp;
The output will look like this:
deptno ename sal sum_window_0
10 MILLER 1300 8750
10 KING 5000 8750
10 CLARK 2450 8750
20 SCOTT 3000 10875
20 FORD 3000 10875
20 ADAMS 1100 10875
20 JONES 2975 10875
20 SMITH 800 10875
30 BLAKE 2850 9400
30 MARTIN 1250 9400
30 ALLEN 1600 9400
30 WARD 1250 9400
30 TURNER 1500 9400
30 JAMES 950 9400
Answer 5 (score: 0)
This query gave me the perfect result
select key, product_code, cost, sum(cost) over (partition by key) as total_costs from zone;