Pig script with the SUM function

Date: 2014-07-11 03:40:55

Tags: apache-pig

My input file is below:

a,t1,1000,100
a,t1,2000,200
a,t1,1000,500
b,t2,1000,200
b,t2,5000,100

Here is my script. It throws an error on SUM. Can you correct it?

myinput        = LOAD 'file' USING PigStorage(',') AS (a1:chararray, a2:chararray, total:int, div:int);
for_distinct   = FOREACH myinput GENERATE a2;
grp_distinct   = GROUP for_distinct ALL;
-- note: COUNT counts every a2 value (5 rows here), not only the distinct ones
distinct_count = FOREACH grp_distinct GENERATE COUNT(for_distinct) AS finalcount;
grouped        = GROUP myinput BY a1;
-- fails here: myinput.total and myinput.div are bags, so they cannot be divided inside SUM
result         = FOREACH grouped GENERATE group, SUM(myinput.total/myinput.div)/distinct_count;

The output of grouped is:

 ((a),{(a,t1,1000,100),(a,t1,2000,200)})

 ((b),{(b,t2,1000,200),(b,t2,5000,100)})

For each tuple in a group, I want to divide $2 by $3, SUM those quotients, and finally divide that SUM by the number of distinct $1 values.

The SUM logic for each bag in grouped is as follows:

[(1000/100)+(2000/200)]/count(distinct $1 in myinput)

[(1000/200)+(5000/100)]/count(distinct $1 in myinput)
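Written as a formula (a sketch; the symbols g, r and D are introduced here for illustration and are not part of the question), for a group g of rows that share the same a1 value:

$$
\text{result}(g) = \frac{\sum_{r \in g} \text{total}_r / \text{div}_r}{D},
\qquad
D = \left|\{\, \text{a2}_r : r \in \text{myinput} \,\}\right| = \left|\{\text{t1}, \text{t2}\}\right| = 2
$$

Because total and div are declared as int, Pig evaluates total/div as integer division; with this input every quotient happens to be exact, so truncation does not change the result.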

I want the output to be:

(a,10)
(b,27)

1 Answer:

Answer 0 (score: 0)

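myinput = load 'data' using PigStorage(',') as
          (a1:chararray, a2:chararray, total:int, div:int);

-- count the distinct values of a2 ($1): a single row holding one long
sub = foreach myinput generate a2;
dist = DISTINCT sub;
grpd = group dist all;
X = foreach grpd generate COUNT_STAR(dist);

-- per a1 group: sum of total/div (int/int, so integer division)
A = foreach myinput generate a1, (total / div) as quotient;
grouped = group A by a1;
B = foreach grouped generate group, SUM(A.quotient) as sums;

-- attach the single distinct count to every row of B, then divide;
-- the float casts avoid long/long integer division
C = CROSS B, X;
final = foreach C generate $0, ((float)($1) / (float)($2));
DUMP final;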

Output:

(a,11.0)
(b,27.5)
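Checking this output by hand against the five input rows: the answer's script includes the third a row (a,t1,1000,500), which the grouped output and the expected output in the question leave out. That is why a comes out as 11.0 rather than 10. b comes out as 27.5 rather than 27 because of the float casts. With D = 2 distinct a2 values:

$$
\begin{aligned}
\text{a} &: \frac{1000/100 + 2000/200 + 1000/500}{2} = \frac{10 + 10 + 2}{2} = 11.0 \\
\text{b} &: \frac{1000/200 + 5000/100}{2} = \frac{5 + 50}{2} = 27.5
\end{aligned}
$$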