Pig LIMIT and FLATTEN produce wrong results

Posted: 2012-12-03 20:27:28

Tags: limit apache-pig flatten

B = GROUP A BY state;
C = FOREACH B {                          
   DA = ORDER A BY population DESC;                
   DB = LIMIT DA 5;                         
   GENERATE FLATTEN(group), FLATTEN(DB.name), FLATTEN(DB.population);
}

The problem is that I get each city name 5 times instead of once. The result I get is:

(ALASKA,M,27257)
(ALASKA,M,23696)
(ALASKA,M,19949)
(ALASKA,M,19926)
(ALASKA,M,19833)
(ALASKA,H,27257)
(ALASKA,H,23696)
(ALASKA,H,19949)
(ALASKA,H,19926)
(ALASKA,H,19833)

The output I need is:

(ALASKA,M,27257)
(ALASKA,H,23696)

Answer:

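The two FLATTENs, FLATTEN(DB.name) and FLATTEN(DB.population), produce the Cartesian product of the two bags: every name is paired with every population, so 5 names × 5 populations gives 25 rows per state. Replace them with a single bag: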

B = GROUP A BY state;
C = FOREACH B {                          
   DA = ORDER A BY population DESC;                
   DB = LIMIT DA 5;                         
   GENERATE FLATTEN(group), FLATTEN(DB.(name, population)); -- one bag of (name, population) tuples, flattened once
}
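For reference, a complete, self-contained version of this fix might look like the sketch below. The question never shows how A is loaded, so the file name cities.tsv, the tab delimiter, and the schema (state, name, population) are assumptions, not part of the original post.

```pig
-- Hypothetical input (assumption): tab-separated lines of state, city name, population.
A = LOAD 'cities.tsv' USING PigStorage('\t')
        AS (state:chararray, name:chararray, population:int);
B = GROUP A BY state;
C = FOREACH B {
    DA = ORDER A BY population DESC;   -- largest cities first
    DB = LIMIT DA 5;                   -- keep at most 5 cities per state
    -- Project both columns into ONE bag and flatten it once,
    -- so each tuple of DB becomes exactly one output row.
    GENERATE FLATTEN(group) AS state,
             FLATTEN(DB.(name, population)) AS (name, population);
}
DUMP C;
```

Because DB.(name, population) is a single bag, FLATTEN emits one row per tuple in DB (at most five rows per state) instead of pairing every name with every population.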

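Alternatively, since the bag created by GROUP BY contains the complete original tuples with all of their columns, you can flatten it directly: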

B = GROUP A BY state;
C = FOREACH B {                          
   DA = ORDER A BY population DESC;                
   DB = LIMIT DA 5;                         
   GENERATE FLATTEN(DB); -- each row is a complete tuple of A, state included
}
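With FLATTEN(DB), every column of A ends up in the output. If A carries columns you don't need, a follow-up FOREACH can project them away by position. The sketch below assumes a hypothetical schema (state, name, population, area) in which area is the unwanted column; the file name and the area column are illustrative only, so adjust the positions to your real schema.

```pig
-- Hypothetical input and schema (assumption): area is an extra, unwanted column.
A = LOAD 'cities.tsv' USING PigStorage('\t')
        AS (state:chararray, name:chararray, population:int, area:double);
B = GROUP A BY state;
C = FOREACH B {
    DA = ORDER A BY population DESC;
    DB = LIMIT DA 5;
    GENERATE FLATTEN(DB);   -- one output row per original tuple, all four columns
}
-- Flattened columns keep A's order: $0 state, $1 name, $2 population, $3 area.
D = FOREACH C GENERATE $0, $1, $2;
DUMP D;
```

Projecting by position avoids depending on how Pig names the flattened fields; if the schema of A changes, the positions in D must be updated to match.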