How to combine columns in a Pig FULL OUTER JOIN

Time: 2014-11-29 19:33:45

Tags: database apache-pig

I have two tables:

1,'hello'
2,'world'
4,'this'

1,'john'
3,'king'

I want to produce this table:

1,'hello','john'
2,'world',''
3,''     ,king
4,'this' ,''

I am currently using this Pig command:

JOIN A BY code FULL OUTER,
     B BY code;

But this gives me the output:

1,'hello',1,'john'
2,'world',,''
,''     ,3,king
4,'this' ,,''

I need the two code columns merged into one. How can I do that? Thanks.

4 answers:

Answer 0: (score: 1)

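A join always produces output like this; it is the expected behaviour in Pig. One option is to use the COGROUP operator instead of the JOIN operator (in Pig, GROUP applied to several relations, as in the script below, is the same operation as COGROUP).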

a.txt:

1,'hello'
2,'world'
4,'this'

b.txt:

1,'john'
3,'king'

PigScript:

A = LOAD 'a.txt' USING PigStorage(',') AS (code:int,name:chararray);
B = LOAD 'b.txt' USING PigStorage(',') AS (code:int,name:chararray);
C = GROUP A BY code,B BY code;
D = FOREACH C GENERATE group,(IsEmpty(A.name) ? TOTUPLE('') : BagToTuple(A.name)) AS aname,(IsEmpty(B.name) ? TOTUPLE('') : BagToTuple(B.name)) AS bname;
E = FOREACH D GENERATE group,FLATTEN(aname),FLATTEN(bname);
DUMP E;

Output:

(1,'hello','john')
(2,'world',)
(3,,'king')
(4,'this',)

BagToTuple() is not available in a stock Pig installation; you have to download pig-0.11.0.jar and put it on the classpath.
Download the jar from this link:
http://www.java2s.com/Code/Jar/p/Downloadpig0110jar.htm

Answer 1: (score: 1)

A = load 'a' using PigStorage(',') as (code:int,name:chararray);
B = load 'b' using PigStorage(',') as (code:int,name:chararray);
C = join A by code full outer ,B by code;
D = foreach C generate 
    (A::code IS NULL ? B::code : A::code) AS code,
    A::name as aname, B::name as bname;
dump D;

The result is:

(1,'hello','john')
(2,'world',)
(3,,'king')
(4,'this',)
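
The A::code and B::code names in the script above come from Pig's disambiguation operator (::), which prefixes each joined field with the alias of the relation it came from. A minimal sketch for checking this, assuming the same a.txt and b.txt files used in Answer 0, is to run DESCRIBE on the joined relation:

```pig
-- Load both inputs with identical field names, so the join result is ambiguous
A = LOAD 'a.txt' USING PigStorage(',') AS (code:int, name:chararray);
B = LOAD 'b.txt' USING PigStorage(',') AS (code:int, name:chararray);

-- A full outer join keeps one code field per input relation
C = JOIN A BY code FULL OUTER, B BY code;

-- Prints the schema of C; fields appear with relation prefixes
-- such as A::code and B::code
DESCRIBE C;
```

Because the join keeps both key fields, a FOREACH after the join has to pick whichever of the two is not null, which is what the ternary expression does.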

Answer 2: (score: 0)

You can use UNION and then GROUP BY.

The UNION of A and B gives you:

1,'hello'
2,'world'
4,'this'
1,'john'
3,'king'

Now group by the id. This gives you:

1, {'hello', 'john'}
2, {'world'}
3, {'king'}
4, {'this'}

Now you need a UDF to parse this bag. In the UDF, iterate over each key to produce output in your format.

I ran into the same problem, and this is how I solved it.
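
A rough sketch of the UNION-then-GROUP idea follows. It is not the UDF the answer describes: as an assumption, it tags every row with its source relation before the union and then separates the names again with a nested FILTER, since after a plain union it is no longer possible to tell which relation a name came from. The aliases A1, B1, U, G, R and the src field are illustrative names, and the file names are borrowed from Answer 0.

```pig
A = LOAD 'a.txt' USING PigStorage(',') AS (code:int, name:chararray);
B = LOAD 'b.txt' USING PigStorage(',') AS (code:int, name:chararray);

-- Tag each row with the relation it came from (src is a hypothetical field)
A1 = FOREACH A GENERATE code, name, 'A' AS src;
B1 = FOREACH B GENERATE code, name, 'B' AS src;

-- Stack both relations, then collect all rows that share a code
U = UNION A1, B1;
G = GROUP U BY code;

-- Inside each group, split the bag back into A-side and B-side names
R = FOREACH G {
    a_rows = FILTER U BY src == 'A';
    b_rows = FILTER U BY src == 'B';
    GENERATE group AS code, a_rows.name AS anames, b_rows.name AS bnames;
};
DUMP R;
```

The names come out as bags, which are empty where one side has no match; turning them into plain columns still needs a step like the BagToTuple and FLATTEN combination in Answer 0, or a UDF as suggested here.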

Answer 3: (score: 0)

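After the join, you can use the ternary operator to assign a new code, depending on whether it is populated in relation A or in relation B. In this example, if the code from A is null, the code from B is used; otherwise the code from A is used.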

C = JOIN A BY code FULL OUTER, B BY code;

D = FOREACH C GENERATE
  (A::code IS NULL ? B::code : A::code) AS code,
  A::field1,
  A::field2,
  B::field3,
  B::field4;