What is the equivalent of SQL NOT IN in Cascading pipes?

Asked: 2016-03-24 11:34:43

Tags: hadoop cascading

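I have two files that share one common field, and based on that field's value I need to fetch the values from the second file.

How do I add a WHERE condition here?

Is there another pipe that can be used for NOT IN?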

File1:

tcno,date,amt
1234,3/10/2016,1000
1234,3/11/2016,400
23456,2/10/2016,1500

File2:

cno,fname,lname,city,phone,mail
1234,first,last,city,1234556,123@123.com

Sample code:

Pipe pipe1 = new Pipe("custPipe");
Pipe pipe2 = new Pipe("tscnPipe");
Fields cJoinField = new Fields("cno");
Fields tJoinField = new Fields("tcno");
Pipe pipe = new HashJoin(pipe1, cJoinField, pipe2, tJoinField,  new OuterJoin());
//HOW TO ADD WHERE CONDITION i.e. CNO IS NULL FROM SECOND FILE
Fields outFields = new Fields("tcno","tdate", "tamt");

I want the output to be the last row of the first file: [23456,2/10/2016,1500]

1 Answer:

Answer 0 (score: 3)

Based on the comment in your code:

//HOW TO ADD WHERE CONDITION i.e. CNO IS NULL FROM SECOND FILE

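Try using FilterNull after the HashJoin step. Be aware that FilterNull discards tuples whose argument value is null, so it keeps the rows that matched; to keep only the unmatched rows (the NOT IN result asked for above), swap in FilterNotNull, which discards tuples whose argument value is not null.

Add the following lines to your code: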
FilterNull filterNull = new FilterNull();
pipe = new Each( pipe, cJoinField, filterNull );

Something like this:

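import cascading.operation.filter.FilterNull;
import cascading.pipe.Each;
import cascading.pipe.HashJoin;
import cascading.pipe.Pipe;
import cascading.pipe.joiner.OuterJoin;
import cascading.tuple.Fields;

Pipe pipe1 = new Pipe("custPipe");
Pipe pipe2 = new Pipe("tscnPipe");
Fields cJoinField = new Fields("cno");
Fields tJoinField = new Fields("tcno");
// Outer join: cno is null on rows that have no matching customer
Pipe pipe = new HashJoin(pipe1, cJoinField, pipe2, tJoinField, new OuterJoin());

// FilterNull removes every tuple whose cno is null
FilterNull filterNull = new FilterNull();
pipe = new Each( pipe, cJoinField, filterNull );

Fields outFields = new Fields("tcno","tdate", "tamt");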
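
To make the intended NOT IN result concrete, here is a minimal plain-Java sketch of the same idea, run on the sample rows from the question. It does not use Cascading at all: the class name AntiJoinSketch and the method notIn are hypothetical, and the in-memory lookup only imitates what an outer join followed by a "cno IS NULL" filter would produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class AntiJoinSketch {

    // Hypothetical helper: returns the transaction rows whose key (column 0)
    // does not appear as a key (column 0) in any customer row.
    public static List<String[]> notIn(List<String[]> transactions, List<String[]> customers) {
        // Index customers by cno, like the in-memory side of a hash join.
        Map<String, String[]> customersByCno = new HashMap<>();
        for (String[] customer : customers) {
            customersByCno.put(customer[0], customer);
        }

        List<String[]> unmatched = new ArrayList<>();
        for (String[] transaction : transactions) {
            // Outer-join step: the customer side is null when nothing matches tcno.
            String[] customer = customersByCno.get(transaction[0]);
            // WHERE cno IS NULL: keep only transactions without a customer.
            if (customer == null) {
                unmatched.add(transaction);
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        List<String[]> transactions = Arrays.asList(
            new String[] {"1234", "3/10/2016", "1000"},
            new String[] {"1234", "3/11/2016", "400"},
            new String[] {"23456", "2/10/2016", "1500"});
        List<String[]> customers = Collections.singletonList(
            new String[] {"1234", "first", "last", "city", "1234556", "123@123.com"});

        for (String[] row : notIn(transactions, customers)) {
            System.out.println(String.join(",", row)); // prints 23456,2/10/2016,1500
        }
    }
}
```

The point of the sketch is the direction of the null check: the rows to keep are the ones where the joined customer key is missing, which is the opposite of dropping null keys.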