How do I split a bag into multiple bags using Apache Pig?

Asked: 2016-12-15 22:20:56

Tags: apache-pig

I have a file that contains two sets of data, as shown below:

1,abc,10,dss
2,efgh,as
1,abc,10,1234
2,efgh,as
1,abc,10,7899
2,efgh,as

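Records that start with 1 form one set, and records that start with 2 form a different set, so the two sets have different structures. How can I separate these two sets of records?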

2 Answers:

Answer 0: (score: 0)

Here is one way:

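-- Load each line of the file as a single chararray field.
A = LOAD '/user/data/split.txt' AS (line:chararray);
-- Break each line into space-separated records, one record per row.
B = FOREACH A GENERATE FLATTEN(TOKENIZE(line, ' '));
-- '1.*' matches any record beginning with 1; use '1,.*' to require the first field to be exactly 1.
B1 = FILTER B BY $0 MATCHES '1.*';
B2 = FILTER B BY $0 MATCHES '2.*';
DUMP B1;
DUMP B2;
-- or, as an alternative to the two FILTER statements:
SPLIT B INTO B1 IF ($0 MATCHES '1.*'), B2 IF ($0 MATCHES '2.*');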

Answer 1: (score: 0)

With the updated version of the input, here is another solution:

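-- Match on the whole line directly; no tokenizing step.
A = LOAD '/user/data/split.txt' AS (line:chararray);
B1 = FILTER A BY $0 MATCHES '1.*';
B2 = FILTER A BY $0 MATCHES '2.*';
-- or, as an alternative to the two FILTER statements:
SPLIT A INTO B1 IF ($0 MATCHES '1.*'), B2 IF ($0 MATCHES '2.*');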
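
Since the question notes that the two record types have different structures, a possible follow-up is to give each set its own schema once it has been separated. The following is only a sketch under assumptions: it loads the file with PigStorage(',') and no declared schema, so that rows with different numbers of columns can coexist, and then projects typed fields by position. The aliases R1, R2, T1, T2 and the field names rec_type, name, amount, detail, and code are hypothetical, since the original post does not name the columns.

```pig
-- Split each line on commas; no schema is declared because the rows differ in width.
A = LOAD '/user/data/split.txt' USING PigStorage(',');

-- Route rows by the value of the first field.
SPLIT A INTO R1 IF ((chararray)$0 == '1'), R2 IF ((chararray)$0 == '2');

-- Type 1 rows have four fields (hypothetical names).
T1 = FOREACH R1 GENERATE (int)$0 AS rec_type, (chararray)$1 AS name,
                         (int)$2 AS amount, (chararray)$3 AS detail;

-- Type 2 rows have three fields (hypothetical names).
T2 = FOREACH R2 GENERATE (int)$0 AS rec_type, (chararray)$1 AS name,
                         (chararray)$2 AS code;

DESCRIBE T1;
DESCRIBE T2;
```

Comparing the first field for equality, rather than matching a prefix such as '1.*', also avoids picking up rows whose first field merely begins with 1 (for example 10 or 11), should such rows ever appear in the input.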