Unable to split a chararray field whose words are separated by spaces and tabs. Can you help me with the Apache Pig commands?

Time: 2017-11-02 09:11:24

Tags: apache-pig

The Sample.txt file:

2017-01-01 10:21:59 THURSDAY    -39 3 Pick up a bus - Travel for two hours
2017-02-01 12:45:19 FRIDAY  -55 8 Pick up a train - Travel for one hour
2017-03-01 11:35:49 SUNDAY  -55 8 Pick up a train - Travel for one hour
.
. 

When I run the suggested command, each line is split into three fields.

When I then do the following, it does not work as expected.

A = LOAD 'Sample.txt' USING PigStorage() as (line:chararray);
B = foreach A generate STRSPLIT(line, ' ', 3);
C = foreach B generate $2;
split C into buslog if $0 matches '.*bus*.', trainlog if $0 matches '.*train*.';

Note: a dump of C gives the following result.

THURSDAY    -39 3 Pick up a bus - Travel for two hours
FRIDAY  -55 8 Pick up a train - Travel for one hour
SUNDAY  -55 8 Pick up a train - Travel for one hour

Requirement: from the result above, I want to split the train and bus records into two relations, but that does not happen as expected.

1 answer:

Answer 0: (score: 0)

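The syntax is .*string.*. Note that .* appears on both sides of the string. The matches operator has to match the whole field, so '.*bus*.' only accepts a value that ends one character after "bu" or "bus", and none of the sample rows do.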

split C into buslog if $0 matches '.*bus.*', trainlog if $0 matches '.*train.*';
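
To see why the two patterns behave differently, here is a minimal sketch in Python rather than Pig. It assumes that Pig's matches operator requires the whole string to match (as Java's String.matches does), which Python's re.fullmatch imitates. The helper names third_field and route are hypothetical and only stand in for the STRSPLIT and SPLIT steps; they are not part of Pig.

```python
import re

# The three sample rows from Sample.txt, copied as shown in the question.
LINES = [
    "2017-01-01 10:21:59 THURSDAY    -39 3 Pick up a bus - Travel for two hours",
    "2017-02-01 12:45:19 FRIDAY  -55 8 Pick up a train - Travel for one hour",
    "2017-03-01 11:35:49 SUNDAY  -55 8 Pick up a train - Travel for one hour",
]


def third_field(line):
    # Rough analogue of STRSPLIT(line, ' ', 3) followed by $2:
    # split on single spaces into at most three pieces and keep the last one.
    return line.split(" ", 2)[2]


def route(records, bus_pattern, train_pattern):
    # Rough analogue of SPLIT ... INTO ... IF $0 MATCHES ...:
    # a record is kept only when the pattern matches the entire string.
    buslog = [r for r in records if re.fullmatch(bus_pattern, r)]
    trainlog = [r for r in records if re.fullmatch(train_pattern, r)]
    return buslog, trainlog


records = [third_field(line) for line in LINES]

# Patterns from the question: 's*' means "zero or more s" and the final '.'
# allows exactly one more character, so the rest of the line cannot match.
broken = route(records, r".*bus*.", r".*train*.")

# Patterns from the answer: '.*' on both sides of the literal word.
fixed = route(records, r".*bus.*", r".*train.*")

print(len(broken[0]), len(broken[1]))  # 0 0
print(len(fixed[0]), len(fixed[1]))    # 1 2
```

With the original patterns both relations come out empty; with .* on both sides, the bus row and the two train rows land in separate relations.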