Identifying columns with Pig

Time: 2016-02-11 09:41:26

Tags: apache-pig

My dataset looks like this:

"column,1A",column2A,column3A
"column,1B",column2B,column3B
"column,1C",column2C,column3C
"column,1D",column2D,column3D

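Which delimiter should I use to split the rows above into 3 columns, given that the first column is quoted and contains a comma itself?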

First column value is => column,1A
Second column value is => column2A
Third column value is => column3A

2 answers:

Answer 0 (score: 0)

I was able to solve it with the following steps:

Input:-
"column,1A",column2A,column3A
"column,1B",column2B,column3B
"column,1C",column2C,column3C
"column,1D",column2D,column3D

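Pig script :-
-- Load each row as one chararray (the default loader splits on tabs, so the commas stay intact).
A = LOAD '/home/hduser/pig_ex' AS (line:chararray);
-- Split on every comma; the quoted first column comes back as two pieces.
B = FOREACH A GENERATE FLATTEN(STRSPLIT(line,'\\,',4)) AS (firstcol:chararray,secondcol:chararray,thirdcol:chararray,fourthcol:chararray);
-- Strip the opening quote from the first piece and the closing quote from the second.
C = FOREACH B GENERATE REGEX_EXTRACT(firstcol, '^\\"(.*)', 1) AS (modfirstcol),REGEX_EXTRACT(secondcol, '^(.*)\\"', 1) AS (modsecondcol),thirdcol,fourthcol;
-- Rejoin the two pieces with the comma that belongs to the value.
D = FOREACH C GENERATE CONCAT(modfirstcol,',',modsecondcol),thirdcol,fourthcol;
DUMP D;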

Output :-
(column,1A,column2A,column3A)
(column,1B,column2B,column3B)
(column,1C,column2C,column3C)
(column,1D,column2D,column3D)

Please let me know if there is a better way.
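
One possible simplification, offered only as a sketch: the split, strip and rejoin steps above can probably be collapsed into a single regular expression with the built-in REGEX_EXTRACT_ALL. The example assumes every row has exactly the shape shown in the input (a quoted first field followed by two unquoted fields, with no trailing whitespace or carriage return), and it uses the built-in TextLoader so that each row arrives as one chararray. The relation and field names are illustrative.

```pig
-- Sketch only: assumes each row looks like "x,y",a,b with nothing after the last field.
A = LOAD '/home/hduser/pig_ex' USING TextLoader() AS (line:chararray);
-- Group 1: text between the quotes; groups 2 and 3: the two unquoted fields.
B = FOREACH A GENERATE FLATTEN(REGEX_EXTRACT_ALL(line, '"([^"]*)",([^,]*),([^,]*)')) AS (firstcol:chararray, secondcol:chararray, thirdcol:chararray);
DUMP B;
```

Rows that do not fit the pattern would not be split into the three fields, so it may be worth filtering on firstcol IS NOT NULL before further processing.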

Answer 1 (score: 0)

Try my code:

    a = LOAD '/home/hduser/pig_ex' USING PigStorage(',') AS (col1,col2,col3,col4);

    b = FOREACH a GENERATE REGEX_EXTRACT(col1, '^\\"(.*)', 1) AS (modfirstcol),REGEX_EXTRACT(col2, '^(.*)\\"', 1) AS (modsecondcol),col3,col4;

    c = foreach b generate CONCAT($0, CONCAT(',', $1)), $2 , $3;

    dump c;
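
Both answers split on every comma and then repair the first column. A quote-aware loader might avoid that repair step altogether. The following is a minimal sketch, assuming the piggybank contrib jar is available on the cluster; the jar path is a placeholder, not a real location, and the field names are illustrative.

```pig
-- Placeholder path: point this at wherever piggybank.jar actually lives.
REGISTER /path/to/piggybank.jar;
-- CSVExcelStorage treats a double-quoted field as one value, so the embedded comma is kept.
a = LOAD '/home/hduser/pig_ex' USING org.apache.pig.piggybank.storage.CSVExcelStorage(',') AS (col1:chararray, col2:chararray, col3:chararray);
dump a;
```

With this loader the rows should arrive as three columns directly, so no REGEX_EXTRACT or CONCAT step is needed.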