Split a tuple with multiple fields into tuples with a single field in Pig

Posted: 2013-07-25 05:39:19

Tags: apache-pig

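I have tuples of different lengths. I am trying to convert them into tuples that each contain a single field (each field is a map). Original data: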

dump entryArray;
([symbol#HIG,security_type#EQUITY,foreign_entry_id#743094])
([symbol#PEW,security_type#EQUITY,foreign_entry_id#743084])
([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585],[symbol#RFG,security_type#ETF,foreign_entry_id#5586],[symbol#SCHW,security_type#EQUITY,foreign_entry_id#5587],[symbol#VWO,security_type#ETF,foreign_entry_id#5588])

I want this output (each field should still be a map):

([symbol#HIG,security_type#EQUITY,foreign_entry_id#743094])
([symbol#PEW,security_type#EQUITY,foreign_entry_id#743084])
([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585])   
([symbol#RFG,security_type#ETF,foreign_entry_id#5586])
([symbol#SCHW,security_type#EQUITY,foreign_entry_id#5587])
([symbol#VWO,security_type#ETF,foreign_entry_id#5588])


I tried entry = FOREACH entryArray GENERATE FLATTEN(TOBAG(*)); The output has the format I want, but the field no longer seems to be a MAP:

entry = FOREACH entryArray GENERATE FLATTEN(TOBAG(*));
dump entry;
([symbol#HIG,security_type#EQUITY,foreign_entry_id#743094])
([symbol#PEW,security_type#EQUITY,foreign_entry_id#743084])
([symbol#AFFY,security_type#EQUITY,foreign_entry_id#5585])   
([symbol#RFG,security_type#ETF,foreign_entry_id#5586])
([symbol#SCHW,security_type#EQUITY,foreign_entry_id#5587])
([symbol#VWO,security_type#ETF,foreign_entry_id#5588])

security_type = FOREACH entry GENERATE FLATTEN($0#'security_type');
This throws:
ERROR 1052: Cannot cast bytearray to map with schema :map
org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1059: <line 18, column 16> Problem while reconciling output schema of ForEach
at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.throwTypeCheckerException(TypeCheckingRelVisitor.java:141)
at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:181)
at org.apache.pig.newplan.logical.relational.LOForEach.accept(LOForEach.java:75)
......

Any suggestions would be greatly appreciated. Thanks!
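One possible workaround, offered as an untested sketch rather than a confirmed fix: the error says Pig is trying to cast a bytearray to a map, which suggests that after FLATTEN(TOBAG(*)) the field is only known as bytearray at the schema level. A Jython UDF can declare its output schema explicitly with Pig's @outputSchema decorator, so each flattened field is typed as map[]. The function name split_entries, the schema names entries/t/entry, and the file name below are hypothetical. The fallback decorator at the top exists only so the function can be run outside Pig.

```python
# Fallback so the function can be exercised outside Pig. Inside Pig's Jython
# script engine, outputSchema is already defined, so this block is skipped.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("entries:bag{t:tuple(entry:map[])}")
def split_entries(*fields):
    """Turn one input tuple of N maps into a list of N single-field tuples.

    Assumption: when the UDF is called with (*), each field of the input
    tuple arrives as a positional argument, and the Jython bridge converts a
    Python list to a Pig bag, a tuple to a Pig tuple, and a dict to a map.
    """
    # Skip null fields so FLATTEN does not emit tuples holding a null map.
    return [(m,) for m in fields if m is not None]
```

Assuming the script is saved as split_entries.py, it would be registered with REGISTER 'split_entries.py' USING jython AS myudfs; and called as entry = FOREACH entryArray GENERATE FLATTEN(myudfs.split_entries(*)); after which the $0#'security_type' lookup should type-check, because the declared schema says the field is a map. This has not been verified against a specific Pig version.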

0 answers