Hadoop: Querying the TPC-H benchmark with Pig

Time: 2016-01-31 15:50:57

Tags: hadoop apache-pig hortonworks-data-platform parquet

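I am trying to run the TPC-H benchmark (www.tpc.org) on Hadoop using the Hortonworks Data Platform (2.3.2). To that end, I want to query the data in Parquet file format with Pig (version 0.15) and benchmark it. I used tpch-gen to create 2 GB of data in Parquet format. In addition, I downloaded a parquet-pig bundle so that I can read the Parquet files with ParquetLoader. I am using the following Pig script to query the data: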

REGISTER /opt/parquet-pig-bundle-1.8.1.jar;

lineitem = LOAD '$input/lineitem' 
using org.apache.parquet.pig.ParquetLoader AS (orderkey:long, partkey:long, suppkey:long,
linenumber:long, quantity:double, extendedprice:double, discount:double, tax:double, returnflag:chararray, linestatus:chararray,
shipdate:chararray, commitdate:chararray, receiptdate:chararray, shipinstruct:chararray, shipmode:chararray, comment:chararray);

SubLineItems = FILTER lineitem BY shipdate <= '1998-09-16';

SubLine = FOREACH SubLineItems GENERATE returnflag, linestatus, quantity, extendedprice, extendedprice*(1-discount) AS disc_price, extendedprice*(1-discount)*(1+tax) AS charge, discount;

STORE SubLine INTO '$output/Q1_out' USING org.apache.parquet.pig.ParquetStorer();

When I execute this query, I get the following error:

  
    

2016-01-28 15:57:23,974 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: FILTER
2016-01-28 15:57:24,035 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-01-28 15:57:24,098 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2016-01-28 15:57:24,155 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Columns pruned for lineitem: $0, $1, $2, $3, $4, $5, $6, $7, $9, $11, $12, $13, $14, $15
2016-01-28 15:57:24,160 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
Details at logfile: /root/D2F-Bench/bin/pig_1453996639031.log

  

When I run with -t ColumnMapKeyPrune, as the message suggests, the query executes without that error, but it takes about 1 hour, which is far too long.
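For repeatable benchmark runs, the same rule can apparently also be switched off from inside the script rather than on the command line. The following is only a sketch, assuming that the `pig.optimizer.rules.disabled` property is honoured by the Pig 0.15 build shipped with HDP 2.3.2. It reproduces the slow workaround and does not address the underlying error.

```pig
-- Assumption: this has the same effect as passing -t ColumnMapKeyPrune to the pig command.
-- Place it at the top of the script, before any LOAD statement.
set pig.optimizer.rules.disabled 'ColumnMapKeyPrune';
```

With the rule disabled, Pig no longer asks the loader to read only the columns the query needs, which would be consistent with the long runtime.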

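I noticed that the error occurs whenever I use FOREACH in the Pig query; when I remove the line containing FOREACH, the error does not appear. I have also tried the same Pig script against the Avro file format (changing only the USING ... lines), and that works fine.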
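Since the stack trace in the P.S. ends in ParquetLoader.pushProjection, and a FOREACH that keeps only some columns is what gives the optimizer a projection to push into the loader, one thing worth checking is whether the aliases in the AS clause match the column names actually stored in the Parquet files. This is an untested assumption, not a confirmed cause. A minimal diagnostic sketch, where `raw` is a hypothetical relation name:

```pig
REGISTER /opt/parquet-pig-bundle-1.8.1.jar;

-- Load without an AS clause so the schema comes from the Parquet file itself.
raw = LOAD '$input/lineitem' USING org.apache.parquet.pig.ParquetLoader();

-- Print the schema Pig derives from the file, including the stored column names.
DESCRIBE raw;
```

If the names printed by DESCRIBE differ from the aliases used in the script (for example, if the generator wrote prefixed names), that difference would be a natural first suspect for the NullPointerException.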

Any ideas what the problem is? Thanks in advance.

P.S. The Pig stack trace provides the following information:

  
    

ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune

  
     

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 2000: Error processing rule ColumnMapKeyPrune. Try -t ColumnMapKeyPrune
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:125)
    at org.apache.pig.newplan.logical.relational.LogicalPlan.optimize(LogicalPlan.java:277)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1373)
    at org.apache.pig.PigServer.execute(PigServer.java:1364)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:415)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:398)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
    at org.apache.pig.Main.run(Main.java:502)
    at org.apache.pig.Main.main(Main.java:177)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.NullPointerException
    at org.apache.parquet.pig.ParquetLoader.getSchemaFromRequiredFieldList(ParquetLoader.java:364)
    at org.apache.parquet.pig.ParquetLoader.pushProjection(ParquetLoader.java:346)
    at org.apache.pig.newplan.logical.rules.ColumnPruneVisitor.visit(ColumnPruneVisitor.java:155)
    at org.apache.pig.newplan.logical.relational.LOLoad.accept(LOLoad.java:230)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.rules.ColumnMapKeyPrune$ColumnMapKeyPruneTransformer.transform(ColumnMapKeyPrune.java:141)
    at org.apache.pig.newplan.optimizer.PlanOptimizer.optimize(PlanOptimizer.java:110)

     

    ... 17 more

0 answers:

No answers