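This is an open question that I posted on the support forum, but since I have not received any reply, I thought I would try asking here.
I have an existing application that uses MongoDB as its data layer. At the moment I am using Mongo's map-reduce mechanism, but I am running into some performance problems, so I thought of implementing that logic with Hadoop.
I have successfully run the Treasury Yield example and decided to create a simple project just to get to know the mongo-hadoop driver. So I created a project, added the corresponding jar files to the build path, and ran it.
This is my Java code: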
final Configuration conf = new Configuration();
MongoConfigUtil.setInputURI( conf, "mongodb://username:passw...@192.168.1.198/locations" );
MongoConfigUtil.setOutputURI( conf, "mongodb://localhost/test.out" );
System.out.println( "Conf: " + conf );
final Job job = new Job( conf, "word count" );
job.setJarByClass( WordCount.class );
job.setMapperClass( TokenizerMapper.class );
job.setCombinerClass( IntSumReducer.class );
job.setReducerClass( IntSumReducer.class );
job.setOutputKeyClass( Text.class );
job.setOutputValueClass( IntWritable.class );
job.setInputFormatClass( MongoInputFormat.class );
job.setOutputFormatClass( MongoOutputFormat.class );
System.exit( job.waitForCompletion( true ) ? 0 : 1 );
But I get this error:
Conf: Configuration: core-default.xml, core-site.xml
12/05/20 14:12:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/05/20 14:12:03 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/05/20 14:12:03 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
12/05/20 14:12:03 INFO mapred.JobClient: Cleaning up the staging area file:/tmp/hadoop-maximos/mapred/staging/maximos1261801897/.staging/job_local_0001
Exception in thread "main" java.lang.NullPointerException
    at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:796)
    at com.mongodb.DBApiLayer.doGetCollection(DBApiLayer.java:116)
    at com.mongodb.DBApiLayer.doGetCollection(DBApiLayer.java:43)
    at com.mongodb.DB.getCollection(DB.java:81)
    at com.mongodb.hadoop.util.MongoSplitter.calculateSplits(MongoSplitter.java:51)
    at com.mongodb.hadoop.MongoInputFormat.getSplits(MongoInputFormat.java:51)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:962)
    at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:979)
    at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:174)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:897)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at com.mongodb.hadoop.examples.wordcount.WordCount.main(WordCount.java:100)
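What am I doing wrong? Is this a Mongo, Hadoop, or Mongo-Hadoop problem?
Answer 0 (score: 0)
It looks like you forgot to specify the name of the collection (the one you are reading your data from).
In the example, the line looks like this: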
MongoConfigUtil.setInputURI( conf, "mongodb://localhost/test.in" );
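However, in your code I see:
MongoConfigUtil.setInputURI( conf, "mongodb://username:passw...@192.168.1.198/locations" );
I am not sure whether locations is the collection name or the database name. If it is the collection, try prefixing it with the database name. If it is the database, append .yourcollectionname to the end of it.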
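A missing collection name can be caught before the job is submitted. The following is a minimal sketch, using only the Java standard library, of a check that reports whether a mongodb:// URI names a collection after the database. MongoUriCheck and collectionOf are hypothetical names chosen for illustration and are not part of mongo-hadoop. The parsing is deliberately simplified and assumes the database.collection form used in the example URI.

```java
import java.util.Optional;

public class MongoUriCheck {
    private static final String SCHEME = "mongodb://";

    /**
     * Hypothetical helper: returns the collection part of a URI of the form
     * mongodb://[user:pass@]host[:port]/database.collection[?options],
     * or Optional.empty() when no collection follows the database name.
     */
    public static Optional<String> collectionOf(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Not a mongodb:// URI: " + uri);
        }
        String rest = uri.substring(SCHEME.length());

        // Everything after the first '/' is the database[.collection] path.
        int slash = rest.indexOf('/');
        if (slash < 0) {
            return Optional.empty();
        }
        String path = rest.substring(slash + 1);

        // Drop any ?options suffix.
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query);
        }

        // The first '.' separates the database from the collection.
        int dot = path.indexOf('.');
        if (dot < 0 || dot == path.length() - 1) {
            return Optional.empty();
        }
        return Optional.of(path.substring(dot + 1));
    }

    public static void main(String[] args) {
        String[] uris = {
            "mongodb://localhost/test.in",
            "mongodb://192.168.1.198/locations"
        };
        for (String uri : uris) {
            String verdict = collectionOf(uri)
                    .map(c -> "collection \"" + c + "\"")
                    .orElse("no collection named");
            System.out.println(uri + " -> " + verdict);
        }
    }
}
```

Running such a check on the input URI before calling MongoConfigUtil.setInputURI would flag the locations URI as naming no collection. That would be consistent with the stack trace above, where the NullPointerException is thrown by ConcurrentHashMap.get inside DB.getCollection, since ConcurrentHashMap does not accept a null key.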