Nutch fails with an error when running the solrindex command

Time: 2016-01-28 12:51:34

Tags: nutch solr5

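I am using Nutch 1.11 (released December 7, 2015) and running the bin/crawl command to do the work. Everything runs fine until it reaches the solrindex step, which sends the data to the Solr search engine. At that point it fails with the following error: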

SolrIndexWriter
    solr.server.type : Type of SolrServer to communicate with (default 'http' however options include 'cloud', 'lb' and 'concurrent')
    solr.server.url : URL of the Solr instance (mandatory)
    solr.zookeeper.url : URL of the Zookeeper URL (mandatory if 'cloud' value for solr.server.type)
    solr.loadbalance.urls : Comma-separated string of Solr server strings to be used (madatory if 'lb' value for solr.server.type)
    solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
    solr.commit.size : buffer size when sending to Solr (default 1000)
    solr.auth : use authentication (default false)
    solr.auth.username : username for authentication
    solr.auth.password : password for authentication


2016-01-28 02:49:41,422 INFO  indexer.IndexerMapReduce - IndexerMapReduce: crawldb: nutchweb/crawldb
2016-01-28 02:49:41,425 INFO  indexer.IndexerMapReduce - IndexerMapReduce: linkdb: nutchweb/linkdb
2016-01-28 02:49:41,425 INFO  indexer.IndexerMapReduce - IndexerMapReduces: adding segment: nutchweb/segments/20160127234706
2016-01-28 02:49:41,652 WARN  util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2016-01-28 02:49:42,586 WARN  conf.Configuration - file:/tmp/hadoop-micky/mapred/staging/micky810285982/.staging/job_local810285982_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-01-28 02:49:42,587 WARN  conf.Configuration - file:/tmp/hadoop-micky/mapred/staging/micky810285982/.staging/job_local810285982_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-01-28 02:49:42,751 WARN  conf.Configuration - file:/tmp/hadoop-micky/mapred/local/localRunner/micky/job_local810285982_0001/job_local810285982_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
2016-01-28 02:49:42,752 WARN  conf.Configuration - file:/tmp/hadoop-micky/mapred/local/localRunner/micky/job_local810285982_0001/job_local810285982_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
2016-01-28 02:49:43,342 INFO  anchor.AnchorIndexingFilter - Anchor deduplication is: off
2016-01-28 02:49:49,230 INFO  indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2016-01-28 02:49:50,627 INFO  solr.SolrMappingReader - source: content dest: content
2016-01-28 02:49:50,627 INFO  solr.SolrMappingReader - source: title dest: title
2016-01-28 02:49:50,627 INFO  solr.SolrMappingReader - source: host dest: host
2016-01-28 02:49:50,627 INFO  solr.SolrMappingReader - source: segment dest: segment
2016-01-28 02:49:50,627 INFO  solr.SolrMappingReader - source: boost dest: boost
2016-01-28 02:49:50,627 INFO  solr.SolrMappingReader - source: digest dest: digest
2016-01-28 02:49:50,627 INFO  solr.SolrMappingReader - source: tstamp dest: tstamp
2016-01-28 02:49:50,959 INFO  solr.SolrIndexWriter - Indexing 250 documents
2016-01-28 02:49:50,960 INFO  solr.SolrIndexWriter - Deleting 0 documents
2016-01-28 02:49:54,346 INFO  solr.SolrIndexWriter - Indexing 250 documents
2016-01-28 02:50:06,471 WARN  mapred.LocalJobRunner - job_local810285982_0001
java.lang.Exception: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id http://nutch.apache.org/apidocs/apidocs-1.1/overview-tree.html to the index; possible analysis error.
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Exception writing document id http://nutch.apache.org/apidocs/apidocs-1.1/overview-tree.html to the index; possible analysis error.
    at org.apache.solr.client.solrj.impl.HttpSolrServer.executeMethod(HttpSolrServer.java:552)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:210)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:206)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:124)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:134)
    at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:85)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
    at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:422)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:356)
    at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:56)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
2016-01-28 02:50:07,330 ERROR indexer.IndexingJob - Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:145)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:222)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:231)

The error I found is:

Exception writing document id http://nutch.apache.org/apidocs/apidocs-1.1/overview-tree.html to the index; possible analysis error.
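To check whether only this one page is affected or several are, a small script along these lines could pull every document id named in such a message out of the log text. This is only a sketch: the function name `failing_document_ids` is made up for illustration and is not part of Nutch or Solr, and the pattern assumes the message wording stays exactly as shown in the log above.

```python
import re

# Matches the message wording seen in the log above (assumed to be stable).
FAILED_DOC = re.compile(r"Exception writing document id (\S+) to the index")


def failing_document_ids(log_text):
    """Return the distinct document ids from matching messages, in first-seen order."""
    seen = []
    for match in FAILED_DOC.finditer(log_text):
        doc_id = match.group(1)
        if doc_id not in seen:
            seen.append(doc_id)
    return seen


# One line copied from the log above, used as sample input.
sample = (
    "java.lang.Exception: org.apache.solr.client.solrj.impl."
    "HttpSolrServer$RemoteSolrException: Exception writing document id "
    "http://nutch.apache.org/apidocs/apidocs-1.1/overview-tree.html "
    "to the index; possible analysis error.\n"
)

print(failing_document_ids(sample))
```

If the list contains a single URL, the problem is likely tied to that page's content. If it contains many, the Solr schema or field mapping may be the more likely place to look.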

Nobody seems to have run into this error before. Please help.

0 answers:

No answers