Nutch job fails at the step that links to Solr

Time: 2019-01-13 00:36:31

Tags: solr search-engine nutch

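I have configured Nutch and Solr, and both are up and running. I am using Solr to index the documents crawled by Nutch. However, the communication between the two (the index command that takes the linkdb) fails. I found similar threads, but none of their solutions worked for me. Similar thread: "Nutch job failing when sending data to Solr".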

I set up the configuration files by following this guide: https://www.cs.toronto.edu/~muuo/blog/build-yourself-a-mini-search-engine/

Versions: Nutch 1.14 (https://archive.apache.org/dist/nutch/1.14/apache-nutch-1.14-bin.tar.gz), Solr 6.6 (http://mirror.dsrg.utoronto.ca/apache/lucene/solr/6.6.5/solr-6.6.5.tgz)

I have already tried the most recent schema.xml referenced in the Nutch wiki: https://github.com/apache/nutch/blob/master/conf/schema.xml

I start the crawl with the following command:

 nutch/bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch -s nutch/urls/ Crawl 2

It breaks at:
/home/sk/SearchEngine/nutch/bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch Crawl/crawldb -linkdb Crawl/linkdb Crawl/segments/20190112160715
Failed with exit value 255.

Error:

Active IndexWriters :
SOLRIndexWriter
    solr.server.url : URL of the SOLR instance
    solr.zookeeper.hosts : URL of the Zookeeper quorum
    solr.commit.size : buffer size when sending to SOLR (default 1000)
    solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
    solr.auth : use authentication (default false)
    solr.auth.username : username for authentication
    solr.auth.password : password for authentication


Indexing 87/87 documents
Deleting 0 documents
Indexing 87/87 documents
Deleting 0 documents
Indexer: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
    at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:147)
    at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:230)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:239)

Error running:
  /home/sk/SearchEngine/nutch/bin/nutch index -Dsolr.server.url=http://localhost:8983/solr/nutch Crawl/crawldb -linkdb Crawl/linkdb Crawl/segments/20190112160715
Failed with exit value 255.
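The `java.io.IOException: Job failed!` trace above is only the generic wrapper raised by the indexing job. When Nutch 1.x runs in local mode, the underlying cause is normally written to `logs/hadoop.log` under the Nutch install directory. The following is a minimal sketch for pulling the relevant lines out of that log. The log path is an assumption inferred from the command path shown above, and `show_index_errors` is a hypothetical helper name, not part of Nutch.

```shell
#!/bin/sh
# Assumption: Nutch runs in local mode and logs to logs/hadoop.log under its
# install directory. The default path below is inferred from the command in
# the post and may differ on other setups; override it via NUTCH_LOG.
NUTCH_LOG="${NUTCH_LOG:-/home/sk/SearchEngine/nutch/logs/hadoop.log}"

# show_index_errors FILE (hypothetical helper):
# print the last 20 lines of FILE that look like errors or exception causes,
# each prefixed with its line number in FILE.
show_index_errors() {
  grep -n -E 'ERROR|Exception|Caused by' "$1" | tail -n 20
}

# Only read the real log when it actually exists.
if [ -f "$NUTCH_LOG" ]; then
  show_index_errors "$NUTCH_LOG"
fi
```

The line numbers printed by `grep -n` make it easy to open the log at the matching position and read the full stack trace around the failed index step.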

0 answers:

No answers yet.