Question

I am using Nutch 1.14 with Solr 6.4.2. Nutch does not crawl (follow) all of the links on a page.

<property>
  <name>db.ignore.internal.links</name>
  <value>false</value>
</property>
<property>
  <name>db.ignore.external.links</name>
  <value>false</value>
</property>

Answer 1

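There are several possible causes here, since nutch-site.xml contains many properties that affect which links are followed.

Have you checked this one?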

<property>
   <name>db.max.outlinks.per.page</name>
   <value>100</value>
   <description>The maximum number of outlinks that we'll process for a page.
       If this value is nonnegative (>=0), at most db.max.outlinks.per.page outlinks
       will be processed for a page; otherwise, all outlinks will be processed.
   </description>
</property>
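
If the page in question has more than 100 links, this cap would explain the missing ones. As a sketch, assuming you want every outlink to be processed, you could override the property in nutch-site.xml with a negative value. The -1 below is only an illustrative choice; going by the description above, any negative number should have the same effect.

<property>
   <!-- Assumption: a negative value removes the per-page outlink cap -->
   <name>db.max.outlinks.per.page</name>
   <value>-1</value>
</property>

Pages that were already parsed with the old limit would most likely need to be fetched and parsed again before the additional links show up.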
