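Referring to this thread, I am also trying to use Google Cloud Search, but my problem is different.
I am stuck at "No IndexWriters activated - check your configuration".
I added the example from that thread to conf/nutch-site.xml:
<property>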
<name>plugin.includes</name>
<value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|more|metadata)|indexer-google-cloud-search|urlnormalizer-(pass|regex|basic)</value>
<description>Regular expression naming plugin directory names to
include. Any plugin not matching this expression is excluded.
In any case you need at least include the nutch-extensionpoints plugin. By
default Nutch includes crawling just HTML and plain text via HTTP,
and basic indexing and search plugins. In order to use HTTPS please enable
protocol-httpclient, but be aware of possible intermittent problems with the
underlying commons-httpclient library.
</description>
</property>
But I do not get the output below.
INFO gcs.GoogleCloudSearchIndexWriter - Starting up!
Instead, I get this:
Indexer: starting at 2019-03-16 14:53:13
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
No IndexWriters activated - check your configuration
Indexer: number of documents indexed, deleted, or skipped:
Indexer: 1 indexed (add/update)
Indexer: finished at 2019-03-16 14:53:14, elapsed: 00:00:01
Please advise on how to proceed.
Answer 0 (score: 0)
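You have only enabled the indexer plugin (which in this case just sends the data to the output, GCS). You still need to configure the IndexWriter itself. Take a look at the example template provided with Nutch.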
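As a rough illustration of what configuring the IndexWriter might look like, here is a minimal sketch of a conf/index-writers.xml entry. It assumes a Nutch version that reads index writers from that file. The writer class name is inferred from the `gcs.GoogleCloudSearchIndexWriter` log line in the question, and the `gcs.config.file` parameter name, the writer id, and the path are assumptions rather than verified values, so check them against the template shipped with your Nutch version and the plugin's own documentation.

```xml
<!-- conf/index-writers.xml: hedged sketch, not a verified configuration -->
<writers xmlns="http://lucene.apache.org/nutch"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://lucene.apache.org/nutch index-writers.xsd">
  <!-- The id is an arbitrary label; the class name is an assumption inferred from the log -->
  <writer id="indexer_google_cloud_search_1"
          class="org.apache.nutch.indexwriter.gcs.GoogleCloudSearchIndexWriter">
    <parameters>
      <!-- Assumed parameter name and placeholder path to the connector properties file -->
      <param name="gcs.config.file" value="/path/to/sdk-configuration.properties"/>
    </parameters>
    <!-- Empty mapping: no fields are copied, renamed, or removed -->
    <mapping>
      <copy/>
      <rename/>
      <remove/>
    </mapping>
  </writer>
</writers>
```

If a writer like this is picked up, the indexing job should log the writer starting up instead of "No IndexWriters activated - check your configuration".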