Google Cloud Search: Apache Nutch connector, "No IndexWriters activated - check your configuration"

Date: 2019-03-16 15:13:03

Tags: nutch

Referring to this thread:

I am also trying to use Google Cloud Search, but my problem is different.

I am stuck at "No IndexWriters activated - check your configuration".

I added the example from the thread to conf/nutch-site.xml:

    <property>
      <name>plugin.includes</name>
      <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|more|metadata)|indexer-google-cloud-search|urlnormalizer-(pass|regex|basic)</value>
      <description>Regular expression naming plugin directory names to
      include.  Any plugin not matching this expression is excluded.
      In any case you need at least include the nutch-extensionpoints plugin. By
      default Nutch includes crawling just HTML and plain text via HTTP,
      and basic indexing and search plugins. In order to use HTTPS please enable
      protocol-httpclient, but be aware of possible intermittent problems with the
      underlying commons-httpclient library.
      </description>
    </property>

But I do not get the output below:

      INFO  gcs.GoogleCloudSearchIndexWriter - Starting up!

Instead, I get this:

    Indexer: starting at 2019-03-16 14:53:13
    Indexer: deleting gone documents: false
    Indexer: URL filtering: false
    Indexer: URL normalizing: false
    No IndexWriters activated - check your configuration

    Indexer: number of documents indexed, deleted, or skipped:
    Indexer:      1  indexed (add/update)
    Indexer: finished at 2019-03-16 14:53:14, elapsed: 00:00:01

Please point me in the right direction.

1 Answer:

Answer 0: (score: 0)

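You have only enabled the indexer plugin (in this case, the plugin that just sends the data to the output, GCS). You still need to configure the IndexWriter itself. Take a look at the example template provided with Nutch.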
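
As a rough sketch of what that configuration could look like: since Nutch 1.15, index writers are declared in conf/index-writers.xml, and an entry along the following lines would go inside the existing `<writers>` element. The writer id is an arbitrary label, the class name is an assumption inferred from the `gcs.GoogleCloudSearchIndexWriter` log line in the question, and the `gcs.config.file` parameter name and its path are placeholders to check against the connector's own documentation.

    <!-- Assumed writer entry for conf/index-writers.xml; place inside <writers>. -->
    <!-- The id is an arbitrary label; the class name is inferred from the log line. -->
    <writer id="indexer_google_cloud_search_1"
            class="org.apache.nutch.indexwriter.gcs.GoogleCloudSearchIndexWriter">
      <parameters>
        <!-- Assumed parameter name; the value is a placeholder path to the connector's properties file. -->
        <param name="gcs.config.file" value="/path/to/sdk-configuration.properties"/>
      </parameters>
      <mapping>
        <!-- Empty mapping: fields are passed through without copy, rename, or remove rules. -->
        <copy />
        <rename />
        <remove />
      </mapping>
    </writer>

If the entry is picked up, the indexing job should log the writer's startup message instead of "No IndexWriters activated - check your configuration".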