Question

I have links such as http://drive.google.com, and I want to match "google" inside those links.

I have:

query: {
    bool : {
        must: {
            match: { text: 'google'} 
        }
    }
}

However, this only matches when the whole text is "google" (case-insensitively, so it also matches Google, GooGlE, and so on). How can I match 'google' inside another string?

Answer 1

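The point is that the ElasticSearch regex you are using requires a full string match:

Lucene's patterns are always anchored. The pattern provided must match the entire string.

Thus, to match any character (except a newline), you can use the .* pattern: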

match: { text: '.*google.*'}
                ^^      ^^

Another variation is for cases where the string can contain newlines: match: { text: '(.|\n)*google(.|\n)*'}. This awful (.|\n)* is a must in ElasticSearch, because this regex flavor allows neither a [\s\S] workaround nor a DOTALL / Singleline flag. "The Lucene regular expression engine is not Perl-compatible but supports a smaller range of operators."
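To make the anchoring rule concrete, here is a minimal sketch in plain JavaScript. It does not talk to Elasticsearch; it only imitates "the pattern must match the entire string" by wrapping the pattern in ^(?:...)$. The helper name luceneStyleMatch is made up for this illustration, and JavaScript's regex syntax is not the same as Lucene's, so treat it as an analogy rather than a reproduction of the engine.

```javascript
// Illustration only: imitates the "always anchored" rule with a JavaScript RegExp.
// luceneStyleMatch is a hypothetical helper, not part of any Elasticsearch client.
function luceneStyleMatch(pattern, value) {
  // Wrapping the pattern in ^(?:...)$ forces it to cover the whole string.
  return new RegExp(`^(?:${pattern})$`).test(value);
}

const url = 'http://drive.google.com';

console.log(luceneStyleMatch('google', url));     // false: pattern does not cover the whole string
console.log(luceneStyleMatch('.*google.*', url)); // true

// "." does not match a newline, so a multi-line value needs the (.|\n)* form.
const multiLine = 'a\ngoogle';
console.log(luceneStyleMatch('.*google.*', multiLine));             // false
console.log(luceneStyleMatch('(.|\\n)*google(.|\\n)*', multiLine)); // true
```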

Answer 2

Use a wildcard query:

'{"query":{ "wildcard": { "text.keyword" : "*google*" }}}'

Answer 3

I can't find a breaking change that disabled regular expressions in match, but match: { text: '.*google.*'} does not work on any of my Elasticsearch 6.2 clusters. Perhaps it is configurable?

Regexp works:

"query": {
   "regexp": { "text": ".*google.*"} 
}

Answer 4

For both partial and full-text matching, the following works:

"query" : {
    "query_string" : {
      "query" : "*searchText*",
      "fields" : [
        "fieldName"
      ]
    }
}

Answer 5

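For a more generic solution, you can look into using a different analyzer or defining your own. I am assuming you are using the standard analyzer, which splits http://drive.google.com into the tokens "http" and "drive.google.com". That is why a search for just google does not work: it is compared against the full "drive.google.com" token.

If you index your documents with the simple analyzer instead, the link is split into "http", "drive", "google" and "com". That lets you match any one of those terms on its own.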
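As a rough illustration of why the simple analyzer helps, the sketch below imitates its documented behavior (break the text at every non-letter character, then lowercase) in plain JavaScript. The function name simpleAnalyze is hypothetical, and the real analysis happens inside Elasticsearch, so this only approximates what ends up in the index.

```javascript
// Rough imitation of the simple analyzer: split on anything that is not a
// letter, drop empty pieces, lowercase the rest.
// simpleAnalyze is a hypothetical name used only for this illustration.
function simpleAnalyze(text) {
  return text
    .split(/[^\p{L}]+/u)
    .filter((token) => token.length > 0)
    .map((token) => token.toLowerCase());
}

const tokens = simpleAnalyze('http://drive.google.com');
console.log(tokens);                    // [ 'http', 'drive', 'google', 'com' ]
console.log(tokens.includes('google')); // true: "google" is now a term of its own
```

On a real cluster, the _analyze API with "analyzer": "simple" shows the tokens Elasticsearch actually produces, which is the safer way to check.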

Answer 6

For partial matching, you can use either prefix or match_phrase_prefix.
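A minimal sketch of what those two query bodies might look like, written as small JavaScript builders so they can be passed to a client. The helper names are invented for this example, and the field name text is taken from the question. Note that both queries match from the start of a term, so "goo" would only find "google" if "google" is indexed as its own token.

```javascript
// Hypothetical helpers that only build request bodies; nothing is sent anywhere.

// prefix: term-level query, matches indexed terms that start with `value`.
function prefixQuery(field, value) {
  return { query: { prefix: { [field]: { value } } } };
}

// match_phrase_prefix: analyzes the text and treats the last word as a prefix.
function matchPhrasePrefixQuery(field, text) {
  return { query: { match_phrase_prefix: { [field]: { query: text } } } };
}

console.log(JSON.stringify(prefixQuery('text', 'goo')));
// {"query":{"prefix":{"text":{"value":"goo"}}}}
console.log(JSON.stringify(matchPhrasePrefixQuery('text', 'drive goo')));
// {"query":{"match_phrase_prefix":{"text":{"query":"drive goo"}}}}
```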

Answer 7

Using the node.js client

tag_name is the field name and value is the incoming search value.

const { body } = await elasticWrapper.client.search({
  index: ElasticIndexs.Tags,
  body: {
    query: {
      wildcard: {
        tag_name: {
          value: `*${value}*`,
          boost: 1.0,
          rewrite: 'constant_score',
        },
      },
    },
  },
});
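One thing to watch when the search value comes from a user: * and ? are wildcard operators, so a literal * in the input would widen the match. The sketch below escapes those characters with a backslash before wrapping the value in *...*. The helper names escapeWildcard and containsPattern are made up for this example; the backslash escape is my understanding of the wildcard syntax, so verify it against your Elasticsearch version.

```javascript
// Hypothetical helpers: escape wildcard operators in user input, then build
// a "contains" pattern for a wildcard query.
function escapeWildcard(input) {
  // Prefix each backslash, * and ? with a backslash so it is taken literally.
  return input.replace(/[\\*?]/g, '\\$&');
}

function containsPattern(input) {
  return `*${escapeWildcard(input)}*`;
}

console.log(containsPattern('google')); // *google*
console.log(containsPattern('a*b'));    // *a\*b*
```

The result can be used as the value in the wildcard query. Patterns that start with * tend to be slow on large indexes, so this is best kept to small fields such as tag names.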

How do I do partial matching in Elasticsearch?
