Question

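I need to build an Elasticsearch request that searches by part of a phrase (the search must be case-insensitive and match a sequence of words).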

For example, a record's field contains:

Lorem ipsum dolor sit amet, eam et gubergren vulputate

I need to find this record with any of the following search terms:

Lorem ipsum
Lorem     ipsum dolor
lorem, ipsum.dolor
dolor sit amet

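Until now I used strict search. My solution was to create a custom analyzer (Tokenizer = "keyword" and Filter = ["lowercase"]), add it to the field, and have the field analyzed at index time in the mapping. But now the task has changed.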

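Can somebody help me build the request? Even a pointer to the relevant part of the Elasticsearch API reference would be appreciated.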

Answer 1

Take a look at the _analyze API.

With the custom analyzer you mentioned (keyword tokenizer plus lowercase filter), you create one single, large token:

$ curl -XGET 'localhost:9200/_analyze?tokenizer=keyword&filters=lowercase&text=Lorem+ipsum+dolor+sit+amet,+eam+et+gubergren+vulputate'
{
   "tokens": [
      {
         "token": "lorem ipsum dolor sit amet, eam et gubergren vulputate",
         "start_offset": 0,
         "end_offset": 54,
         "type": "word",
         "position": 1
      }
   ]
}

The only way to find that token is to search for exactly the same token (after analysis, if an analyzer is applied to the search term).
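The effect can be imitated in plain shell, without a running node. This is only a rough stand-in for the analyzer, not Elasticsearch itself: the helper names `keyword_lowercase` and `matches_keyword` are made up for illustration, and `tr` only approximates the lowercase filter for ASCII text.

```shell
# Rough stand-in for a keyword tokenizer plus lowercase filter:
# the whole input becomes one lowercased token.
keyword_lowercase() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# The single token that would be stored for the example record.
indexed=$(keyword_lowercase 'Lorem ipsum dolor sit amet, eam et gubergren vulputate')

# Succeeds only if the analyzed search term equals the single indexed token.
matches_keyword() {
  [ "$(keyword_lowercase "$1")" = "$indexed" ]
}

matches_keyword 'LOREM ipsum dolor sit amet, eam et gubergren vulputate' && echo "full string: match"
matches_keyword 'Lorem ipsum' || echo "partial phrase: no match"
```

Only the complete string, in any letter case, finds the record; a partial phrase such as "Lorem ipsum" never equals the one stored token.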

However, if you don't use a custom analyzer at all, you get these tokens:

$ curl -XGET 'localhost:9200/_analyze?text=Lorem+ipsum+dolor+sit+amet,+eam+et+gubergren+vulputate'
{
   "tokens": [
      {
         "token": "lorem",
         "start_offset": 0,
         "end_offset": 5,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "ipsum",
         "start_offset": 6,
         "end_offset": 11,
         "type": "<ALPHANUM>",
         "position": 2
      },
      {
         "token": "dolor",
         "start_offset": 12,
         "end_offset": 17,
         "type": "<ALPHANUM>",
         "position": 3
      },
      {
         "token": "sit",
         "start_offset": 18,
         "end_offset": 21,
         "type": "<ALPHANUM>",
         "position": 4
      },
      {
         "token": "amet",
         "start_offset": 22,
         "end_offset": 26,
         "type": "<ALPHANUM>",
         "position": 5
      },
      {
         "token": "eam",
         "start_offset": 28,
         "end_offset": 31,
         "type": "<ALPHANUM>",
         "position": 6
      },
      {
         "token": "et",
         "start_offset": 32,
         "end_offset": 34,
         "type": "<ALPHANUM>",
         "position": 7
      },
      {
         "token": "gubergren",
         "start_offset": 35,
         "end_offset": 44,
         "type": "<ALPHANUM>",
         "position": 8
      },
      {
         "token": "vulputate",
         "start_offset": 45,
         "end_offset": 54,
         "type": "<ALPHANUM>",
         "position": 9
      }
   ]
}

Now you can search for "any word in the sentence" and find a match, including with a phrase search.
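As a sketch of what a phrase search could look like, the helper below builds a `match_phrase` request body. The function name `phrase_query`, the index name `test`, and the field name `field` are assumptions for illustration, and the helper does no JSON escaping.

```shell
# Build a match_phrase request body for a given field and phrase.
# Assumes the phrase contains no double quotes or backslashes (no JSON escaping is done).
phrase_query() {
  printf '{"query":{"match_phrase":{"%s":"%s"}}}\n' "$1" "$2"
}

# Print the body for one of the search terms from the question.
phrase_query field 'dolor sit amet'

# To send it to a local node (hypothetical index "test"; recent versions
# expect the Content-Type header):
# curl -XGET 'localhost:9200/test/_search' -H 'Content-Type: application/json' \
#      -d "$(phrase_query field 'dolor sit amet')"
```

A phrase query requires the analyzed terms to appear next to each other in order, so extra whitespace in the search term should not matter. Punctuation may: under the Unicode word-break rules that the standard tokenizer follows, a period between two letters does not necessarily split the word, so it is worth running a term like "lorem, ipsum.dolor" through _analyze before relying on it in a phrase query.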

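To put it more simply: you want to search with a match query to get the benefits of full-text search, because it runs the search terms through the same analyzer. If you use a term query (or filter), it only looks for exact tokens.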
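To make the contrast concrete, here is a small sketch that builds both request bodies. The helper name `single_field_query` and the index name `test` are assumptions for illustration; the query shapes are the short forms of `term` and `match`.

```shell
# Build a request body for a single-field query of the given type ("term" or "match").
# No JSON escaping is done, so values must not contain double quotes or backslashes.
single_field_query() {
  printf '{"query":{"%s":{"%s":"%s"}}}\n' "$1" "$2" "$3"
}

# term: the value is compared with the indexed tokens as-is,
# so "Lorem" would not equal the indexed token "lorem".
single_field_query term field 'Lorem'

# match: the value is analyzed first, so "Lorem" becomes "lorem" before lookup.
single_field_query match field 'Lorem'

# To run either one against a local node (hypothetical index "test"):
# curl -XGET 'localhost:9200/test/_search' -H 'Content-Type: application/json' \
#      -d "$(single_field_query match field 'Lorem')"
```

With the default analyzer on the field, the `term` version is expected to find nothing for "Lorem", while the `match` version should find the record.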

So, without any custom analyzer, you should be able to find the text with the searches you listed. Index the document as-is:

$ curl -XPOST 'localhost:9200/test/type' -d '{
  "field" : "Lorem ipsum dolor sit amet, eam et gubergren vulputate"
}'

Then search with a plain match query:

$ curl -XGET 'localhost:9200/test/_search' -d '{
  "query" : {
    "match" : {
      "field" : "lorem, ipsum.dolor"
    }
  }
}'

Elasticsearch. Filter\search by part of phrase
