Return only exact (substring) matches in full-text search (Elasticsearch)

Date: 2016-06-21 19:48:29

Tags: elasticsearch lucene

I have an index in Elasticsearch with a 'title' field (an analyzed string field). Suppose I index the following documents:

{title: "Joe Dirt"}
{title: "Meet Joe Black"}
{title: "Tomorrow Never Dies"}

and the search query is "I want to watch the movie Joe Dirt tomorrow".

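I want to find only the results whose full title appears as a substring of the search query. If I use a plain match query, all of these documents are returned, because each of them matches at least one word. I really only want "Joe Dirt" returned, because that title is an exact substring of the search query.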
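To make the difference concrete, here is a minimal Python sketch (not Elasticsearch code) that contrasts the two behaviours. The `tokenize` helper is only a rough stand-in for the standard analyzer, and all function names are hypothetical, chosen for illustration.

```python
import re

def tokenize(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())

def match_any_term(title, query):
    """Mimic a default (OR) match query: a hit if any token is shared."""
    return bool(set(tokenize(title)) & set(tokenize(query)))

def title_is_substring(title, query):
    """True if the title's tokens appear contiguously inside the query."""
    t, q = tokenize(title), tokenize(query)
    return any(q[i:i + len(t)] == t for i in range(len(q) - len(t) + 1))

titles = ["Joe Dirt", "Meet Joe Black", "Tomorrow Never Dies"]
query = "I want to watch the movie Joe Dirt tomorrow"

# What a plain match query effectively does: every title shares a word.
loose_hits = [t for t in titles if match_any_term(t, query)]
# What is actually wanted: only titles fully contained in the query.
wanted_hits = [t for t in titles if title_is_substring(t, query)]

print(loose_hits)
print(wanted_hits)
```

Here `loose_hits` contains all three titles, while `wanted_hits` contains only "Joe Dirt".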

Is this possible in Elasticsearch?

Thanks!

1 Answer:

Answer 0: (score: 1)

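One way to achieve this is as follows:

1) At index time, index title with the keyword tokenizer, so that each whole title is stored as a single term.

2) At search time, use a shingle token filter to extract word-sequence substrings from the query string and match them against the title.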
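The two steps can be simulated outside Elasticsearch with a short Python sketch. It is an illustration under assumptions, not the actual analysis chain: `standard_tokens`, `shingles`, and `keyword_token` are hypothetical helpers that only approximate the standard tokenizer, the shingle filter, and the keyword tokenizer with a lowercase filter.

```python
import re

def standard_tokens(text):
    """Approximate the standard tokenizer plus lowercase filter."""
    return re.findall(r"\w+", text.lower())

def shingles(tokens, min_size=2, max_size=2, output_unigrams=True):
    """Approximate a shingle filter: word n-grams joined by a space."""
    out = []
    for i in range(len(tokens)):
        if output_unigrams:
            out.append(tokens[i])
        for n in range(min_size, max_size + 1):
            if i + n <= len(tokens):
                out.append(" ".join(tokens[i:i + n]))
    return out

def keyword_token(title):
    """Approximate keyword tokenizer plus lowercase: one term per title."""
    return title.lower()

# Step 1: each title is indexed as a single lowercase term.
indexed = {keyword_token(t): t
           for t in ["Joe Dirt", "Meet Joe Black", "Tomorrow Never Dies"]}

# Step 2: the query is broken into shingles and each one is looked up.
query_terms = shingles(standard_tokens("Joe Dirt tomorrow"))
hits = [indexed[s] for s in query_terms if s in indexed]

print(query_terms)
print(hits)
```

One design point worth noting: the shingle filter's `max_shingle_size` defaults to 2, so with default settings a three-word title such as "Tomorrow Never Dies" can never equal a query shingle. In this sketch, passing `max_size=3` produces the shingle "tomorrow never dies"; the analogous change in Elasticsearch would be raising `max_shingle_size` to cover the longest title you expect.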

Example

Index settings

put test 
{
   "settings": {
      "analysis": {
         "analyzer": {
            "substring": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "substring"           
               ]
            },
            "exact": {
               "type": "custom",
               "tokenizer": "keyword",
               "filter": [
                  "lowercase"
               ]
            }
         },
         "filter": {
            "substring": {
               "type": "shingle",
               "output_unigrams": true
            }
         }
      }
   },
   "mappings": {
      "movie": {
         "properties": {
            "title": {
               "type": "string",
               "fields": {
                  "raw": {
                     "type": "string",
                     "analyzer": "exact"
                  }
               }
            }
         }
      }
   }
}

Indexing the documents

put test/movie/1
{"title": "Joe Dirt"}
put test/movie/2
{"title": "Meet Joe Black"}
put test/movie/3
{"title": "Tomorrow Never Dies"}

Query

post test/_search
{
   "query": {
      "match": {
         "title.raw": {
            "analyzer": "substring",
            "query": "Joe Dirt tomorrow"
         }
      }
   }
}

Result:

  "hits": {
      "total": 1,
      "max_score": 0.015511602,
      "hits": [
         {
            "_index": "test",
            "_type": "movie",
            "_id": "1",
            "_score": 0.015511602,
            "_source": {
               "title": "Joe Dirt"
            }
         }
      ]
   }