Elasticsearch - multi_match - phrase search

Time: 2015-10-19 20:21:57

Tags: elasticsearch

My goal is to search for a phrase across multiple fields.

{
  "multi_match" : {
    "query" : "king of baro",
    "fields" : [ "filed1", "filed2", "filed3", "filed5^9", "filed6", "filed7^9" ],
    "type" : "phrase_prefix",
    "boost" : 10.0,
    "tie_breaker" : 0.0
  }
}

The query above returns "king of baroda", which works as expected.

However, when I search for "king of bar", it doesn't return anything.

{
  "multi_match" : {
    "query" : "king of bar",
    "fields" : [ "filed1", "filed2", "filed3", "filed5^9", "filed6", "filed7^9" ],
    "type" : "phrase_prefix",
    "boost" : 10.0,
    "tie_breaker" : 0.0
  }
}

Summary:

Search for "king of bar"  - No result
Search for "king of baro"  - returns "king of baroda"
Search for "king of baroda"  - returns "king of baroda"

Am I missing some configuration?

Mapping:

http://localhost:9200/sec/_mapping/

{  
   "sec":{  
      "mappings":{  
         "sec":{  
            "properties":{  
               "filed1":{  
                  "type":"string"
               },
               "filed2":{  
                  "type":"string"
               },
               "filed3":{  
                  "type":"string"
               },
               "filed4":{  
                  "type":"string"
               },
               "filed5":{  
                  "type":"string"
               },
               "filed6":{  
                  "type":"string"
               },
               "filed7":{  
                  "type":"string"
               }
            }
         }
      }
   }
}

Analyzer, from elasticsearch.yml:

index:
  analysis:
    analyzer:

      security_edge_ngram_analyzer:
          alias: [security_edge_ngram_analyzer]
          tokenizer: security_edge_ngram_tokenizer

    tokenizer:
      security_edge_ngram_tokenizer:
        type: edgeNGram

2 answers:

Answer 0 (score: 2)

My guess is that you configured the edge ngram tokenizer with min_gram set to 4, but it's hard to say for sure without seeing the configuration.
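To make the min_gram guess concrete, here is a small Python sketch of what an edge n-gram step emits for a single word. It is only an emulation: `edge_ngrams` is a hypothetical helper, not an Elasticsearch API. It assumes grams are built per word. With min_gram 4, "baroda" never yields the term "bar"; with min_gram 2 it does.

```python
def edge_ngrams(word, min_gram, max_gram):
    """Return the prefixes of `word` whose lengths lie in [min_gram, max_gram],
    which is what an edge n-gram step emits for one word (emulation only)."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


# With min_gram 4 the shortest gram is "baro", so "bar" is never indexed.
print(edge_ngrams("baroda", min_gram=4, max_gram=20))  # ['baro', 'barod', 'baroda']
# With min_gram 2, "bar" becomes an indexed term.
print(edge_ngrams("baroda", min_gram=2, max_gram=20))  # ['ba', 'bar', 'baro', 'barod', 'baroda']
```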

Here is an example of setting up an edge ngram analyzer per field, from this blog post I wrote for Qbox:

PUT /test_index
{
   "settings": {
      "analysis": {
         "filter": {
            "edge_ngram_filter": {
               "type": "edge_ngram",
               "min_gram": 2,
               "max_gram": 20
            }
         },
         "analyzer": {
            "edge_ngram_analyzer": {
               "type": "custom",
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "edge_ngram_filter"
               ]
            }
         }
      }
   },
   "mappings": {
      "doc": {
         "properties": {
            "text_field": {
               "type": "string",
               "index_analyzer": "edge_ngram_analyzer",
               "search_analyzer": "standard"
            }
         }
      }
   }
}
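As a rough sanity check of why this mapping helps, the sketch below emulates it in Python. At index time, words are lowercased and expanded into prefixes of length 2 to 20. At search time, the query is only split and lowercased, as with the `standard` search analyzer. The regex split is an assumption standing in for the standard tokenizer, and both function names are hypothetical. Under these assumptions, every term of "king of bar" exists as an indexed term of "King of Baroda".

```python
import re


def edge_ngram_analyzer(text, min_gram=2, max_gram=20):
    """Approximate edge_ngram_analyzer: split into words (a rough stand-in for
    the standard tokenizer), lowercase, then emit prefixes of each word."""
    grams = []
    for word in re.findall(r"\w+", text):
        word = word.lower()
        grams.extend(word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1))
    return grams


def standard_analyzer(text):
    """Approximate the standard search analyzer: split into words and lowercase."""
    return [w.lower() for w in re.findall(r"\w+", text)]


indexed = set(edge_ngram_analyzer("King of Baroda"))
query_terms = standard_analyzer("king of bar")
print(all(term in indexed for term in query_terms))  # True
```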

Answer 1 (score: 1)

First, I would double-check that the custom analyzer is working as expected. I do this using fielddata_fields:

GET sec/sec/_search
{
  "fielddata_fields": ["filed1", "filed2", "filed3", "filed4", "filed5", "filed6", "filed7"]
}

A correctly configured edgeNGram tokenizer produces output like this:

        "fields": {
           "filed1": [
              "ki",
              "kin",
              "king",
              "king ",
              "king o",
              "king of",
              "king of ",
              "king of b",
              "king of ba",
              "king of bar",
              "king of baro",
              "king of barod",
              "king of baroda"
           ]
        }
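The listing above is what an edgeNGram tokenizer without token_chars produces. The whole field value, spaces included, is a single token, and each prefix with a length between min_gram and max_gram becomes a term. The Python emulation below reproduces it for min_gram 2 and max_gram 20; `edge_ngram_tokenizer` is a hypothetical helper, not an Elasticsearch API. It also shows what happens if the documented Elasticsearch defaults of min_gram 1 and max_gram 2 apply. The tokenizer in the question's elasticsearch.yml sets neither value, so in that case only "k" and "ki" would be emitted.

```python
def edge_ngram_tokenizer(text, min_gram, max_gram):
    """Emulate an edgeNGram tokenizer with no token_chars restriction: the whole
    input is one token, and every prefix of length min_gram..max_gram is a term."""
    return [text[:n] for n in range(min_gram, min(max_gram, len(text)) + 1)]


terms = edge_ngram_tokenizer("king of baroda", min_gram=2, max_gram=20)
print(len(terms))                # 13
print(terms[0], "|", terms[-1])  # ki | king of baroda

# With the documented defaults (min_gram 1, max_gram 2), almost nothing is indexed.
print(edge_ngram_tokenizer("king of baroda", 1, 2))  # ['k', 'ki']
```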

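If you don't see something similar, I would look at how the analyzer is set up and whether it is configured properly. As a second way to check this, I created a simple test index with the custom analyzer set directly on one field, and tested it the same way as above: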

PUT /sec
{
  "mappings": {
    "sec": {
      "properties": {
        "filed1": {
          "type": "string",
          "analyzer": "security_edge_ngram_analyzer"
        }
      }
    }
  },
  "settings": {
    "analysis": {
      "analyzer": {
        "security_edge_ngram_analyzer": {
          "tokenizer": "security_edge_ngram_tokenizer"
        }
      },
      "tokenizer": {
        "security_edge_ngram_tokenizer": {
          "type": "edgeNGram",
          "min_gram": 2,
          "max_gram": 20
        }
      }
    }
  }
}