Elasticsearch Edge-NGrams: prefer shorter terms

Time: 2018-03-27 21:41:16

Tags: elasticsearch

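I like the results I am getting from Elasticsearch when I index data with Edge-NGrams and search with a different analyzer. However, I would like shorter matching terms to rank higher than longer ones.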

For example, take the two terms ABC100 and ABC100xxx. If I run a query for the term ABC, both documents come back as hits with the same score. I would like ABC100 to score higher than ABC100xxx, because ABC is a closer match to ABC100 by something like the Levenshtein distance algorithm.

Setting up the index:

PUT stackoverflow
{
  "settings": {
    "index": {
      "number_of_replicas": 0,
      "number_of_shards": 1
    },
    "analysis": {
      "filter": {
        "edge_ngram": {
          "type": "edgeNGram",
          "min_gram": "1",
          "max_gram": "20"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "edge_ngram"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "product": {
          "type": "text",
          "analyzer": "my_analyzer",
          "search_analyzer": "whitespace"
        }
      }
    }
  }
}

Inserting documents:

PUT stackoverflow/doc/1
{
    "product": "ABC100"
}

PUT stackoverflow/doc/2
{
    "product": "ABC100xxx"
}

Search query:

GET stackoverflow/_search?pretty
{
  "query": {
    "match": {
      "product": "ABC"
    }
  }
}

Results:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.28247002,
    "hits": [
      {
        "_index": "stackoverflow",
        "_type": "doc",
        "_id": "2",
        "_score": 0.28247002,
        "_source": {
          "product": "ABC100xxx"
        }
      },
      {
        "_index": "stackoverflow",
        "_type": "doc",
        "_id": "1",
        "_score": 0.28247002,
        "_source": {
          "product": "ABC100"
        }
      }
    ]
  }
}

Does anyone know how I can get a shorter term such as ABC100 to rank higher than ABC100xxx?

1 answer:

Answer 0: (score: 0)

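After finding a lot of less than optimal solutions that involved storing the field length as a separate field or using a script query, I found the root of my problem. It was simply that I was using the edge_ngram token filter rather than the edge_ngram tokenizer.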
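The answer does not show the changed settings, so the following is only a sketch of what switching from the token filter to the tokenizer might look like for the index in the question. The tokenizer name `edge_ngram_tokenizer` and the `token_chars` value are assumptions made for illustration, not taken from the original post. The likely reason this helps: a token filter emits all grams of a word at the same position, and the default similarity ignores such overlapping tokens when computing field length, so ABC100 and ABC100xxx look equally long. A tokenizer emits each gram at its own position, so the shorter value gets a shorter field length and a higher score.

```json
PUT stackoverflow
{
  "settings": {
    "index": {
      "number_of_replicas": 0,
      "number_of_shards": 1
    },
    "analysis": {
      "tokenizer": {
        "edge_ngram_tokenizer": {
          "type": "edge_ngram",
          "min_gram": 1,
          "max_gram": 20,
          "token_chars": ["letter", "digit"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "edge_ngram_tokenizer"
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "product": {
          "type": "text",
          "analyzer": "my_analyzer",
          "search_analyzer": "whitespace"
        }
      }
    }
  }
}
```

Analysis settings cannot be changed on an open index, so the existing index would need to be deleted (or a new one created) and the two documents reindexed before running the same `match` query again.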