
Time: 2019-10-13 17:56:55

Tags: elasticsearch elasticsearch-query



"query": {
    "fuzzy": {
        "name": {
            "value": "Shahid"
        }
    }
}


"hits" : [{
    "_index" : "users",
    "_type" : "user",
    "_id" : "5sadsadsaddas",
    "_score" : 0.11127616,
    "fuzzyMatchPercentage" : "100%", // I expect something like this here
    "_source" : {
      "name" : "Shahid",
      "email" : "shahid@codeforgeek.com",
      "city" : "mumbai"
    }
}]

1 answer:

Answer 0 (score: 0)

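As noted in the comments, the fuzzy query in Elasticsearch does not work this way: it does not return a match percentage. By default, search results are sorted by descending _score, which expresses how well a document matches the query. Fuzziness is factored into that score: the more exact (that is, the less fuzzy) the match, the higher the score. You can verify this by requesting a detailed score explanation (in Elasticsearch v7.x, the fuzziness is folded into the boost factor). Consider the following example: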


1. Index two sample documents

POST fuzzy/_bulk
{"index": {}}
{"name": "Shahid"}
{"index": {}}
{"name": "Shahib"}

2. Search for the name "Shahid" with a fuzzy query

GET fuzzy/_search
{
  "explain": true,
  "query": {
    "fuzzy": {
      "name": {
        "value": "Shahid"
      }
    }
  }
}


3. Inspect the score explanations

For the correctly spelled document ("Shahid"):

    "_explanation" : {
      "value" : 0.57762265,
      "description" : "sum of:",
      "details" : [
        {
          "value" : 0.57762265,
          "description" : "weight(name:shahid in 0) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.57762265,
              "description" : "score(freq=1.0), product of:",
              "details" : [
                {
                  "value" : 1.8333334,
                  "description" : "boost",
                  "details" : [ ]
                },
                {
                  "value" : 0.6931472,
                  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1,
                      "description" : "n, number of documents containing term",
                      "details" : [ ]
                    },
                    {
                      "value" : 2,
                      "description" : "N, total number of documents with field",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.45454544,
                  "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "freq, occurrences of term within document",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "k1, term saturation parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "b, length normalization parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "dl, length of field",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "avgdl, average length of field",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }

For the misspelled document ("Shahib"):

    "_explanation" : {
      "value" : 0.46209806,
      "description" : "sum of:",
      "details" : [
        {
          "value" : 0.46209806,
          "description" : "weight(name:shahib in 1) [PerFieldSimilarity], result of:",
          "details" : [
            {
              "value" : 0.46209806,
              "description" : "score(freq=1.0), product of:",
              "details" : [
                {
                  "value" : 1.4666666,
                  "description" : "boost",
                  "details" : [ ]
                },
                {
                  "value" : 0.6931472,
                  "description" : "idf, computed as log(1 + (N - n + 0.5) / (n + 0.5)) from:",
                  "details" : [
                    {
                      "value" : 1,
                      "description" : "n, number of documents containing term",
                      "details" : [ ]
                    },
                    {
                      "value" : 2,
                      "description" : "N, total number of documents with field",
                      "details" : [ ]
                    }
                  ]
                },
                {
                  "value" : 0.45454544,
                  "description" : "tf, computed as freq / (freq + k1 * (1 - b + b * dl / avgdl)) from:",
                  "details" : [
                    {
                      "value" : 1.0,
                      "description" : "freq, occurrences of term within document",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.2,
                      "description" : "k1, term saturation parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 0.75,
                      "description" : "b, length normalization parameter",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "dl, length of field",
                      "details" : [ ]
                    },
                    {
                      "value" : 1.0,
                      "description" : "avgdl, average length of field",
                      "details" : [ ]
                    }
                  ]
                }
              ]
            }
          ]
        }
      ]
    }

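The two explanations can be checked by hand. The following is a minimal Python sketch that plugs the reported values into the idf and tf formulas quoted in the explain output and multiplies them by each document's boost. The helper names `bm25_idf` and `bm25_tf` are hypothetical and chosen only for illustration; the numbers are copied from the explanations.

```python
import math


def bm25_idf(n: int, N: int) -> float:
    """idf as quoted in the explain output: log(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (N - n + 0.5) / (n + 0.5))


def bm25_tf(freq: float, k1: float, b: float, dl: float, avgdl: float) -> float:
    """tf as quoted in the explain output: freq / (freq + k1 * (1 - b + b * dl / avgdl))."""
    return freq / (freq + k1 * (1 - b + b * dl / avgdl))


# Inputs copied from the explain output; they are identical for both documents.
idf = bm25_idf(n=1, N=2)
tf = bm25_tf(freq=1.0, k1=1.2, b=0.75, dl=1.0, avgdl=1.0)

# The boost is the only value that differs between the two documents.
boosts = {"Shahid": 1.8333334, "Shahib": 1.4666666}
scores = {name: boost * idf * tf for name, boost in boosts.items()}

for name, score in scores.items():
    print(f"{name}: {score:.6f}")
```

The products agree with the reported _score values to about six decimal places, so the whole score difference comes from the boost term.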
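4. Conclusion

Unfortunately, there is no detailed explanation of how the boost factor is computed (Elasticsearch issue), but the example shows that it is the only difference in the scoring of the two documents: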

  • Shahid: _score: 0.57762265 / boost: 1.8333334
  • Shahib: _score: 0.46209806 / boost: 1.4666666
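
Since Elasticsearch does not return a match percentage, one possible workaround is to compute one on the client from the edit distance between the query term and the returned field value. The sketch below is an assumption and not part of the answer above: `match_percentage` is a hypothetical helper, and normalizing by the longer string is just one of several reasonable definitions. It uses plain Levenshtein distance, so a transposition counts as two edits, whereas the fuzzy query counts it as one by default (its `transpositions` parameter).

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute), row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def match_percentage(query: str, value: str) -> float:
    """Hypothetical helper: 100 * (1 - distance / length of the longer string)."""
    a, b = query.lower(), value.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - levenshtein(a, b) / longest)


for name in ("Shahid", "Shahib"):
    print(name, round(match_percentage("Shahid", name), 1))
```

Applied to the `name` value in each hit's `_source`, this gives 100 for an exact match and a lower value for every additional edit, which is independent of the BM25 factors that also influence _score.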