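Score of documents matching query_string

Time: 2016-07-10 13:02:39

Tags: elasticsearch elasticsearch-2.0

I'm currently working on a very nasty query that I need from ES. My documents are nested documents, and their mapping looks like this: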

"mydocs" : {
"properties" : {
            "doc" : {
                "type" : "nested",
                "properties" : {    
                    "name" : {"type" : "string", "store" : "yes", "index" : "analyzed"},
                    "tagln" : {"type" : "string", "store" : "yes", "index" : "analyzed"},
                    "tags" : {"type" : "string", "store" : "yes", "index" : "analyzed"},
                    "featured" : {"type" : "integer", "store" : "yes", "index" : "not_analyzed"},
                    "blkd" : {"type" : "integer", "store" : "yes", "index" : "not_analyzed"},
... etc ...
}

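I'm trying to boost the name, tagln and tags fields with a custom scoring formula: score = featured * 10000 + [found in name] * 1000 + [found in tagln] * 10 + [found in tags] * 10. My query is as follows: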

{
  "from" : 0,
  "size" : 10,
  "query" : {
    "nested" : {
      "query" : {
        "filtered" : {
          "query" : {
            "bool" : {
              "must" : [ {
                "term" : {
                  "doc.blkd" : 0
                }
              } ],
              "should" : [ {
                "function_score" : {
                  "functions" : [ {
                    "field_value_factor" : {
                      "field" : "doc.featured",
                      "factor" : 10000.0
                    }
                  } ],
                  "score_mode" : "sum",
                  "boost_mode" : "sum"
                }
              }, {
                "constant_score" : {
                  "filter" : {
                    "query_string" : {
                      "query" : "featured*",
                      "fields" : [ "doc.name^1000.0" ]
                    }
                  },
                  "boost" : 1000.0
                }
              }, {
                "constant_score" : {
                  "filter" : {
                    "query_string" : {
                      "query" : "featured*",
                      "fields" : [ "doc.tags^10.0" ],
                      "boost" : 10.0
                    }
                  }
                }
              }, {
                "constant_score" : {
                  "filter" : {
                    "query_string" : {
                      "query" : "featured*",
                      "fields" : [ "doc.tagln^10.0" ],
                      "boost" : 10.0
                    }
                  }
                }
              } ],
              "minimum_should_match" : "0"
            }
          }
        }
      },
      "path" : "doc",
      "score_mode" : "sum"
    }
  },
  "explain" : false,
  "sort" : [ {
    "_score" : { }
  } ]
}

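The scores don't reflect the boosts they should. The featured part works as expected, but the query_string boosts have no effect: documents with "aaa" in their name get tiny scores such as 5 or 0, while documents with featured = 1 return scores of 4000/6000/7500 and so on.

First of all, the score is not 10000+, which is odd (possibly because of the many factors that go into the score), but a query_string match in the name has no noticeable effect on the score at all.

How can I fix this, or at least debug it (see how the score is built)? I tried setting explain to true, but all I get is this rather useless (or perhaps just unreadable to me) explanation: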

"_explanation": {
          "value": 4000.0024,
          "description": "sum of:",
          "details": [
            {
              "value": 4000.0024,
              "description": "Score based on child doc range from 387 to 387",
              "details": []
            },
            {
              "value": 0,
              "description": "match on required clause, product of:",
              "details": [
                {
                  "value": 0,
                  "description": "# clause",
                  "details": []
                },
                {
                  "value": 0.0009999962,
                  "description": "-ConstantScore(_type:.percolator) #(+*:* -_type:__*), product of:",
                  "details": [
                    {
                      "value": 1,
                      "description": "boost",
                      "details": []
                    },
                    {
                      "value": 0.0009999962,
                      "description": "queryNorm",
                      "details": []
                    }
                  ]
                }
              ]
            }
          ]
        }
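One way to make an explanation like this easier to read is to flatten the tree into indented lines. The following is a minimal Python sketch; the helper name `format_explanation` is made up for illustration, and it only assumes the `value` / `description` / `details` keys visible in the output above. Note that the nested part appears as a single "Score based on child doc range" node with empty details, so the top-level explain alone does not show how the nested score was built.

```python
def format_explanation(node, depth=0):
    """Return the explanation tree as indented 'value  description' lines."""
    lines = ["{}{}  {}".format("  " * depth, node["value"], node["description"])]
    for child in node.get("details", []):
        lines.extend(format_explanation(child, depth + 1))
    return lines


# Trimmed copy of the explanation shown above.
explanation = {
    "value": 4000.0024,
    "description": "sum of:",
    "details": [
        {
            "value": 4000.0024,
            "description": "Score based on child doc range from 387 to 387",
            "details": [],
        },
    ],
}

print("\n".join(format_explanation(explanation)))
```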

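*EDITED*

Thanks to the answer below, I can now provide more information. After adding disable_coord: true and inner_hits with explain: true, I tried boosting the query_string in every way I could. The query is as follows: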

{
  "from" : 0,
  "size" : 10,
  "query" : {
    "nested" : {
      "query" : {
        "filtered" : {
          "query" : {
            "bool" : {
              "must" : [ {
                "term" : {
                  "doc.blkd" : 0
                }
              } ],
              "should" : [ {
                "function_score" : {
                  "functions" : [ {
                    "field_value_factor" : {
                      "field" : "doc.featured",
                      "factor" : 10000.0
                    }
                  } ],
                  "score_mode" : "sum",
                  "boost_mode" : "sum"
                }
              }, {
                "constant_score" : {
                  "filter" : {
                    "query_string" : {
                      "query" : "*featured*",
                      "fields" : [ "doc.name^1000.0" ]
                    }
                  },
                  "boost" : 1000.0
                }
              }, {
                "query_string" : {
                  "query" : "*featured*",
                  "fields" : [ "doc.tags^100.0" ],
                  "boost" : 100.0
                }
              }, {
                "constant_score" : {
                  "filter" : {
                    "query_string" : {
                      "query" : "*featured*",
                      "fields" : [ "doc.tagln^10.0" ],
                      "boost" : 10.0
                    }
                  }
                }
              } ],
              "disable_coord" : true,
              "minimum_should_match" : "0"
            }
          },
          "filter" : {
            "bool" : {
              "should" : [ {
                "query_string" : {
                  "query" : "*featured*",
                  "fields" : [ "doc.name^1000000.0", "doc.tags^10.0", "doc.tagln^10.0" ],
                  "boost" : 1000.0
                }
              } ],
              "minimum_should_match" : "0"
            }
          }
        }
      },
      "path" : "doc",
      "score_mode" : "sum",
      "inner_hits" : {
        "explain" : "true"
      }
    }
  },
  "explain" : false,
  "sort" : [ {
    "_score" : { }
  } ]
}

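As you can see, I added the query_string to the filter and changed one of the should clauses so that it is no longer a constant_score.

The explanation for the doc now looks like this: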

"max_score": 10001,
"hits": [
  {
    "_index": "myindex",
    "_type": "mydocs",
    "_id": "1111",
    "_score": 10001,
    "_ttl": 86158563,
    "_source": {
      "meta": {
        "id": "1111",
        "rev": "35-14602ccf5c3d429e0000000002000000",
        "expiration": 0,
        "flags": 33554432
      },
      "doc": {
        "featured": 1,
        "tagln": "hello location 1",
        "blkd": 0,
        "tags": [
          "UsLocTaglinefeat"
        ],
        "name": "hello US location featured"
      }
    },
    "inner_hits": {
"doc": {
"hits": {
  "total": 1,
  "max_score": 10001,
  "hits": [
    {
      "_shard": 1,
      "_node": "YIXx2rrKR2O5q9519FIr_Q",
      "_index": "myindex",
      "_type": "mydocs",
      "_id": "1111",
      "_nested": {
        "field": "doc",
        "offset": 0
      },
      "_score": 10001,
      "_source": {
        "featured": 1,
        "tagln": "hello location 1",
        "blkd": 0,
        "tags": [
          "UsLocTaglinefeat"
        ],
        "name": "hello US location featured"
      },
      "_explanation": {
        "value": 10001,
        "description": "sum of:",
        "details": [
          {
            "value": 10001,
            "description": "sum of:",
            "details": [
              {
                "value": 0.0041682906,
                "description": "weight(doc.blkd:`\b\u0000\u0000\u0000\u0000 in 0) [PerFieldSimilarity], result of:",
                "details": [
                  {
                    "value": 0.0041682906,
                    "description": "score(doc=0,freq=1.0), product of:",
                    "details": [
                      {
                        "value": 0.0020365636,
                        "description": "queryWeight, product of:",
                        "details": [
                          {
                            "value": 2.0467274,
                            "description": "idf(docFreq=177, maxDocs=507)",
                            "details": []
                          },
                          {
                            "value": 0.0009950341,
                            "description": "queryNorm",
                            "details": []
                          }
                        ]
                      },
                      {
                        "value": 2.0467274,
                        "description": "fieldWeight in 0, product of:",
                        "details": [
                          {
                            "value": 1,
                            "description": "tf(freq=1.0), with freq of:",
                            "details": [
                              {
                                "value": 1,
                                "description": "termFreq=1.0",
                                "details": []
                              }
                            ]
                          },
                          {
                            "value": 2.0467274,
                            "description": "idf(docFreq=177, maxDocs=507)",
                            "details": []
                          },
                          {
                            "value": 1,
                            "description": "fieldNorm(doc=0)",
                            "details": []
                          }
                        ]
                      }
                    ]
                  }
                ]
              },
              {
                "value": 10000.001,
                "description": "sum of",
                "details": [
                  {
                    "value": 0.0009950341,
                    "description": "*:*, product of:",
                    "details": [
                      {
                        "value": 1,
                        "description": "boost",
                        "details": []
                      },
                      {
                        "value": 0.0009950341,
                        "description": "queryNorm",
                        "details": []
                      }
                    ]
                  },
                  {
                    "value": 10000,
                    "description": "min of:",
                    "details": [
                      {
                        "value": 10000,
                        "description": "field value function: none(doc['doc.featured'].value * factor=10000.0)",
                        "details": []
                      },
                      {
                        "value": 3.4028235e+38,
                        "description": "maxBoost",
                        "details": []
                      }
                    ]
                  }
                ]
              },
              {
                "value": 0.9950341,
                "description": "ConstantScore(doc.name:*featured*), product of:",
                "details": [
                  {
                    "value": 1000,
                    "description": "boost",
                    "details": []
                  },
                  {
                    "value": 0.0009950341,
                    "description": "queryNorm",
                    "details": []
                  }
                ]
              }
            ]
          },
          {
            "value": 0,
            "description": "match on required clause, product of:",
            "details": [
              {
                "value": 0,
                "description": "# clause",
                "details": []
              },
              {
                "value": 0.0009950341,
                "description": "((doc.name:*featured*)^1000000.0 | (doc.tags:*featured*)^10.0 | (doc.tagln:*featured*)^10.0), product of:",
                "details": [
                  {
                    "value": 1,
                    "description": "boost",
                    "details": []
                  },
                  {
                    "value": 0.0009950341,
                    "description": "queryNorm",
                    "details": []
                  }
                ]
              }
            ]
          }
        ]
      }
    }
  ]
}
}
    }
  },

It seems that the only query_string that affects the score is the one in the filter, but I can't seem to boost its score... Any hints are welcome :) Thanks
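Reading the numbers straight off the inner_hits explanation above, the constant_score clause on doc.name does appear to contribute, but its boost of 1000 is multiplied by queryNorm (roughly 0.001), so it adds about 1 rather than 1000. The Python snippet below is just a sanity check of that arithmetic; the variable names are made up, and the values are copied from the explanation.

```python
query_norm = 0.0009950341      # "queryNorm" node in the explanation
name_boost = 1000              # "boost" of the constant_score on doc.name

# ConstantScore(doc.name:*featured*) is reported as boost * queryNorm.
name_clause = name_boost * query_norm

term_clause = 0.0041682906     # weight(doc.blkd ...) node
featured_clause = 10000.001    # function_score node (field_value_factor)

total = term_clause + featured_clause + name_clause
print(name_clause, total)
```

The three clauses add up to about 10001, which matches the reported _score.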

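2 Answers:

Answer 0: (score: 1)

For the query in the OP, you need to enable disable_coord in the bool query to get the desired behavior.

In addition, enabling inner_hits and setting explain: true inside it will give you the scoring details of the nested documents. This feature is available in Elasticsearch 1.5 and later.

Example: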

{
   "query": {
      "nested": {
         "query": {
            "filtered": {
               "query": {
                  "bool": {
                     "disable_coord": "true",
                     "must": [
                        {
                           "term": {
                              "doc.blkd": 0
                           }
                        }
                     ],
                     "should": [
                        {
                           "function_score": {
                              "functions": [
                                 {
                                    "field_value_factor": {
                                       "field": "doc.featured",
                                       "factor": 10000
                                    }
                                 }
                              ],
                              "score_mode": "sum",
                              "boost_mode": "sum"
                           }
                        },
                        {
                           "constant_score": {
                              "filter": {
                                 "query_string": {
                                    "query": "featured*",
                                    "fields": [
                                       "doc.name^1000.0"
                                    ]
                                 }
                              },
                              "boost": 1000
                           }
                        },
                        {
                           "constant_score": {
                              "filter": {
                                 "query_string": {
                                    "query": "featured*",
                                    "fields": [
                                       "doc.tags^10.0"
                                    ],
                                    "boost": 10
                                 }
                              }
                           }
                        },
                        {
                           "constant_score": {
                              "filter": {
                                 "query_string": {
                                    "query": "featured*",
                                    "fields": [
                                       "doc.tagln^10.0"
                                    ],
                                    "boost": 10
                                 }
                              }
                           }
                        }
                     ],
                     "minimum_should_match": "0"
                  }
               }
            }
         },
         "path": "doc",
         "score_mode": "sum",
         "inner_hits" : {
             "explain" : "true"
         }
      }
   }

}
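If the search body is built in code, the two changes above can be applied to an existing query programmatically. This is a rough Python sketch, assuming the body has the same nested → filtered → bool shape as the query in the question; the helper name `with_debug_options` is hypothetical, and it emits JSON booleans rather than the string "true" used in the example.

```python
import copy
import json


def with_debug_options(search_body):
    """Return a copy with coord disabled and inner_hits explain switched on."""
    body = copy.deepcopy(search_body)
    nested = body["query"]["nested"]
    nested["query"]["filtered"]["query"]["bool"]["disable_coord"] = True
    nested["inner_hits"] = {"explain": True}
    return body


# Cut-down version of the query from the question.
body = {
    "query": {
        "nested": {
            "path": "doc",
            "score_mode": "sum",
            "query": {"filtered": {"query": {"bool": {
                "must": [{"term": {"doc.blkd": 0}}],
            }}}},
        }
    }
}

debug_body = with_debug_options(body)
print(json.dumps(debug_body, indent=2))
```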

EDITED

It may also be simpler to rewrite the query using function_score, as in the example below.

   {
       "query": {
          "nested": {
             "query": {
                "function_score": {
                   "functions": [
                      {
                         "field_value_factor": {
                            "field": "doc.featured",
                            "factor": 10000
                         }
                      },
                      {
                         "filter": {
                            "query_string": {
                               "query": "*featured*",
                               "fields": [
                                  "doc.name^1000.0"
                               ]
                            }
                         },
                         "weight": 1000
                      },
                      {
                         "filter": {
                            "query_string": {
                               "query": "*featured*",
                               "fields": [
                                  "doc.tags^1000.0"
                               ]
                            }
                         },
                         "weight": 100
                      },
                      {
                         "weight": 10,
                         "filter": {
                            "query_string": {
                               "query": "*featured*",
                               "fields": [
                                  "doc.tagln^10.0"
                               ]
                            }
                         }
                      }
                   ],
                   "query": {
                      "term": {
                         "doc.blkd": 0
                      }
                   },
                   "score_mode": "sum",
                   "boost_mode": "sum"
                }
             },
             "path": "doc",
             "score_mode": "sum",
             "inner_hits": {
                "explain": "true"
             }
          }
       }
   }
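To see what this function_score is expected to produce, here is a simplified Python model of score_mode: sum combined with boost_mode: sum, using the weights from the example above. It is only a sketch: the function name is made up, query_score stands for whatever the term query on doc.blkd scores, and it ignores max_boost and any normalization Elasticsearch applies.

```python
def function_score_sum(query_score, functions):
    """Model score_mode=sum and boost_mode=sum.

    functions is a list of (applies, value) pairs. A function that has a
    filter only contributes when its filter matches the document.
    """
    combined = sum(value for applies, value in functions if applies)
    return query_score + combined


featured = 1
functions = [
    (True, featured * 10000),  # field_value_factor on doc.featured
    (True, 1000),              # weight when doc.name matches
    (False, 100),              # weight when doc.tags matches
    (False, 10),               # weight when doc.tagln matches
]

print(function_score_sum(query_score=0.0, functions=functions))
```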

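Answer 1: (score: 0)

"score_mode" : "sum",

"boost_mode" : "sum"

were my problem. ES was normalizing the whole score, and the results came out odd because of that.

Thanks for the inner_hits explain tip... it helped me a lot!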