Elasticsearch more_like_this query score issue in 5.x

Time: 2017-07-12 11:46:46

Tags: elasticsearch lucene nest

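We recently upgraded Elasticsearch from version 2.4 to 5.4.

We have run into a problem with the more_like_this query in 5.x.

The following query is used to find similar documents by text content.

Input query: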

POST /test/_search
{
  "size": 10000,
  "stored_fields": [
    "docid"
  ],
  "_source": false,
  "query": {
    "more_like_this": {
      "fields": [
        "textcontent"
      ],
      "like": [
        {
          "_index": "test",
          "_type": "object",
          "_id": "AV0c9jvZXF-b5U5aNAWB"
        }
      ],
      "max_query_terms": 5000,
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}
  

Output from Elasticsearch 2.4:

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1.5381224,
    "hits": [
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal6Z9",
        "_score": 1.5381224,
        "fields": {
          "docid": [
            "2"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal63Z",
        "_score": 0.5381224,
        "fields": {
          "docid": [
            "3"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal6Z",
        "_score": 0.381224,
        "fields": {
          "docid": [
            "4"
          ]
        }
      }
    ]
  }
}
  

Output from Elasticsearch 5.4:

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1.5381224,
    "hits": [
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal6Z9",
        "_score": 168.5381224,
        "fields": {
          "docid": [
            "2"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal63Z",
        "_score": 164.5381224,
        "fields": {
          "docid": [
            "3"
          ]
        }
      },
      {
        "_index": "test",
        "_type": "object",
        "_id": "AVzjOOdilllQ-Gyal6Z",
        "_score": 132.381224,
        "fields": {
          "docid": [
            "4"
          ]
        }
      }
    ]
  }
}

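Both versions return the same documents, but the scores differ: version 5.4 produces much higher scores than 2.4. Our work depends on these scores, so the change is a problem for us. Is there a way to get the 2.4 scores back?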

1 Answer:

Answer 0 (score: 3)

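I found the solution. In version 5.0 the default similarity algorithm was changed from classic (TF/IDF) to BM25, and that is what causes the difference in scores. Simply set the similarity type to classic when creating the index. If the index already exists, update the settings of all indices by running the following request: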

PUT /_all/_settings?preserve_existing=true
{
  "index.similarity.default.type": "classic"
}
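
The answer mentions setting the similarity at index creation time but does not show that request. A minimal sketch, assuming the index is named `test` as in the question and that no other settings or mappings are needed (both are assumptions made for illustration):

```json
PUT /test
{
  "settings": {
    "index": {
      "similarity": {
        "default": {
          "type": "classic"
        }
      }
    }
  }
}

GET /test/_settings
```

The `GET /test/_settings` call only confirms that the setting was stored. For an index that already exists, the update request may be rejected if the similarity setting is not dynamic in your version. In that case it is probably necessary to close the index with `POST /test/_close`, apply the setting, and reopen it with `POST /test/_open`. Check this against the documentation for the exact version in use.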