Question

I search some data in Elasticsearch because it offers better full-text search than MongoDB. But I'm running into some problems, one of which is this:

My data is stored in Elasticsearch like this:

[{
   "word": "tidak berpuas hati",
   "type": "NEGATIVE",
   "score": -0.3908697916666666
  },{
   "word": "berpuas hati",
   "type": "POSITIVE",
   "score": 0.65375
  },{
   "word": "hati",
   "type": "POSITIVE",
   "score": 0.6
  },{
   "word": "tidak",
   "type": "NEGATIVE",
   "score": 0.6
}]

But when I search this data for the sentence "saya tidak berpuas hati", I get a response like this:

"hits": [
 {
    "_index": "sentiment",
    "_type": "ms",
    "_id": "8SPiimYBKsyQt_Jg1VYa",
    "_score": 8.838576,
    "_source": {
       "word": "berpuas hati",
       "type": "POSITIVE",
       "score": 0.65375
    },
    "highlight": {
       "word": [
          "<em>berpuas</em> <em>hati</em>"
       ]
    }
 },
 {
    "_index": "sentiment",
    "_type": "ms",
    "_id": "PiPiimYBKsyQt_Jg1U4U",
    "_score": 8.774891,
    "_source": {
       "word": "tidak berpuas hati",
       "type": "NEGATIVE",
       "score": -0.3908697916666666
    },
    "highlight": {
       "word": [
          "<em>tidak</em> <em>berpuas</em> <em>hati</em>"
       ]
    }
 },
 {
    "_index": "sentiment",
    "_type": "ms",
    "_id": "ByPiimYBKsyQt_Jg1VUZ",
    "_score": 5.045017,
    "_source": {
       "word": "hati",
       "type": "POSITIVE",
       "score": 0.6
    },
    "highlight": {
       "word": [
          "<em>hati</em>"
       ]
    }
  }
]

Here is my query:

term = "saya tidak berpuas hati"

query = {
    "from": 0,
    "size": 20,
    "query": {
        "match": {
            "word": {
                "query": term,
                "operator": 'or',
                "fuzziness": 'auto'
            }
        }
    },
    "highlight": {
        "fields": {
            "word": {}
        }
    }
}

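So the problem is that I don't understand why "tidak berpuas hati" doesn't score higher than "berpuas hati". When I change the value of "from" to 1, it starts working for this sentence but stops working for single-word sentences.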

Answer 1

Elasticsearch scores are calculated per shard.

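In this case, the "berpuas hati" document is more relevant within its shard than the "tidak berpuas hati" document is within its own shard, so it is returned with a higher score.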

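Elasticsearch relevance is determined by several factors, but here I'd say the reason is that the shard holding "tidak berpuas hati" contains more documents with one or more of the query terms (such as "tidak" or "berpuas") than the shard holding the "berpuas hati" document does. That is simply a coincidence of how the documents were distributed across shards.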
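The effect described above can be sketched with the classic BM25 formula (Elasticsearch's default similarity, with its default parameters k1 = 1.2 and b = 0.75). This is a simplified illustration, not Elasticsearch's exact code: newer Lucene versions drop the constant (k1 + 1) factor from the numerator, which scales every score equally and does not change the comparison. The shard sizes and document frequencies are made-up numbers, not the asker's data.

```python
import math

# Elasticsearch's default BM25 parameters.
K1 = 1.2
B = 0.75


def idf(doc_count, doc_freq):
    """BM25 inverse document frequency, computed from ONE shard's statistics."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq):
    """Classic BM25 contribution of a single query term to one document."""
    norm = tf + K1 * (1 - B + B * doc_len / avg_doc_len)
    return idf(doc_count, doc_freq) * tf * (K1 + 1) / norm


# Hypothetical shard statistics: the same term ("tidak") in an identical
# document, but the term is rare in shard A and common in shard B.
shard_a = term_score(tf=1, doc_len=3, avg_doc_len=2.0, doc_count=100, doc_freq=2)
shard_b = term_score(tf=1, doc_len=3, avg_doc_len=2.0, doc_count=100, doc_freq=40)

print(f"shard A: {shard_a:.3f}, shard B: {shard_b:.3f}")
```

The document text and query are identical in both calls; only the per-shard doc_freq differs, and that alone lowers the score in shard B. A document living in a shard where its terms are common can therefore be outscored by a shorter match living in a shard where those terms are rare.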

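If you run the same query against an index that contains only these two documents, you'll find that "berpuas hati" scores about 0.5 while "tidak berpuas hati" scores about 0.75.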
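A minimal sketch of the request bodies for that two-document experiment. The index name sentiment_test is hypothetical, and forcing a single primary shard is an assumption added here so that both documents share the same term statistics. It is written for typeless Elasticsearch 7+; on 6.x (the question's hits show "_type": "ms") each bulk action line would also need a "_type".

```python
import json

# Hypothetical index name for the experiment; the "word" field comes from the question.
INDEX = "sentiment_test"

# Assumption: one primary shard, so every document is scored against the
# same doc_count / doc_freq statistics and per-shard effects disappear.
index_settings = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
}

docs = [
    {"word": "tidak berpuas hati", "type": "NEGATIVE", "score": -0.3908697916666666},
    {"word": "berpuas hati", "type": "POSITIVE", "score": 0.65375},
]


def bulk_body(index, documents):
    """Build an NDJSON body for the _bulk API: an action line, then the document."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = bulk_body(INDEX, docs)
```

Send index_settings with PUT /sentiment_test, then body with POST /_bulk?refresh=true so the documents are searchable immediately, and finally run the original query against the new index.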

You can see how each score was computed by adding "explain": true to your query. The scoring algorithm is described here: https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html
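A sketch of the question's query with explanations switched on. The build_query helper is hypothetical; it just wraps the asker's request body and sets the top-level "explain" flag when asked to.

```python
def build_query(term, explain=False):
    """Return the question's match query; optionally request score explanations."""
    body = {
        "from": 0,
        "size": 20,
        "query": {
            "match": {
                "word": {"query": term, "operator": "or", "fuzziness": "auto"}
            }
        },
        "highlight": {"fields": {"word": {}}},
    }
    if explain:
        # With "explain": true, each hit carries an "_explanation" tree
        # breaking its _score down into tf, idf and field-length factors.
        body["explain"] = True
    return body


query = build_query("saya tidak berpuas hati", explain=True)
```

Comparing the idf values in the "_explanation" of the two hits should show the same term getting a different weight in each document's shard; with explain enabled, each hit should also report the "_shard" it came from.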

You may also want to read this: https://www.elastic.co/guide/en/elasticsearch/guide/current/relevance-is-broken.html

Why does an exact match score lower than a non-exact match in Elasticsearch full-text search?
