Minimum Solr score for inclusion in results?

Time: 2018-11-15 14:56:27

Tags: solr

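I created a collection of medical terms using all of Solr's (7.5) default settings. The documents come from a CSV file, which I loaded with bin/post using its default settings.

When I submit a silly query, I don't always get the number of rows I requested.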

http://host/solr/collection/select?fl=anyLabel,score&q=anyLabel:(astronaut%20%20football%20felafel)&rows=9999&wt=csv

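Is there some score threshold? In this case the lowest score is ~8. I have run other, less silly queries that return reasonable results down to a score of 2 or 3.

Why is this result cut off after a score of 8? Do I have any control over this?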

anyLabel,score
football,16.0328
astronaut haemolytic anaemia,15.470738
astronaut hemolytic anemia,15.470738
canadian football,14.440538
american football,14.440538
football field,14.440538
astronaut-bone demineralization syndrome,14.188901
indoor football arena,13.135968
australian rules football,13.135968
canadian football - sport,13.135968
american football - sport,13.135968
aussie rules football,13.135968
indoor football court,13.135968
astronaut-bone demineralization syndrome (disorder),13.103226
australian rules football ground,12.04758
indoor football arena (environment),12.04758
indoor american football arena,12.04758
american or canadian football,12.04758
american or canadian football field,11.12575
accidentally kicked during football game,11.12575
australian rules football ground (environment),11.12575
canadian football - sport (qualifier value),11.12575
american or canadian football - sport,11.12575
american football - sport (qualifier value),11.12575
australian rules football (qualifier value),11.12575
"american or canadian football\, device",11.12575
accidentally stepped on during football game,10.334962
american or canadian football field (environment),10.334962
accidentally kicked during football game (event),10.334962
american or canadian football - sport (qualifier value),9.649129
"american or canadian football\, device (physical object)",9.649129
accidentally stepped on during football game (event),9.649129
"place of occurrence of accident or poisoning\, football field",8.518538
"place of occurrence of accident or poisoning\, football field (environment)",8.047099

1 answer:

Answer (score: 2)

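There is no minimum score. Anything with a score above 0 is considered a match to some degree and will be returned, as long as it falls within the window set by the rows and start parameters ({{ 1}} values). Your list ends at a score of about 8 because no other document matches astronaut, football or felafel at all: a document that matches none of the query terms is never scored and is simply not part of the result set.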
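The paging behavior described above can be sketched in Python. This is a minimal illustration, assuming the JSON response writer (where the match count is reported as numFound) and reusing the host/collection placeholders from the question's URL; build_select_url and returned_rows are hypothetical helpers, not part of any Solr client library. The point is that Solr returns at most rows documents starting at offset start, and never pads the page when fewer documents match.

```python
from urllib.parse import urlencode


def build_select_url(base: str, query: str, rows: int, start: int = 0) -> str:
    """Build a /select URL like the one in the question (JSON instead of CSV)."""
    params = {"q": query, "fl": "anyLabel,score", "rows": rows, "start": start, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"


def returned_rows(num_found: int, rows: int, start: int = 0) -> int:
    """How many documents one page can contain.

    numFound only counts documents that matched at least one query clause,
    so a page is shorter than `rows` whenever fewer documents match.
    """
    return max(0, min(rows, num_found - start))


if __name__ == "__main__":
    # "http://host/solr/collection" is the placeholder from the question.
    print(build_select_url("http://host/solr/collection",
                           "anyLabel:(astronaut football felafel)", rows=9999))
    # If, say, 34 documents match and 9999 rows are requested, the page holds 34.
    print(returned_rows(num_found=34, rows=9999))  # prints 34
```

A large rows value is therefore only an upper bound; the real number of returned documents is decided by how many documents match the query.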

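In general, scores are not comparable between requests, and it also makes no sense to extrapolate from them, for example to read "one document has half the score of another" as "it is 50% as relevant".

The score also depends on the similarity algorithm in use, and the default can change between Solr versions. For 7.5 it is BM25 similarity.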
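For intuition, here is a simplified Python sketch of BM25 in the form used by Lucene 7.x (the library underneath Solr 7.5). It assumes the Lucene defaults k1=1.2 and b=0.75 and exact document lengths; real Lucene 7 stores lengths in a lossy one-byte encoding, so actual Solr scores will differ somewhat. The function names and the corpus statistics in the usage are hypothetical. The sketch shows three things from this thread: shorter labels such as "football" outrank "american football", a term that matches nothing (felafel) adds nothing, and the same document's score changes when collection statistics change, which is why scores are not comparable across requests or collections.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    # Lucene-style BM25 idf: log(1 + (N - n + 0.5) / (n + 0.5)), always > 0.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    # Term-frequency saturation with document-length normalisation
    # (Lucene 7 form, which still multiplies by (k1 + 1)).
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return bm25_idf(doc_count, doc_freq) * freq * (k1 + 1) / (freq + norm)


def bm25_score(query_terms, doc_terms, avg_doc_len, doc_count, doc_freqs):
    # An OR query scores a document as the sum of its matching clauses.
    # A document matching no term ends at 0.0: Solr would not return it at all.
    score = 0.0
    for term in query_terms:
        freq = doc_terms.count(term)
        if freq:
            score += bm25_term_score(freq, len(doc_terms), avg_doc_len,
                                     doc_count, doc_freqs[term])
    return score


if __name__ == "__main__":
    query = ["astronaut", "football", "felafel"]
    df = {"astronaut": 4, "football": 30}  # made-up document frequencies
    for label in (["football"], ["american", "football"], ["hemolytic", "anemia"]):
        print(" ".join(label), round(bm25_score(query, label, 4.0, 1000, df), 3))
```

Running the same query against a collection with different statistics (more documents, different average label length) yields different numbers for an identical label, so only the ordering within one response is meaningful.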