Solr keyword ranking problem

Date: 2017-12-05 06:54:26

Tags: solr

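I am using Solr 6.1. When I search for "c0020673" I get three results.

The document with id "id-62" contains the data of the other two results,

so I expected the scores to be ordered "id-62" > "id-01" > "id-87".

But that is not the case. Can anyone tell me why "id-62" gets such a low score, and how to fix this?

The three results come from three different collections.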

"id-01": "\n1692.4559 = sum of:\n  1692.4559 = max of:\n    
1692.4559 = weight(master_key:\"c0020673\" in 18404) [], result of:\n      
1692.4559 = score(doc=18404,freq=1.0 = phraseFreq=1.0\n), product of:\n        
130.0 = boost\n        
13.648735 = idf(), sum of:\n          
1.6337854E-6 = idf(docFreq=306037, docCount=306037)\n          
1.6337854E-6 = idf(docFreq=306037, docCount=306037)\n          
8.4502326E-4 = idf(docFreq=305779, docCount=306037)\n          
1.9327133 = idf(docFreq=44300, docCount=306037)\n          
11.715174 = idf(docFreq=2, docCount=306037)\n        
0.95385337 = tfNorm, computed from:\n          
1.0 = phraseFreq=1.0\n          
1.2 = parameter k1\n          
0.75 = parameter b\n          
4.671981 = avgFieldLength\n          
5.2244897 = fieldLength\n    
76.86601 = weight(text_to_cjk:\"c0020673\" in 18404) [], result of:\n      
76.86601 = score(doc=18404,freq=2.0 = phraseFreq=2.0\n), product of:\n        
5.0 = boost\n        
11.1805105 = idf(), sum of:\n          
1.6337854E-6 = idf(docFreq=306037, docCount=306037)\n          
1.6337854E-6 = idf(docFreq=306037, docCount=306037)\n          
8.4502326E-4 = idf(docFreq=305779, docCount=306037)\n          
1.5686225 = idf(docFreq=63757, docCount=306037)\n          
9.61104 = idf(docFreq=20, docCount=306037)\n        
1.375 = tfNorm, computed from:\n          
2.0 = phraseFreq=2.0\n          
1.2 = parameter k1\n          
0.0 = parameter b (norms omitted for field)\n    
77.91596 = weight(text_to_jp:\"1 gdw cust as c 0020673\" in 18404) [], result of:\n      
77.91596 = score(doc=18404,freq=2.0 = phraseFreq=2.0\n), product of:\n        
5.0 = boost\n        
11.333231 = idf(), sum of:\n          
1.6337854E-6 = idf(docFreq=306037, docCount=306037)\n          
1.6337854E-6 = idf(docFreq=306037, docCount=306037)\n          
8.4502326E-4 = idf(docFreq=305779, docCount=306037)\n          
1.5158352 = idf(docFreq=67213, docCount=306037)\n          
0.342083 = idf(docFreq=217375, docCount=306037)\n          
9.474464 = idf(docFreq=23, docCount=306037)\n        
1.375 = tfNorm, computed from:\n          
2.0 = phraseFreq=2.0\n          
1.2 = parameter k1\n          
0.0 = parameter b (norms omitted for field)\n    
782.6664 = weight(content:\"c0020673\" in 18404) [], result of:\n      
782.6664 = score(doc=18404,freq=1.0 = phraseFreq=1.0\n), product of:\n        
70.0 = boost\n        
11.180948 = idf(), sum of:\n         
1.1436554E-5 = idf(docFreq=306034, docCount=306037)\n          
1.1436554E-5 = idf(docFreq=306034, docCount=306037)\n          
8.5483433E-4 = idf(docFreq=305776, docCount=306037)\n          
1.5690303 = idf(docFreq=63731, docCount=306037)\n          
9.61104 = idf(docFreq=20, docCount=306037)\n        
1.0 = tfNorm, computed from:\n          
1.0 = phraseFreq=1.0\n          
1.2 = parameter k1\n          
0.0 = parameter b (norms omitted for field)\n",

"id-87": "\n1705.65 = sum of:\n  
1705.65 = max of:\n    
1705.65 = weight(master_key:\"c0020673\" in 0) [], result of:\n      
1705.65 = score(doc=0,freq=1.0 = phraseFreq=1.0\n), product of:\n        
130.0 = boost\n        
14.5187435 = idf(), sum of:\n          
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n          
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n          
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n          
5.4025307 = idf(docFreq=61, docCount=13650)\n          
9.116103 = idf(docFreq=1, docCount=13650)\n       
0.903686 = tfNorm, computed from:\n         
1.0 = phraseFreq=1.0\n         
1.2 = parameter k1\n        
0.75 = parameter b\n        
4.1446886 = avgFieldLength\n  
5.2244897 = fieldLength\n   
90.3841 = weight(text_to_cjk:\"c0020673\" in 0) [], result of:\n 
90.3841 = score(doc=0,freq=2.0 = phraseFreq=2.0\n), product of:\n    
5.0 = boost\n   
13.146779 = idf(), sum of:\n    
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n      
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n       
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n     
4.030566 = idf(docFreq=242, docCount=13650)\n      
9.116103 = idf(docFreq=1, docCount=13650)\n     
1.375 = tfNorm, computed from:\n       
2.0 = phraseFreq=2.0\n   
1.2 = parameter k1\n        
0.0 = parameter b (norms omitted for field)\n  
89.17496 = weight(text_to_jp:\"1 gdw cust as c 0020673\" in 0) [], result of:\n 
89.17496 = score(doc=0,freq=2.0 = phraseFreq=2.0\n), product of:\n  
5.0 = boost\n  
12.970903 = idf(), sum of:\n    
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n      
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n     
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n   
3.8161204 = idf(docFreq=300, docCount=13650)\n       
0.038570423 = idf(docFreq=13134, docCount=13650)\n  
9.116103 = idf(docFreq=1, docCount=13650)\n   
1.375 = tfNorm, computed from:\n    
2.0 = phraseFreq=2.0\n     
1.2 = parameter k1\n       
0.0 = parameter b (norms omitted for field)\n    
920.27454 = weight(content:\"c0020673\" in 0) [], result of:\n    
920.27454 = score(doc=0,freq=1.0 = phraseFreq=1.0\n), product of:\n      
70.0 = boost\n    
13.146779 = idf(), sum of:\n     
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n        
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n       
3.6628026E-5 = idf(docFreq=13650, docCount=13650)\n     
4.030566 = idf(docFreq=242, docCount=13650)\n       
9.116103 = idf(docFreq=1, docCount=13650)\n      
1.0 = tfNorm, computed from:\n        
1.0 = phraseFreq=1.0\n        
1.2 = parameter k1\n       
0.0 = parameter b (norms omitted for field)\n",

"id-62": "\n1361.2384 = sum of:\n  
1361.2384 = max of:\n    
1361.2384 = weight(master_key:\"c0020673\" in 0) [], result of:\n      
1361.2384 = score(doc=0,freq=1.0 = phraseFreq=1.0\n), product of:\n        
130.0 = boost\n        
10.671043 = idf(), sum of:\n          
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n          
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n      
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n     
3.4783592 = idf(docFreq=61, docCount=1992)\n     
7.1919312 = idf(docFreq=1, docCount=1992)\n    
0.9812597 = tfNorm, computed from:\n         
1.0 = phraseFreq=1.0\n        
1.2 = parameter k1\n        
0.75 = parameter b\n        
4.991466 = avgFieldLength\n  
5.2244897 = fieldLength\n   
70.97167 = weight(text_to_cjk:\"c0020673\" in 0) [], result of:\n 
70.97167 = score(doc=0,freq=3.0 = phraseFreq=3.0\n), product of:\n 
5.0 = boost\n   
9.032757 = idf(), sum of:\n      
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n  
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n  
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n   
1.8400731 = idf(docFreq=316, docCount=1992)\n     
7.1919312 = idf(docFreq=1, docCount=1992)\n    
1.5714288 = tfNorm, computed from:\n        
3.0 = phraseFreq=3.0\n       
1.2 = parameter k1\n       
0.0 = parameter b (norms omitted for field)\n  
70.1351 = weight(text_to_jp:\"1 gdw cust as c 0020673\" in 0) [], result of:\n     
70.1351 = score(doc=0,freq=3.0 = phraseFreq=3.0\n), product of:\n    
5.0 = boost\n    
8.926285 = idf(), sum of:\n   
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n   
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n 
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n 
1.7323457 = idf(docFreq=352, docCount=1992)\n     
0.0012551778 = idf(docFreq=1990, docCount=1992)\n   
7.1919312 = idf(docFreq=1, docCount=1992)\n       
1.5714288 = tfNorm, computed from:\n          
3.0 = phraseFreq=3.0\n       
1.2 = parameter k1\n         
0.0 = parameter b (norms omitted for field)\n  
869.40283 = weight(content:\"c0020673\" in 0) [], result of:\n  
869.40283 = score(doc=0,freq=2.0 = phraseFreq=2.0\n), product of:\n 
70.0 = boost\n      
9.032757 = idf(), sum of:\n      
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n      
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n     
2.5090954E-4 = idf(docFreq=1992, docCount=1992)\n    
1.8400731 = idf(docFreq=316, docCount=1992)\n      
7.1919312 = idf(docFreq=1, docCount=1992)\n        
1.375 = tfNorm, computed from:\n       
2.0 = phraseFreq=2.0\n        
1.2 = parameter k1\n         
0.0 = parameter b (norms omitted for field)\n"

1 answer:

Answer 0 (score: 0)

From the debug output I can see that you must have configured a qf like:

master_key^130.0 text_to_cjk^5.0 text_to_jp^5.0 content^70.0
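For illustration, here is a minimal sketch of how such a qf might be passed in a request. Only the field boosts come from the debug output; the choice of edismax and the use of debugQuery are assumptions about the setup, and no host or collection name is implied.

```python
from urllib.parse import urlencode

# Hypothetical request parameters; only the qf boosts are taken from the debug output.
params = {
    "q": "c0020673",
    "defType": "edismax",  # assumption: the handler could equally use dismax
    "qf": "master_key^130.0 text_to_cjk^5.0 text_to_jp^5.0 content^70.0",
    "debugQuery": "true",  # asks Solr to include the score explanation
}

# URL-encode the parameters as they would appear after /select?
query_string = urlencode(params)
print(query_string)
```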

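Maybe you also have some additional phrase boosting, and you are using the dismax / edismax request handler. You are also using a tie (tiebreaker) value [1] of "0.0", which is the default. This makes the query a pure "disjunction max query": only the highest-scoring subquery contributes to the final score. As a result, all three scores are dominated by the master_key match.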
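The way the tie value combines the per-field scores can be sketched as follows. This is a simplified model of the disjunction max combination (best score plus tie times the rest), not Solr's actual code; the function name is made up for the example and the numbers are copied from the "id-01" explanation.

```python
def dismax_score(subscores, tie=0.0):
    """Best subquery score plus tie times the sum of the remaining ones."""
    best = max(subscores)
    return best + tie * (sum(subscores) - best)

# Per-field scores for "id-01", copied from the debug output.
id_01 = {
    "master_key": 1692.4559,
    "text_to_cjk": 76.86601,
    "text_to_jp": 77.91596,
    "content": 782.6664,
}

# With tie=0.0 only the best field (master_key) counts.
score_tie_0 = dismax_score(list(id_01.values()), tie=0.0)
print(score_tie_0)
```

With tie=0.0 the result equals the master_key score, which matches the top-level score reported for "id-01".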

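The three scores show different avgFieldLength and IDF values, which suggests that you are probably in a SolrCloud scenario where you are not using distributed IDF [2].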
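To see how much the collection size alone shifts the IDF, the values in the debug output can be reproduced with the BM25 IDF formula. This is a sketch assuming the default BM25 similarity (which the k1 and b parameters in the output point to); the function name is made up for the example.

```python
import math

def bm25_idf(doc_freq, doc_count):
    """BM25 inverse document frequency: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# (docFreq, docCount) of the rarest master_key term per result, from the debug output.
rarest_term = {
    "id-01": (2, 306037),
    "id-87": (1, 13650),
    "id-62": (1, 1992),
}

for doc_id, (doc_freq, doc_count) in rarest_term.items():
    print(doc_id, round(bm25_idf(doc_freq, doc_count), 4))
```

A term that matches a single document gets an IDF of about 7.19 in the 1992-document collection but about 9.12 in the 13650-document one, so "id-62" starts with a lower score simply because its collection is the smallest.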

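Am I right? Why do you expect this ordering?

"id-62" > "id-01" > "id-87"

If the reason is the match in the content field, you need to use a different value for the tie parameter. Read the documentation, but to keep it simple: a value of "1.0" makes the query a pure "disjunction sum query", where it does not matter which subquery scores highest, because the final score is the sum of the subquery scores.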
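As a rough check of what tie=1.0 would do here, the per-field scores from the debug output can simply be summed. This sketch assumes the individual field scores stay exactly as reported and only the combination changes.

```python
# Per-field scores (master_key, text_to_cjk, text_to_jp, content) from the debug output.
subscores = {
    "id-01": [1692.4559, 76.86601, 77.91596, 782.6664],
    "id-87": [1705.65, 90.3841, 89.17496, 920.27454],
    "id-62": [1361.2384, 70.97167, 70.1351, 869.40283],
}

# With tie=1.0 the final score is the sum of the subquery scores.
summed = {doc_id: sum(scores) for doc_id, scores in subscores.items()}
ranking = sorted(summed, key=summed.get, reverse=True)

for doc_id in ranking:
    print(doc_id, round(summed[doc_id], 2))
```

With these particular numbers the order stays "id-87" > "id-01" > "id-62", so changing tie alone may not be enough in this case; the per-collection IDF differences appear to play a part as well.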

[1] https://lucene.apache.org/solr/guide/6_6/the-dismax-query-parser.html#TheDisMaxQueryParser-Thetie_TieBreaker_Parameter