Question about Lucene scoring

Time: 2009-08-04 12:44:11

Tags: lucene

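I have a question about Lucene scoring. I have two documents in my index: one contains the exact text "my name", and the other contains both words but not as that exact phrase. When I search for the keyword "my name", the second document is listed above the first one. What I want is for a document that contains the exact keyword I typed to be listed first, followed by the other one. Can anyone help me do this? Thanks.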

4 answers:

Answer 0 (score: 3)

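Second attempt at an answer: Lucene's default behavior should already be what you are asking for. The key factor here is the lengthNorm() part of the score: longer documents can be scored lower than shorter ones. See Lucene's Similarity API for context. If, for example, both hits have the same lengthNorm, their relative order is arbitrary.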
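To make the lengthNorm() point concrete, here is a minimal Java sketch of the classic DefaultSimilarity length normalization from Lucene of this era, 1/sqrt(numTerms), assuming a field boost of 1.0. It computes the raw value only: Lucene then stores the norm in a single byte, and this sketch does not model that precision loss (which is one way two different lengths can end up with the same norm). The class name and the field lengths are hypothetical.

```java
public class LengthNormSketch {

    // Classic DefaultSimilarity-style length normalization, before the value
    // is encoded into a single byte: 1 / sqrt(number of terms in the field).
    // Assumes a field boost of 1.0.
    static float lengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        // Hypothetical field lengths: a 2-term field such as "my name"
        // versus a longer 3-term field.
        int[] lengths = {2, 3};
        for (int n : lengths) {
            System.out.printf("numTerms=%d -> lengthNorm=%.4f%n", n, lengthNorm(n));
        }
    }
}
```

Run as-is, it prints about 0.7071 for the 2-term field and 0.5774 for the 3-term field, so the shorter field gets the larger factor when everything else is equal.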

The explain() method will help you understand why each document received the score it did, instead of reasoning from the defaults alone.

I assume you are using a BooleanQuery. If you post exactly how you build your query, I can say more. See also the Query Parser Syntax. I hope this is closer to the mark.

Answer 1 (score: 0)

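If you use lucli from the command line (download the latest Lucene source; it is in the contrib directory), you can use the "explain" command to have Lucene explain why it scored a document the way it did.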

It outputs something like this:

---------------- 2 score:0.6089077 ---------------------

(...your document's fields...)

Explanation:4.260467 = (MATCH) sum of:
  0.59024054 = (MATCH) weight(description:warwick in 276780), product of:
    0.05595057 = queryWeight(description:warwick), product of:
      5.2746606 = idf(docFreq=13531, numDocs=843621)
      0.010607426 = queryNorm
    10.549321 = (MATCH) fieldWeight(description:warwick in 276780), product of:
      1.0 = tf(termFreq(description:warwick)=1)
      5.2746606 = idf(docFreq=13531, numDocs=843621)
      2.0 = fieldNorm(field=description, doc=276780)
  0.832554 = (MATCH) weight(keywords:warwick in 276780), product of:
    0.066450186 = queryWeight(keywords:warwick), product of:
      6.264497 = idf(docFreq=5028, numDocs=843621)
      0.010607426 = queryNorm
    12.528994 = (MATCH) fieldWeight(keywords:warwick in 276780), product of:
      1.0 = tf(termFreq(keywords:warwick)=1)
      6.264497 = idf(docFreq=5028, numDocs=843621)
      2.0 = fieldNorm(field=keywords, doc=276780)
  0.19180772 = (MATCH) weight(url:warwick in 276780), product of:
    0.048220757 = queryWeight(url:warwick), product of:
      4.5459433 = idf(docFreq=28043, numDocs=843621)
      0.010607426 = queryNorm
    3.9777002 = (MATCH) fieldWeight(url:warwick in 276780), product of:
      1.0 = tf(termFreq(url:warwick)=1)
      4.5459433 = idf(docFreq=28043, numDocs=843621)
      0.875 = fieldNorm(field=url, doc=276780)
  0.023709858 = (MATCH) weight(content:warwick in 276780), product of:
    0.03373665 = queryWeight(content:warwick), product of:
      3.1804748 = idf(docFreq=109863, numDocs=843621)
      0.010607426 = queryNorm
    0.7027923 = (MATCH) fieldWeight(content:warwick in 276780), product of:
      1.4142135 = tf(termFreq(content:warwick)=2)
      3.1804748 = idf(docFreq=109863, numDocs=843621)
      0.15625 = fieldNorm(field=content, doc=276780)
  0.46163678 = (MATCH) weight(siteDescription:warwick in 276780), product of:
    0.0494812 = queryWeight(siteDescription:warwick), product of:
      4.6647696 = idf(docFreq=24901, numDocs=843621)
      0.010607426 = queryNorm
    9.329539 = (MATCH) fieldWeight(siteDescription:warwick in 276780), product of:
      1.0 = tf(termFreq(siteDescription:warwick)=1)
      4.6647696 = idf(docFreq=24901, numDocs=843621)
      2.0 = fieldNorm(field=siteDescription, doc=276780)
  0.96127754 = (MATCH) weight(siteUrl:warwick in 276780), product of:
    0.10097861 = queryWeight(siteUrl:warwick), product of:
      9.519615 = idf(docFreq=193, numDocs=843621)
      0.010607426 = queryNorm
    9.519615 = (MATCH) fieldWeight(siteUrl:warwick in 276780), product of:
      1.0 = tf(termFreq(siteUrl:warwick)=1)
      9.519615 = idf(docFreq=193, numDocs=843621)
      1.0 = fieldNorm(field=siteUrl, doc=276780)
  0.62917286 = (MATCH) weight(title:warwick in 276780), product of:
    0.05776636 = queryWeight(title:warwick), product of:
      5.4458413 = idf(docFreq=11402, numDocs=843621)
      0.010607426 = queryNorm
    10.891683 = (MATCH) fieldWeight(title:warwick in 276780), product of:
      1.0 = tf(termFreq(title:warwick)=1)
      5.4458413 = idf(docFreq=11402, numDocs=843621)
      2.0 = fieldNorm(field=title, doc=276780)
  0.57006776 = (MATCH) weight(second_title:warwick in 276780), product of:
    0.05498614 = queryWeight(second_title:warwick), product of:
      5.18374 = idf(docFreq=14819, numDocs=843621)
      0.010607426 = queryNorm
    10.36748 = (MATCH) fieldWeight(second_title:warwick in 276780), product of:
      1.0 = tf(termFreq(second_title:warwick)=1)
      5.18374 = idf(docFreq=14819, numDocs=843621)
      2.0 = fieldNorm(field=second_title, doc=276780)

(Sorry, I only had a large index to pull an example from, rather than a simple one!)
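The numbers in an explain dump like the one above follow a simple pattern, and this small Java sketch recomputes them from the printed factors: each clause's weight is queryWeight * fieldWeight, where queryWeight = idf * queryNorm and fieldWeight = tf * idf * fieldNorm, and the top-level "sum of" just adds the clause weights. It assumes a query boost of 1 (no boost appears in the dump), and the class and method names are made up for illustration. The header score (0.6089077) differs from the raw sum, presumably because lucli normalizes the scores it displays; the sketch does not model that.

```java
public class ExplainArithmetic {

    // Clause weights copied from the explain() dump (one per field).
    static final double[] CLAUSE_WEIGHTS = {
        0.59024054, 0.832554, 0.19180772, 0.023709858,
        0.46163678, 0.96127754, 0.62917286, 0.57006776
    };

    // weight = queryWeight * fieldWeight, as printed in the dump.
    static double clauseWeight(double tf, double idf, double fieldNorm, double queryNorm) {
        double queryWeight = idf * queryNorm;      // query boost assumed to be 1
        double fieldWeight = tf * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    // The top-level "sum of:" simply adds the clause weights.
    static double total() {
        double sum = 0;
        for (double w : CLAUSE_WEIGHTS) {
            sum += w;
        }
        return sum;
    }

    public static void main(String[] args) {
        double queryNorm = 0.010607426;
        // Factors copied from the description:warwick clause.
        System.out.println("description:warwick = " + clauseWeight(1.0, 5.2746606, 2.0, queryNorm));
        // Factors copied from the content:warwick clause (termFreq=2, so tf = sqrt(2)).
        System.out.println("content:warwick     = " + clauseWeight(1.4142135, 3.1804748, 0.15625, queryNorm));
        System.out.println("sum of clauses      = " + total());
    }
}
```

Reading a dump this way makes it easy to spot which factor (idf, tf or fieldNorm) is pushing one document above another.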

Answer 2 (score: 0)

I would change the query as follows.

(my AND name) OR "my name"

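Here, whenever there is a phrase match, the additional phrase query adds to the score. If a document contains both words but not the exact phrase, the phrase query contributes no extra score. But a document that contains "my name" as an exact phrase gets the extra score and appears at the top.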

Here I am assuming that length normalization is ignored.
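A toy Java model of why (my AND name) OR "my name" helps. It is not Lucene's real scoring: each matched query term adds 1.0 (the AND clause) and an exact, in-order, adjacent phrase match adds another 1.0 (the phrase clause), while idf, tf and length normalization are ignored, mirroring the assumption above. The class name and the documents "my name" and "my first name" are hypothetical examples.

```java
import java.util.Arrays;
import java.util.List;

public class PhraseBonusSketch {

    // Toy scorer for (t1 AND t2 ...) OR "t1 t2 ...": NOT Lucene's formula.
    static double score(String doc, String[] terms) {
        List<String> tokens = Arrays.asList(doc.toLowerCase().split("\\s+"));
        double score = 0;
        for (String t : terms) {
            if (!tokens.contains(t)) {
                // AND clause fails, and the phrase cannot match without every term.
                return 0;
            }
            score += 1.0;
        }
        if (containsPhrase(tokens, terms)) {
            score += 1.0; // bonus from the phrase clause
        }
        return score;
    }

    // True if the terms occur consecutively and in the given order.
    static boolean containsPhrase(List<String> tokens, String[] terms) {
        for (int i = 0; i + terms.length <= tokens.size(); i++) {
            boolean match = true;
            for (int j = 0; j < terms.length; j++) {
                if (!tokens.get(i + j).equals(terms[j])) {
                    match = false;
                    break;
                }
            }
            if (match) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] query = {"my", "name"};
        // Hypothetical documents: only the first contains the exact phrase.
        String[] docs = {"my name", "my first name"};
        for (String d : docs) {
            System.out.println("\"" + d + "\" -> " + score(d, query));
        }
    }
}
```

With these inputs the exact-phrase document scores 3.0 and the other 2.0, which is the ordering the question asks for.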

Answer 3 (score: 0)

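I had a similar problem. I solved it using {{1}}, which supports PhraseQuery (where the relative positions of the terms, i.e. tokens, in the document matter). Hope this helps.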
See also: How can Lucene's scoring depend on relative position of query?
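The identifier in the answer above was lost, so this is not necessarily what that poster used. One common way to make term positions matter in Lucene is a PhraseQuery with slop, where matches closer to the exact phrase score higher. The Java sketch below models that idea for a two-term phrase under simplifying assumptions: it looks only at the closest occurrence pair, measures how far that pair is from being an exact in-order phrase the way slop counts positions (so reversed order needs a slop of 2), and weights it with 1/(distance + 1), the same shape as classic DefaultSimilarity.sloppyFreq(). It is not Lucene's actual sloppy-phrase scorer, and the class name and sample documents are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ProximitySketch {

    // Same shape as classic DefaultSimilarity.sloppyFreq(): closer matches count more.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // All token positions where the term occurs.
    static List<Integer> positions(String[] tokens, String term) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(term)) {
                result.add(i);
            }
        }
        return result;
    }

    // Simplified: smallest slop-style distance between an occurrence of `first`
    // and an occurrence of `second`, relative to the exact phrase "first second".
    // Returns -1 if either term is missing.
    static int minDistance(String doc, String first, String second) {
        String[] tokens = doc.toLowerCase().split("\\s+");
        int best = -1;
        for (int p : positions(tokens, first)) {
            for (int q : positions(tokens, second)) {
                int d = Math.abs((q - 1) - p); // 0 when `second` directly follows `first`
                if (best < 0 || d < best) {
                    best = d;
                }
            }
        }
        return best;
    }

    // 0 if there is no match within `slop`, otherwise the sloppy frequency.
    static float proximityScore(String doc, String first, String second, int slop) {
        int d = minDistance(doc, first, second);
        if (d < 0 || d > slop) {
            return 0f;
        }
        return sloppyFreq(d);
    }

    public static void main(String[] args) {
        String[] docs = {"my name", "my first name", "name my"};
        for (String d : docs) {
            System.out.println("\"" + d + "\" -> " + proximityScore(d, "my", "name", 2));
        }
    }
}
```

With a slop of 2, "my name" gets 1.0, "my first name" 0.5 and "name my" about 0.33; with a slop of 0 only the exact phrase matches.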