Question

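I am trying to return the original terms that caused a hit in my Lucene index. For example, my search string is "The quick brown fox jumps over the lazy dog". The word "dog" appears in the index in entries such as "doggy leash" and "dog walking". Similarly, "fox" appears in entries such as "fox glove" and "foxy loxi".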

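So I would like to print the original "The quick brown fox ..." string for the user and highlight the terms that got hits (dog and fox). There are some examples, such as "Get matched terms in query", that use the explain method, but the answers don't go the last inch. I take it that Lucene doesn't make this easy and that I will have to use regex.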

Answer 1

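I figured out a way to generate a string that is the original user text with the matches highlighted. The index is queried in the usual way with the original user text. The original user text and the results are then passed to a "reverse" query method: the original user text is turned into a memory-based index, which is queried with the original results. This is the reverse of the original search. What comes back is the set of words common to the results and the user's string. This works on my index because all of the results are "tight" definitions.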
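The idea of the reverse lookup can be illustrated without Lucene. The sketch below is a plain-Java stand-in, not the memory-based index the answer uses: it tokenizes the user text and the results and keeps the user's words that also occur in a result. The class name, method names and sample results are hypothetical, and matching here is exact after lowercasing, whereas a Lucene analyzer may also stem or filter terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for the "reverse" query: no Lucene involved.
class ReverseLookupSketch {

    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Return the user's words that occur in at least one result, in the user's word order.
    static Set<String> commonTerms(String userText, List<String> results) {
        Set<String> resultTokens = new LinkedHashSet<>();
        for (String result : results) {
            resultTokens.addAll(tokenize(result));
        }
        Set<String> common = new LinkedHashSet<>();
        for (String token : tokenize(userText)) {
            if (resultTokens.contains(token)) {
                common.add(token);
            }
        }
        return common;
    }

    public static void main(String[] args) {
        List<String> results = Arrays.asList("dog walking", "fox glove", "doggy leash");
        // prints [fox, dog]
        System.out.println(commonTerms("The quick brown fox jumps over the lazy dog", results));
    }
}
```

Because the comparison is exact, "doggy" does not count as a match for "dog" in this sketch; whether it would in a real index depends on the analyzer in use.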

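The Highlighter is used to place delimiters [..] around the common words found in the original results. The regex (?<=\[)(.*?)(?=\]) is then used to extract the individual found words via those delimiters.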
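The extraction step can be sketched with java.util.regex. This assumes the highlighted text marks each hit as [term], as described above; the class name, method name and sample string are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls out every term wrapped in [ ... ].
class BracketedTermExtractor {

    // Lookbehind for "[" and lookahead for "]", so the brackets are not part of the match.
    private static final Pattern BRACKETED = Pattern.compile("(?<=\\[)(.*?)(?=\\])");

    static List<String> extractTerms(String highlighted) {
        List<String> terms = new ArrayList<>();
        Matcher m = BRACKETED.matcher(highlighted);
        while (m.find()) {
            terms.add(m.group(1));
        }
        return terms;
    }

    public static void main(String[] args) {
        // prints [fox, dog]
        System.out.println(extractTerms("the quick brown [fox] jumps over the lazy [dog]"));
    }
}
```

The lazy quantifier (.*?) stops at the first closing bracket, so two hits on the same line come back as two separate terms rather than one long match.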

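The individual found words and the original user text are passed to the following method, which removes duplicate terms and highlights the words in the user's original string: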

import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Remove duplicate found terms and print a single string with all the hits highlighted
private static void removeTermDuplicates(List<String> textResult, String searchText) {

    // will become the final string with all highlights applied
    String strOutput = searchText;

    // build a LinkedHashSet from the incoming list to drop duplicates while keeping order
    Set<String> textSet = new LinkedHashSet<String>(textResult);
    // empty the list, then refill it with the de-duplicated terms
    textResult.clear();
    textResult.addAll(textSet);

    // wrap each found term in HTML bold tags
    // (declared locally here; the original snippet did not show this declaration)
    List<String> replacementWord = new ArrayList<String>();
    for (String term : textResult) {
        replacementWord.add("<b>" + term + "</b>");
    }

    // map each original term to its highlighted form
    // (also declared locally; not shown in the original snippet)
    Map<String, String> oldAndNewTerms = new HashMap<String, String>();
    for (int i = 0; i < replacementWord.size(); ++i) {
        oldAndNewTerms.put(textResult.get(i), replacementWord.get(i));
    }

    // use the map to rewrite the original string
    for (String key : oldAndNewTerms.keySet()) {
        strOutput = strOutput.replace(key, oldAndNewTerms.get(key));
    }

    System.out.println(strOutput);
}
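The method above chains String.replace calls, which match substrings: a term such as "dog" would also be highlighted inside "doggy", and a very short term could match inside tags inserted by an earlier replacement. The following is a variant sketch, not the method from this answer, that returns the string instead of printing it and does all replacements in one regex pass restricted to whole words. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical variant: whole-word highlighting in a single pass.
class SinglePassHighlighter {

    static String highlight(String searchText, Collection<String> foundTerms) {
        // De-duplicate while keeping order, and quote each term so it is matched literally.
        List<String> quoted = new ArrayList<>();
        for (String term : new LinkedHashSet<>(foundTerms)) {
            if (!term.isEmpty()) {
                quoted.add(Pattern.quote(term));
            }
        }
        if (quoted.isEmpty()) {
            return searchText;
        }

        // \b ... \b limits matches to whole words; the alternation covers all terms at once.
        Pattern pattern = Pattern.compile("\\b(?:" + String.join("|", quoted) + ")\\b");
        Matcher m = pattern.matcher(searchText);

        // StringBuffer is used because appendReplacement accepts it on all Java versions.
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(out, Matcher.quoteReplacement("<b>" + m.group() + "</b>"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints the quick brown <b>fox</b> jumps over the lazy <b>dog</b>
        System.out.println(highlight("the quick brown fox jumps over the lazy dog",
                Arrays.asList("dog", "fox", "dog")));
    }
}
```

Because the input is scanned once, text inserted by one replacement is never rescanned for another term.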

Hope this helps someone in the future. Phil

How to return the query terms that got a hit in the index
