How to match exact text in a Lucene search?

Date: 2016-05-28 05:43:51

Tags: java lucene

I am trying to match the text "Config migration from ASA5505 8.2 to ASA5516" in the TITLE column.

My program looks like this:

Directory directory = FSDirectory.open(indexDir);

MultiFieldQueryParser queryParser = new MultiFieldQueryParser(Version.LUCENE_35,new String[] {"TITLE"}, new StandardAnalyzer(Version.LUCENE_35));        
IndexReader reader = IndexReader.open(directory);
IndexSearcher searcher = new IndexSearcher(reader);       
queryParser.setPhraseSlop(0);
queryParser.setLowercaseExpandedTerms(true);
Query query = queryParser.parse("TITLE:Config migration from ASA5505 8.2 to ASA5516");
System.out.println(query);
TopDocs topDocs = searcher.search(query,100);
System.out.println(topDocs.totalHits);
ScoreDoc[] hits = topDocs.scoreDocs;
System.out.println(hits.length + " Record(s) Found");
for (int i = 0; i < hits.length; i++) {
    int docId = hits[i].doc;
    Document d = searcher.doc(docId);
    System.out.println("\"Title :\" " +d.get("TITLE") );
}

But it returns:

"Title :" Config migration from ASA5505 8.2 to ASA5516
"Title :" Firewall  migration from ASA5585 to  ASA5555
"Title :" Firewall  migration from ASA5585 to  ASA5555

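The last two results are not expected. What do I need to change so that only the exact text "Config migration from ASA5505 8.2 to ASA5516" matches?

My indexing code looks like this: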

public class Lucene {
public static final String INDEX_DIR = "./Lucene";
private static final String JDBC_DRIVER = "oracle.jdbc.OracleDriver";
private static final String CONNECTION_URL = "jdbc:oracle:thin:xxxxxxx";

private static final String USER_NAME = "localhost";
private static final String PASSWORD = "localhost";
private static final String QUERY = "select * from TITLE_TABLE";

public static void main(String[] args) throws Exception {
    File indexDir = new File(INDEX_DIR);
    Lucene indexer = new Lucene();
    try {
        Date start = new Date();
        Class.forName(JDBC_DRIVER).newInstance();
        Connection conn = DriverManager.getConnection(CONNECTION_URL, USER_NAME, PASSWORD);
        SimpleAnalyzer analyzer = new SimpleAnalyzer(Version.LUCENE_35);
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LUCENE_35, analyzer);
        IndexWriter indexWriter = new IndexWriter(FSDirectory.open(indexDir), indexWriterConfig);
        System.out.println("Indexing to directory '" + indexDir + "'...");
        int indexedDocumentCount = indexer.indexDocs(indexWriter, conn);
        indexWriter.close();
        System.out.println(indexedDocumentCount + " records have been indexed successfully");
        System.out.println("Total Time:" + (new Date().getTime() - start.getTime()) / (1000));
    } catch (Exception e) {
        e.printStackTrace();
    }
}

int indexDocs(IndexWriter writer, Connection conn) throws Exception {
    String sql = QUERY;
    Statement stmt = conn.createStatement();
    stmt.setFetchSize(100000);
    ResultSet rs = stmt.executeQuery(sql);
    int i = 0;
    while (rs.next()) {
        System.out.println("Adding Doc No:" + i);
        Document d = new Document();
        System.out.println(rs.getString("TITLE"));
        d.add(new Field("TITLE", rs.getString("TITLE"), Field.Store.YES, Field.Index.ANALYZED));
        d.add(new Field("NAME", rs.getString("NAME"), Field.Store.YES, Field.Index.ANALYZED));
        writer.addDocument(d);
        i++;
    }
    return i;
}
}

3 Answers:

Answer 0 (score: 0)

Try a PhraseQuery as follows:

BooleanQuery mainQuery = new BooleanQuery();
// Terms added directly to a query skip the analyzer, so the text is lowercased by hand
String searchTerm = "config migration from asa5505 8.2 to asa5516";
String strArray[] = searchTerm.split(" ");
for (int index = 0; index < strArray.length; index++)
{
    // One single-term PhraseQuery per word, each one required (MUST)
    PhraseQuery query1 = new PhraseQuery();
    query1.add(new Term("TITLE", strArray[index]));
    mainQuery.add(query1, BooleanClause.Occur.MUST);
}

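Then execute mainQuery.

Have a look at this Stack Overflow thread; it can help you do exact searches with PhraseQuery.

Answer 1 (score: 0)

PVR is right that a phrase query is probably the correct solution, but they missed how to use the PhraseQuery class. You are already using QueryParser, so just enclose the search text in quotes, using the query parser syntax: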
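To see the difference between "every word is present" and "the exact text is present", here is a library-free sketch in plain Java (it is not Lucene code). allTermsPresent behaves like a BooleanQuery whose clauses are all MUST: each word must occur somewhere, in any order. containsPhrase behaves like a phrase query with slop 0: the words must occur at consecutive positions, in order. The whitespace tokenizer and the third, reordered title are hypothetical and only there for illustration; a real index tokenizes with whatever analyzer it was built with.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration (not Lucene code) of "all terms present"
// versus "exact phrase" matching over token lists.
public class PhraseVsTerms {

    // Hypothetical tokenizer for this sketch: lowercase, then split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Like a BooleanQuery of MUST clauses: every query term occurs somewhere, in any order.
    static boolean allTermsPresent(List<String> doc, List<String> query) {
        return new HashSet<>(doc).containsAll(query);
    }

    // Like a phrase query with slop 0: the query terms occur consecutively, in order.
    static boolean containsPhrase(List<String> doc, List<String> query) {
        return Collections.indexOfSubList(doc, query) >= 0;
    }

    public static void main(String[] args) {
        List<String> query = tokenize("config migration from asa5505 8.2 to asa5516");
        String[] titles = {
            "Config migration from ASA5505 8.2 to ASA5516",
            "Firewall  migration from ASA5585 to  ASA5555",
            "Migration config from ASA5516 to ASA5505 8.2" // hypothetical reordered title
        };
        for (String title : titles) {
            List<String> doc = tokenize(title);
            System.out.println(title + " -> allTerms=" + allTermsPresent(doc, query)
                    + ", phrase=" + containsPhrase(doc, query));
        }
    }
}
```

For the reordered title, the all-terms check passes while the phrase check fails; only the first title passes both.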

Query query = queryParser.parse("TITLE:\"Config migration from ASA5505 8.2 to ASA5516\"");
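If the phrase comes from user input, it may contain characters that are special inside a quoted phrase. A minimal sketch of building the quoted string, assuming the classic QueryParser syntax in which, inside quotes, a backslash escapes the next character and an unescaped double quote ends the phrase. The class and method names (PhraseQueryString, quotedPhrase) are hypothetical helpers, not part of Lucene.

```java
// Hypothetical helper: builds a field-scoped, quoted phrase for the query parser,
// e.g. TITLE:"Config migration from ASA5505 8.2 to ASA5516".
public class PhraseQueryString {

    static String quotedPhrase(String field, String text) {
        StringBuilder sb = new StringBuilder(field).append(":\"");
        for (char c : text.toCharArray()) {
            // Assumption: only backslash and double quote are special inside a quoted phrase
            if (c == '\\' || c == '"') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        System.out.println(quotedPhrase("TITLE", "Config migration from ASA5505 8.2 to ASA5516"));
    }
}
```

The resulting string can be passed to queryParser.parse(...) in place of the hand-written literal.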

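Based on your update, you are using different analyzers at index time and at query time. SimpleAnalyzer and StandardAnalyzer do not tokenize text the same way. Unless you have a good reason not to, you should analyze text the same way when indexing and when querying.

So change the analyzer in your indexing code to StandardAnalyzer (or, conversely, use SimpleAnalyzer at query time) and rebuild the index; you should see better results.

Answer 2 (score: 0)

Here is what I wrote for you, and it works perfectly: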
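To make the analyzer difference concrete: SimpleAnalyzer keeps only runs of letters and lowercases them, so digits and punctuation act as separators. The sketch below approximates that behaviour in plain Java (it is not Lucene's implementation, and letterTokens is a hypothetical name) and applies it to two of the titles from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java approximation of SimpleAnalyzer-style tokenization:
// maximal runs of letters, lowercased; every other character is a separator.
public class LetterTokenizerSketch {

    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A digit, space, or dot ends the current token
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(letterTokens("Config migration from ASA5505 8.2 to ASA5516"));
        System.out.println(letterTokens("Firewall  migration from ASA5585 to  ASA5555"));
    }
}
```

This prints [config, migration, from, asa, to, asa] and [firewall, migration, from, asa, to, asa]: "8.2" disappears and every model number collapses to "asa". StandardAnalyzer on the query side would typically keep asa5505 and 8.2 as whole tokens, which an index built this way never contains, so only the shared words such as "migration" can match.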


  1. Create the index

    public static void main(String[] args)
    {
        IndexWriter writer = getIndexWriter();
        Document doc = new Document();
        Document doc1 = new Document();
        Document doc2 = new Document();
        doc.add(new Field("TITLE", "Config migration from ASA5505 8.2 to ASA5516", Field.Store.YES, Field.Index.ANALYZED));
        doc1.add(new Field("TITLE", "Firewall  migration from ASA5585 to ASA5555", Field.Store.YES, Field.Index.ANALYZED));
        doc2.add(new Field("TITLE", "Firewall  migration from ASA5585 to ASA5555", Field.Store.YES, Field.Index.ANALYZED));
        try
        {
            writer.addDocument(doc);
            writer.addDocument(doc1);
            writer.addDocument(doc2);
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static IndexWriter getIndexWriter()
    {
        IndexWriter indexWriter = null;

        try
        {
            File file = new File("D://index//");
            if (!file.exists())
                file.mkdir();
            // StandardAnalyzer, the same analyzer class the query parser uses at search time
            IndexWriterConfig conf = new IndexWriterConfig(Version.LUCENE_34, new StandardAnalyzer(Version.LUCENE_34));
            Directory directory = FSDirectory.open(file);
            indexWriter = new IndexWriter(directory, conf);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return indexWriter;
    }

  2. Search for the string

    queryParser.parse("\"Config migration from ASA5505 8.2 to ASA5516\"");