Phrase query in Lucene 6.2.0

Time: 2017-01-11 12:05:15

Tags: mongodb lucene

I have a document like this:

{ 
    "_id" : ObjectId("586b723b4b9a835db416fa26"), 
    "name" : "test", 
    "countries" : {
        "country" : [
            {
                "name" : "russia iraq"
            }, 
            {
                "name" : "USA china"
            }
        ]
    }
}

It is stored in MongoDB, and I am trying to retrieve it with a phrase query (Lucene 6.2.0). My code looks simple enough:

StandardAnalyzer analyzer = new StandardAnalyzer();

// 1. create the index
Directory index = new RAMDirectory();
IndexWriterConfig config = new IndexWriterConfig(analyzer);
try {
    IndexWriter w = new IndexWriter(index, config);
    MongoClient client = new MongoClient("localhost", 27017);
    DB database = client.getDB("test123");
    DBCollection coll = database.getCollection("test1");
    //MongoCollection<org.bson.Document> collection = database.getCollection("test1");
    DBCursor cursor = coll.find();
    System.out.println(cursor);
    while (cursor.hasNext()) {
        BasicDBObject obj = (BasicDBObject) cursor.next();

        Document doc = new Document();
        BasicDBObject f = (BasicDBObject) (obj.get("countries"));
        List<BasicDBObject> dts = (List<BasicDBObject>) (f.get("country"));
        doc.add(new TextField("id", obj.get("_id").toString().toLowerCase(), Field.Store.YES));
        doc.add(new StringField("name", obj.get("name").toString(), Field.Store.YES));
        doc.add(new StringField("countries", f.toString(), Field.Store.YES));

        for (BasicDBObject d : dts) {
            doc.add(new StringField("country", d.get("name").toString(), Field.Store.YES));
        }
        w.addDocument(doc);
    }
    w.close();

And my search code looks like this:

    // 2. query
    PhraseQuery query = new PhraseQuery("country", "iraq russia");

    // 3. search
    int hitsPerPage = 10;
    IndexReader reader = DirectoryReader.open(index);

    IndexSearcher searcher = new IndexSearcher(reader);
    TopDocs docs = searcher.search(query, hitsPerPage);
    ScoreDoc[] hits = docs.scoreDocs;

    // 4. display results
    System.out.println("Found " + hits.length + " hits.");
    for (int j = 0; j < hits.length; ++j) {
        int docId = hits[j].doc;
        Document d = searcher.doc(docId);
        System.out.println(d);
    }

    reader.close();
} catch (Exception e) {
    e.printStackTrace();
}

I get zero hits for this query. Can anyone tell me what I am doing wrong? Jars used: lucene-queries-4.2.0, lucene-queryparser-6.2.1, lucene-analyzers-common-6.2.0

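2 answers:

Answer 0: (score: 0)

First, never mix Lucene versions. All of your jars should be the same version. Upgrade lucene-queries to 6.2.1. In practice you may or may not run into problems mixing 6.2.0 and 6.2.1, but you should definitely upgrade lucene-analyzers-common as well.

PhraseQuery does not analyze anything for you; you have to add the terms individually. In your example, "iraq russia" is treated as a single term, not as two separate (analyzed) terms.

It should look something like this: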

Query query = new PhraseQuery.Builder()
    .add(new Term("country", "iraq"))
    .add(new Term("country", "russia"))
    .build();

If you want something that does the analysis for you, you can use QueryParser:

QueryParser parser = new QueryParser("country", new StandardAnalyzer());
Query query = parser.parse("\"iraq russia\"");

Answer 1: (score: 0)

I made a few changes, such as:

Query query = new PhraseQuery.Builder()
                        .add(new Term("country", "iraq"))
                        .add(new Term("country", "russia"))
                        .setSlop(2)
                        .build();
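
To see why a slop of 2 is what lets "iraq russia" match the indexed text "russia iraq", here is a toy model of sloppy phrase matching in plain Java. It is only a sketch and not Lucene's actual scorer: the class and method names are made up for illustration, it ignores repeated terms in the phrase, and it assumes the field was tokenized into positions. Each term's position in the document is shifted by its offset in the phrase, and the phrase matches when the spread of the shifted positions is at most the slop.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical toy model; NOT Lucene code.
class SloppyPhraseSketch {

    // True if every phrase term occurs in docTokens and the shifted
    // positions (position minus offset in the phrase) differ by <= slop.
    static boolean matches(List<String> docTokens, List<String> phrase, int slop) {
        if (phrase.isEmpty()) {
            return false;
        }
        return search(docTokens, phrase, slop, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Tries every occurrence of phrase term i, tracking min/max shifted position.
    private static boolean search(List<String> doc, List<String> phrase, int slop,
                                  int i, int min, int max) {
        if (i == phrase.size()) {
            return max - min <= slop;
        }
        for (int pos = 0; pos < doc.size(); pos++) {
            if (doc.get(pos).equals(phrase.get(i))) {
                int shifted = pos - i;
                if (search(doc, phrase, slop, i + 1,
                           Math.min(min, shifted), Math.max(max, shifted))) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("russia", "iraq");
        List<String> phrase = Arrays.asList("iraq", "russia");
        for (int slop = 0; slop <= 2; slop++) {
            System.out.println("slop " + slop + ": " + matches(doc, phrase, slop));
        }
    }
}
```

In this model the reversed pair has shifted positions 1 and -1, a spread of 2, so slop 0 and slop 1 fail and slop 2 succeeds. That is consistent with the Lucene documentation's remark that swapping two adjacent terms needs a slop of at least 2.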

I also changed the type of the field at index time:

for (BasicDBObject d : dts) {
    doc.add(new TextField("country", d.get("name").toString(), Field.Store.YES));
}

But can anyone tell me the difference between StringField and TextField at index time?
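
On that last question: as the Lucene javadocs describe it, a StringField is indexed verbatim as a single term and is not tokenized, while a TextField is passed through the analyzer and indexed as individual tokens. The plain-Java sketch below models only that difference. The class and method names are hypothetical, and the lowercase-and-split-on-whitespace step is a very rough stand-in for StandardAnalyzer, not its real behavior.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical toy model; NOT Lucene code.
class FieldTypeSketch {

    // Models StringField: the whole value becomes exactly one term, unchanged.
    static List<String> termsForStringField(String value) {
        return Collections.singletonList(value);
    }

    // Models TextField with a crude analyzer stand-in:
    // lowercase the value and split it on whitespace.
    static List<String> termsForTextField(String value) {
        List<String> terms = new ArrayList<>();
        for (String t : value.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(termsForStringField("USA china"));
        System.out.println(termsForTextField("USA china"));
    }
}
```

Under this model the term "iraq" exists in the index only when "russia iraq" was added as a TextField. With a StringField the only indexed term is the full string "russia iraq", so a PhraseQuery built from the separate terms "iraq" and "russia" has nothing to match against.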