Lucene 3.6.2 (embedded): how to limit returned documents by field value

Time: 2014-11-17 16:54:39

Tags: lucene

Suppose I index the comments on particular photos as shown below.

IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36, analyzer);
IndexWriter indexWriter = new IndexWriter(indexDir, config);

Document doc1 = new Document();
doc1.add(new Field("photoId", "12345.jpg", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc1.add(new Field("body", "photo of cats skating", Field.Store.YES, Field.Index.ANALYZED));

Document doc2 = new Document();
doc2.add(new Field("photoId", "12345.jpg", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc2.add(new Field("body", "skating cats are fun to look at", Field.Store.YES, Field.Index.ANALYZED));

Document doc3 = new Document();
doc3.add(new Field("photoId", "6789.jpg", Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc3.add(new Field("name", "two dogs skating like pros", Field.Store.YES, Field.Index.ANALYZED));

indexWriter.addDocuments(Arrays.asList(new Document[]{doc1, doc2, doc3}));

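I want to query the comments and return photos based on the body content. If I query for skating dogs and cats, all three documents are returned. What I want is doc3 plus either doc1 or doc2, that is, unique documents based on the value of the photoId field. Once 12345.jpg has matched, its remaining matches should be ignored, because we only want the photo. How can I do this?

My search is basically this: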

    String[] fields = {"body", "any_other_relevant_field"};
    Query query = new MultiFieldQueryParser(Version.LUCENE_36, fields, analyzer).parse("skating dogs and cats");
    TopScoreDocCollector collector = TopScoreDocCollector.create(10, true);
    IndexSearcher searcher = searcherManager.acquire();
    searcher.search(query, null, collector);
    ScoreDoc[] hits = collector.topDocs().scoreDocs;
    // The rest seems to be the normal yada yada

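1 answer:

Answer 0 (score: 0)

You can structure the index so that each document corresponds to one photo. In that case, you add one "body" field to the document for each comment: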

Document doc1 = new Document();
doc1.add(new Field("photoId", "12345.jpg",
                   Field.Store.YES, Field.Index.NOT_ANALYZED_NO_NORMS));
doc1.add(new Field("body", "photo of cats skating",
                   Field.Store.YES, Field.Index.ANALYZED));
doc1.add(new Field("body", "skating cats are fun to look at",
                   Field.Store.YES, Field.Index.ANALYZED));

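Having multiple fields with the same name in one document is perfectly fine. In practice it is almost the same as concatenating those strings before building the 'body' field value.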
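Building one document per photo means the comments have to be grouped by photo before indexing. A minimal sketch of that grouping step in plain Java follows; the class `CommentGrouper`, the method `groupByPhoto`, and the `{photoId, commentBody}` row layout are hypothetical names chosen for illustration, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Lucene class): groups comment bodies by photo id
// so that a single Document can later be built for each photo.
class CommentGrouper {

    // Each input row is assumed to be {photoId, commentBody}.
    // A LinkedHashMap keeps photos in the order they were first seen.
    static Map<String, List<String>> groupByPhoto(List<String[]> comments) {
        Map<String, List<String>> byPhoto = new LinkedHashMap<String, List<String>>();
        for (String[] row : comments) {
            List<String> bodies = byPhoto.get(row[0]);
            if (bodies == null) {
                bodies = new ArrayList<String>();
                byPhoto.put(row[0], bodies);
            }
            bodies.add(row[1]);
        }
        return byPhoto;
    }

    public static void main(String[] args) {
        List<String[]> comments = Arrays.asList(
                new String[] {"12345.jpg", "photo of cats skating"},
                new String[] {"12345.jpg", "skating cats are fun to look at"},
                new String[] {"6789.jpg", "two dogs skating like pros"});

        for (Map.Entry<String, List<String>> e : groupByPhoto(comments).entrySet()) {
            // Here one would create a Document, add the photoId field once,
            // and add one "body" field per comment in e.getValue().
            System.out.println(e.getKey() + " has " + e.getValue().size() + " comment(s)");
        }
    }
}
```

Each map entry then corresponds to one document: the key becomes the single photoId field and every list element becomes its own "body" field.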

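The obvious drawback is that you can get odd results: a photo may show up in the results even though no single comment satisfies the query, because its concatenated comments together satisfy it.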
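One way to sidestep that drawback, which the answer above does not propose, is to keep one document per comment and collapse the ranked hits by photoId after the search. The sketch below shows only the collapsing step in plain Java. `PhotoDeduper`, `Hit`, and `firstHitPerPhoto` are hypothetical names; in the question's code, the photo id for each hit could presumably be read with `searcher.doc(hit.doc).get("photoId")`, since that field is stored, and the collector would need to request more than the final number of photos wanted.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a Lucene class): keeps the best-ranked hit per photo.
class PhotoDeduper {

    // Stand-in for a search hit; assumed to carry the stored photoId and the score.
    static final class Hit {
        final String photoId;
        final float score;

        Hit(String photoId, float score) {
            this.photoId = photoId;
            this.score = score;
        }
    }

    // rankedHits is assumed to be sorted best-first, as Lucene returns them.
    // Returns at most 'limit' hits, one per distinct photoId, in rank order.
    static List<Hit> firstHitPerPhoto(List<Hit> rankedHits, int limit) {
        Set<String> seen = new HashSet<String>();
        List<Hit> unique = new ArrayList<Hit>();
        for (Hit hit : rankedHits) {
            if (unique.size() >= limit) {
                break;
            }
            // Set.add returns false when the photo was already seen.
            if (seen.add(hit.photoId)) {
                unique.add(hit);
            }
        }
        return unique;
    }
}
```

Because every comment remains its own document, a photo is returned only when at least one individual comment matches, at the cost of fetching and discarding duplicate hits.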
