Lucene: sort by score, then by modified date

Time: 2015-12-02 05:15:03

Tags: lucene

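My documents have three fields:

  1. Title
  2. Content
  3. Modified date

When I search for a term, the results come back sorted by score.

Now I would like to sort the results further by modifiedDate, so that among documents with the same score the most recent ones are shown first.

I tried sorting by score and then by modified date, but it does not work. Can someone point me in the right direction?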

2 answers:

Answer 0 (score: 6)

This can be done by defining a Sort:

Sort sort = new Sort(
    SortField.FIELD_SCORE,
    // reverse = true: newest date first among documents with equal scores
    new SortField("myDateField", SortField.Type.STRING, true));
indexSearcher.search(myQuery, numHits, sort);
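
To make the intended ordering concrete, here is a minimal plain-JDK sketch of the order the question asks for: score descending, then date descending within equal scores. It does not use Lucene at all; the `Hit` class, the `sortedIds` helper, and the sample values are hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-JDK illustration only; Hit is a hypothetical stand-in for a search result.
class ScoreThenDateOrder {

    static final class Hit {
        final String id;
        final float score;
        final String date; // fixed-width yyyyMMddHHmm string, so string order is time order

        Hit(String id, float score, String date) {
            this.id = id;
            this.score = score;
            this.date = date;
        }
    }

    // Highest score first; among equal scores, the latest date first.
    static final Comparator<Hit> BY_SCORE_THEN_DATE =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(Comparator.comparing((Hit h) -> h.date).reversed());

    // Returns the ids of the hits in sorted order, leaving the input untouched.
    static List<String> sortedIds(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(BY_SCORE_THEN_DATE);
        List<String> ids = new ArrayList<>();
        for (Hit h : copy) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("a", 1.0f, "201511300900"),
                new Hit("b", 2.0f, "201510010000"),
                new Hit("c", 1.0f, "201512020515"));
        System.out.println(sortedIds(hits)); // [b, c, a]
    }
}
```

Hit "b" wins on score alone; "c" and "a" tie on score, so the later date puts "c" first.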

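There are two possible problems here:

  • Make sure your date is indexed in a form that is both searchable and sortable. Usually the best way to do this is to convert it with DateTools.

  • The field used for sorting must be indexed and should not be analyzed (for example, a StringField). Whether you store it is up to you.

So adding the date field might look something like: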

Field dateField = new StringField(
    "myDateField", 
    DateTools.dateToString(myDateInstance, DateTools.Resolution.MINUTE),
    Field.Store.YES);
document.add(dateField);
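
Why a string field can be sorted chronologically: a fixed-width, most-significant-first date string compares lexicographically in the same order as the dates themselves. The sketch below imitates that idea with the plain JDK, assuming a `yyyyMMddHHmm` layout in UTC for minute resolution. It does not call DateTools, and `toSortableString` is a hypothetical helper name.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Plain-JDK illustration; it imitates, but does not call, Lucene's DateTools.
class SortableDateString {

    // Assumption: fixed-width yyyyMMddHHmm in UTC, i.e. minute resolution.
    private static final DateTimeFormatter MINUTE_FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMddHHmm").withZone(ZoneOffset.UTC);

    // Hypothetical helper: epoch milliseconds to a sortable string.
    static String toSortableString(long epochMillis) {
        return MINUTE_FORMAT.format(Instant.ofEpochMilli(epochMillis));
    }

    public static void main(String[] args) {
        long earlier = Instant.parse("2015-11-30T09:00:00Z").toEpochMilli();
        long later = Instant.parse("2015-12-02T05:15:03Z").toEpochMilli();

        String a = toSortableString(earlier); // 201511300900
        String b = toSortableString(later);   // 201512020515

        // Fixed-width digits: lexicographic order matches chronological order.
        System.out.println(a + " < " + b + " : " + (a.compareTo(b) < 0));
    }
}
```

Seconds are dropped at this resolution, so two documents modified within the same minute get identical sort keys.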

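Note: you can also index the date as a numeric field using Date.getTime(). I prefer the DateTools string approach because it comes with some nicer tools for handling dates, particularly with regard to precision, but either way works.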
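
With the numeric route, precision has to be handled by hand. A minimal sketch, assuming minute resolution is what you want: truncate the epoch-millisecond value before indexing it, so that timestamps within the same minute compare equal. This is plain JDK code, and `truncateToMinute` is a hypothetical helper name.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Date;

// Plain-JDK illustration of reducing a Date to minute precision.
class NumericDateValue {

    // Hypothetical helper: drop seconds and milliseconds from a Date's epoch value.
    static long truncateToMinute(Date date) {
        return Instant.ofEpochMilli(date.getTime())
                .truncatedTo(ChronoUnit.MINUTES)
                .toEpochMilli();
    }

    public static void main(String[] args) {
        Date d1 = Date.from(Instant.parse("2015-12-02T05:15:03Z"));
        Date d2 = Date.from(Instant.parse("2015-12-02T05:15:59Z"));
        // Both fall in the same minute, so the truncated values are equal.
        System.out.println(truncateToMinute(d1) == truncateToMinute(d2)); // true
    }
}
```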

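Answer 1 (score: 2)

You can solve this with a custom collector. It sorts the results by score and then by timestamp. Inside the collector you need to retrieve the timestamp value used for the secondary sort. See the class below: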

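import java.io.IOException;

import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.LeafCollector;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.TopDocsCollector;

public class CustomCollector extends TopDocsCollector<ScoreDocWithTime> {

    // Current weakest entry in the queue; a new hit must beat it to be collected.
    ScoreDocWithTime pqTop;

    public CustomCollector(int numHits) {
        super(new HitQueueWithTime(numHits, true));
        // HitQueueWithTime is pre-populated with sentinel objects, so top()
        // is already initialized at this point.
        pqTop = pq.top();
    }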

    @Override
    public LeafCollector getLeafCollector(LeafReaderContext context)
            throws IOException {
        final int docBase = context.docBase;
        final NumericDocValues modifiedDate =
                DocValues.getNumeric(context.reader(), "modifiedDate");

        return new LeafCollector() {
            Scorer scorer;


            @Override
            public void setScorer(Scorer scorer) throws IOException {
                this.scorer = scorer;
            }

            @Override
            public void collect(int doc) throws IOException {
                float score = scorer.score();

                // This collector cannot handle these scores:
                assert score != Float.NEGATIVE_INFINITY;
                assert !Float.isNaN(score);

                totalHits++;
                long timestamp = modifiedDate.get(doc);
                if (score < pqTop.score
                        || (score == pqTop.score && timestamp <= pqTop.timestamp)) {
                    // This hit cannot beat the weakest entry in the queue. A full tie
                    // (same score and timestamp) is rejected too: docs arrive in
                    // increasing doc ID order and the queue favors lower doc IDs.
                    return;
                }
                pqTop.doc = doc + docBase;
                pqTop.score = score;
                pqTop.timestamp = timestamp;
                pqTop = pq.updateTop();
            }

        };
    }

    @Override
    public boolean needsScores() {
        return true;
    }
}

In addition, the secondary sort needs an extra field on ScoreDoc:

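import org.apache.lucene.search.ScoreDoc;

public class ScoreDocWithTime extends ScoreDoc {
    // Value of the modifiedDate field, used as the secondary sort key.
    public long timestamp;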

    public ScoreDocWithTime(long timestamp, int doc, float score) {
        super(doc, score);
        this.timestamp = timestamp;
    }

    public ScoreDocWithTime(long timestamp, int doc, float score, int shardIndex) {
        super(doc, score, shardIndex);
        this.timestamp = timestamp;
    }
}

Then create a custom priority queue to support it:

import org.apache.lucene.util.PriorityQueue;

public class HitQueueWithTime extends PriorityQueue<ScoreDocWithTime> {

    public HitQueueWithTime(int numHits, boolean prePopulate) {
        super(numHits, prePopulate);
    }

    @Override
    protected ScoreDocWithTime getSentinelObject() {
        // Sentinel entries rank below every real hit.
        return new ScoreDocWithTime(0, Integer.MAX_VALUE, Float.NEGATIVE_INFINITY);
    }

    // Returns true if hitA ranks below hitB: lower score first, then older
    // timestamp, then higher doc ID.
    @Override
    protected boolean lessThan(ScoreDocWithTime hitA, ScoreDocWithTime hitB) {
        if (hitA.score == hitB.score) {
            return (hitA.timestamp == hitB.timestamp)
                    ? hitA.doc > hitB.doc
                    : hitA.timestamp < hitB.timestamp;
        }
        return hitA.score < hitB.score;
    }
}
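
The ordering in lessThan can be tried out without Lucene. The sketch below keeps the best N hits in a java.util.PriorityQueue whose head is the weakest entry, using the same three-level comparison (score, then timestamp, then doc ID). `Hit`, `topN`, and `docs` are hypothetical names, and this is a simplified stand-in for the collector, not Lucene's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Plain-JDK illustration of bounded top-N selection with a three-level ordering.
class TopNByScoreThenTime {

    static final class Hit {
        final int doc;
        final float score;
        final long timestamp;

        Hit(int doc, float score, long timestamp) {
            this.doc = doc;
            this.score = score;
            this.timestamp = timestamp;
        }
    }

    // "Weakest first": lower score, then older timestamp, then higher doc ID.
    static final Comparator<Hit> WEAKEST_FIRST =
            Comparator.comparingDouble((Hit h) -> h.score)
                    .thenComparingLong(h -> h.timestamp)
                    .thenComparing(Comparator.comparingInt((Hit h) -> h.doc).reversed());

    // Returns the n strongest hits, strongest first.
    static List<Hit> topN(List<Hit> hits, int n) {
        if (n <= 0) {
            return Collections.emptyList();
        }
        PriorityQueue<Hit> queue = new PriorityQueue<>(WEAKEST_FIRST);
        for (Hit hit : hits) {
            if (queue.size() < n) {
                queue.add(hit);
            } else if (WEAKEST_FIRST.compare(hit, queue.peek()) > 0) {
                // The new hit beats the weakest entry, possibly only on timestamp.
                queue.poll();
                queue.add(hit);
            }
        }
        List<Hit> result = new ArrayList<>(queue);
        result.sort(WEAKEST_FIRST.reversed());
        return result;
    }

    // Convenience view of the doc IDs in result order.
    static List<Integer> docs(List<Hit> hits) {
        List<Integer> ids = new ArrayList<>();
        for (Hit h : hits) {
            ids.add(h.doc);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(0, 1.0f, 1000L),
                new Hit(1, 1.0f, 2000L),
                new Hit(2, 1.0f, 3000L));
        System.out.println(docs(topN(hits, 2))); // [2, 1]
    }
}
```

In the example all three hits tie on score, so with room for only two the oldest one (doc 0) is displaced. A hit whose score merely equals that of the weakest queued entry therefore still has to be compared on timestamp before it can be discarded.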

After that, you can run searches as needed. See the example below:

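import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.LongField;
import org.apache.lucene.document.NumericDocValuesField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class SearchTest {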

    public static void main(String[] args) throws IOException {
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(new StandardAnalyzer());
        Directory directory = new RAMDirectory();
        IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig);

        addDoc(indexWriter, "w1", 1000);
        addDoc(indexWriter, "w1", 3000);
        addDoc(indexWriter, "w1", 500);
        addDoc(indexWriter, "w1 w2", 1000);
        addDoc(indexWriter, "w1 w2", 3000);
        addDoc(indexWriter, "w1 w2", 2000);
        addDoc(indexWriter, "w1 w2", 5000);

        final IndexReader indexReader = DirectoryReader.open(indexWriter, false);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("desc", "w1")), BooleanClause.Occur.SHOULD);
        query.add(new TermQuery(new Term("desc", "w2")), BooleanClause.Occur.SHOULD);

        CustomCollector results = new CustomCollector(100);
        indexSearcher.search(query, results);
        TopDocs search = results.topDocs();
        for (ScoreDoc sd : search.scoreDocs) {
            Document document = indexReader.document(sd.doc);
            System.out.println(document.getField("desc").stringValue() + " " + ((ScoreDocWithTime) sd).timestamp);
        }

    }

    private static void addDoc(IndexWriter indexWriter, String desc, long modifiedDate) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("desc", desc, Field.Store.YES));
        doc.add(new LongField("modifiedDate", modifiedDate, Field.Store.YES));
        doc.add(new NumericDocValuesField("modifiedDate", modifiedDate));
        indexWriter.addDocument(doc);
    }
}

The program prints the following output:

w1 w2 5000
w1 w2 3000
w1 w2 2000
w1 w2 1000
w1 3000
w1 1000
w1 500

P.S. This solution works with Lucene 5.1.