public class EarlyTerminatingSortingCollector extends Collector

A Collector that terminates collection of documents early on a per-segment basis, if the segment was sorted according to the given Sorter.
NOTE: The Collector detects sorted segments according to SortingMergePolicy, so it is best used in conjunction with it. Also, it collects up to a specified number of documents from each segment, and is therefore mostly suitable for use with collectors such as TopDocsCollector, and not, for example, TotalHitCountCollector.
NOTE: If you wrap a TopDocsCollector that sorts in the same order as the index order, the returned TopDocsCollector.topDocs() will be correct. However, the total hit count will be underestimated, since not all matching documents will have been collected.
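The effect described in this note can be illustrated with a small self-contained model. The sketch below does not use Lucene classes: segments are modeled as plain int arrays whose keys are already stored in ascending index-sort order, every document is assumed to match the query, and the names EarlyTerminationModel, topNEarly, collectedEarly and totalMatches are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified model, not Lucene code: each segment is an int[] of sort keys,
// already stored in ascending index-sort order. Every document is assumed to match.
class EarlyTerminationModel {

    // Collects at most n keys from the start of each segment, then keeps the n smallest.
    static List<Integer> topNEarly(int[][] segments, int n) {
        List<Integer> collected = new ArrayList<>();
        for (int[] segment : segments) {
            int limit = Math.min(n, segment.length);
            for (int i = 0; i < limit; i++) {
                collected.add(segment[i]);
            }
        }
        Collections.sort(collected);
        return new ArrayList<>(collected.subList(0, Math.min(n, collected.size())));
    }

    // Number of documents actually collected when each segment stops after n docs.
    static int collectedEarly(int[][] segments, int n) {
        int count = 0;
        for (int[] segment : segments) {
            count += Math.min(n, segment.length);
        }
        return count;
    }

    // Total number of matching documents across all segments.
    static int totalMatches(int[][] segments) {
        int count = 0;
        for (int[] segment : segments) {
            count += segment.length;
        }
        return count;
    }
}
```

With segments {1, 4, 7, 9}, {2, 3, 8} and {5, 6} and n = 2, this model collects 6 of the 9 matching documents, yet the two smallest keys (1 and 2) are still found. The top hits are right while the hit count falls short.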
NOTE: This Collector uses Sorter.getID() to detect whether a segment was sorted with the same Sorter as the one given in EarlyTerminatingSortingCollector(Collector, Sorter, int). This has two implications:

- if Sorter.getID() is not implemented correctly and returns different identifiers for equivalent Sorters, this collector will not detect sorted segments,
- if you change the IndexWriter's SortingMergePolicy to sort according to another criterion and both the old and the new Sorters have the same identifier, this Collector will incorrectly detect sorted segments.

Constructor and Description |
---|
EarlyTerminatingSortingCollector(Collector in, Sorter sorter, int numDocsToCollect) Create a new EarlyTerminatingSortingCollector instance. |
Modifier and Type | Method and Description |
---|---|
boolean | acceptsDocsOutOfOrder() Return true if this collector does not require the matching docIDs to be delivered in int sort order (smallest to largest) to Collector.collect(int). |
void | collect(int doc) Called once for every document matching a query, with the unbased document number. |
void | setNextReader(AtomicReaderContext context) Called before collecting from each AtomicReaderContext. |
void | setScorer(Scorer scorer) Called before successive calls to Collector.collect(int). |
public EarlyTerminatingSortingCollector(Collector in, Sorter sorter, int numDocsToCollect)

Create a new EarlyTerminatingSortingCollector instance.

Parameters:
in - the collector to wrap
sorter - the same sorter as the one which is used by IndexWriter's SortingMergePolicy
numDocsToCollect - the number of documents to collect on each segment. When wrapping a TopDocsCollector, this number should be the number of hits.

public void setScorer(Scorer scorer) throws IOException
Description copied from class: Collector

Called before successive calls to Collector.collect(int). Implementations that need the score of the current document (passed in to Collector.collect(int)) should save the passed-in Scorer and call scorer.score() when needed.

Specified by:
setScorer in class Collector

Throws:
IOException
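The "save the Scorer, call score() when needed" pattern can be sketched without Lucene on the classpath. In the example below, ScoreSource is a hypothetical stand-in for Scorer, and MaxScoreRecorder is an illustrative class, not part of the library. It only shows that the scorer is stored in setScorer and consulted later inside collect.

```java
// Simplified stand-ins, not Lucene types: ScoreSource plays the role of Scorer.
interface ScoreSource {
    float score();
}

// Keeps the scorer handed over by setScorer and reads the score only inside collect.
class MaxScoreRecorder {
    private ScoreSource scorer;
    private float maxScore = Float.NEGATIVE_INFINITY;
    private int bestDoc = -1;

    void setScorer(ScoreSource scorer) {
        this.scorer = scorer; // save it; do not call score() yet
    }

    void collect(int doc) {
        float score = scorer.score(); // score of the document currently being collected
        if (score > maxScore) {
            maxScore = score;
            bestDoc = doc;
        }
    }

    int bestDoc() {
        return bestDoc;
    }

    float maxScore() {
        return maxScore;
    }
}
```

A collector that never looks at scores can ignore the scorer entirely, which avoids computing a score for each hit.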
public void collect(int doc) throws IOException

Description copied from class: Collector

Called once for every document matching a query, with the unbased document number.

Note: The collection of the current segment can be terminated by throwing a CollectionTerminatedException. In this case, the remaining docs of the current AtomicReaderContext will be skipped and IndexSearcher will swallow the exception and continue collection with the next leaf.

Note: This is called in an inner search loop. For good search performance, implementations of this method should not call IndexSearcher.doc(int) or IndexReader.document(int) on every hit. Doing so can slow searches by an order of magnitude or more.

Specified by:
collect in class Collector

Throws:
IOException
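The termination mechanism described in the first note can be modeled in a few lines. This is a minimal sketch under stated assumptions, not the library's implementation: StopSegment stands in for CollectionTerminatedException, search() stands in for the per-leaf loop inside IndexSearcher, and LimitedCollector is a hypothetical collector that stops each segment after a fixed number of documents.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, not Lucene code. StopSegment stands in for
// CollectionTerminatedException, and search() for the loop inside IndexSearcher.
class TerminationModel {

    static class StopSegment extends RuntimeException {}

    // Records "segment:doc" entries and gives up on a segment after `limit` docs.
    static class LimitedCollector {
        final int limit;
        final List<String> collected = new ArrayList<>();
        private int segment;
        private int seenInSegment;

        LimitedCollector(int limit) {
            this.limit = limit;
        }

        void setNextSegment(int segment) {
            this.segment = segment;
            this.seenInSegment = 0; // the per-segment counter restarts here
        }

        void collect(int doc) {
            collected.add(segment + ":" + doc);
            if (++seenInSegment >= limit) {
                throw new StopSegment(); // remaining docs of this segment are skipped
            }
        }
    }

    // Visits every segment; a StopSegment ends the current segment only.
    static void search(int[] docsPerSegment, LimitedCollector collector) {
        for (int seg = 0; seg < docsPerSegment.length; seg++) {
            collector.setNextSegment(seg);
            try {
                for (int doc = 0; doc < docsPerSegment[seg]; doc++) {
                    collector.collect(doc);
                }
            } catch (StopSegment e) {
                // swallowed: continue with the next segment
            }
        }
    }
}
```

With segments of 5, 1 and 3 documents and a limit of 2, the model collects documents 0 and 1 of the first segment, the single document of the second, and documents 0 and 1 of the third.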
public void setNextReader(AtomicReaderContext context) throws IOException

Description copied from class: Collector

Called before collecting from each AtomicReaderContext. All doc ids in Collector.collect(int) will correspond to IndexReaderContext.reader(). Add AtomicReaderContext.docBase to the current IndexReaderContext.reader()'s internal document id to re-base ids in Collector.collect(int).

Specified by:
setNextReader in class Collector

Parameters:
context - next atomic reader context

Throws:
IOException
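The re-basing arithmetic is simply docBase plus the segment-relative id. The sketch below models it with plain ints instead of Lucene types. RebasingModel is a hypothetical class, and its setNextReader takes the doc base directly rather than an AtomicReaderContext.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model, not Lucene code: docBase mirrors AtomicReaderContext.docBase.
class RebasingModel {
    private int docBase;
    private final List<Integer> globalIds = new ArrayList<>();

    // Called before collecting from each segment, with that segment's starting offset.
    void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    // doc is relative to the current segment; adding docBase gives the index-wide id.
    void collect(int doc) {
        globalIds.add(docBase + doc);
    }

    List<Integer> globalIds() {
        return globalIds;
    }
}
```

For instance, document 2 of a segment whose doc base is 10 is recorded as index-wide id 12.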
public boolean acceptsDocsOutOfOrder()

Description copied from class: Collector

Return true if this collector does not require the matching docIDs to be delivered in int sort order (smallest to largest) to Collector.collect(int).

Most Lucene Query implementations will visit matching docIDs in order. However, some queries (currently limited to certain cases of BooleanQuery) can achieve faster searching if the Collector allows them to deliver the docIDs out of order.

Many collectors don't mind getting docIDs out of order, so it's important to return true here.

Specified by:
acceptsDocsOutOfOrder in class Collector
Copyright © 2010 - 2020 Adobe. All Rights Reserved