Class BucketedSort
- All Implemented Interfaces:
Closeable
,AutoCloseable
,org.elasticsearch.core.Releasable
- Direct Known Subclasses:
BucketedSort.ForDoubles
,BucketedSort.ForFloats
,BucketedSort.ForLongs
- They can have many, many buckets so this implementation backs to
BigArrays
so it doesn't need to allocate any objects per bucket and the circuit breaker in BigArrays will automatically track memory usage and abort execution if it grows too large. - Its fairly common for a bucket to be collected but not returned so these implementations delay as much work as possible until collection
Every bucket is in one of two states: "gathering" or min/max "heap". While
"gathering" the next empty slot is stored in the "root" offset of the
bucket and collecting a value is just adding it in the next slot bumping
the tracking value at the root. So collecting values is O(1)
.
Extracting the results in sorted order is O(n * log n)
because,
well, sorting is O(n * log n)
. When a bucket has collected
bucketSize
entries it is converted into a min "heap" in
O(n)
time. Or into max heap, if order
is ascending.
Once a "heap", collecting a document is the heap-standard O(log n)
worst case. Critically, it is a very fast O(1)
to check if a value
is competitive at all which, so long as buckets aren't hit in reverse
order, they mostly won't be. Extracting results in sorted order is still
O(n * log n)
.
When we first collect a bucket we make sure that we've allocated enough slots to hold all sort values for the entire bucket. In other words: the storage is "dense" and we don't try to save space when storing partially filled buckets.
We actually *oversize* the allocations
(like BigArrays.overSize(long)
) to get amortized linear number
of allocations and to play well with our paged arrays.
-
Nested Class Summary
Modifier and TypeClassDescriptionstatic interface
Callbacks for storing extra data along with competitive sorts.static class
Superclass for implementations of BucketedSort fordouble
keys.static class
Superclass for implementations of BucketedSort forfloat
keys.static class
Superclass for implementations of BucketedSort forlong
keys.class
Performs the actual collection against a LeafReaderContext.static interface
Used withgetValues(long, ResultBuilder)
to build results from the sorting operation. -
Field Summary
Modifier and TypeFieldDescriptionprotected final BigArrays
protected final BucketedSort.ExtraData
static final BucketedSort.ExtraData
An implementation of BucketedSort.ExtraData that does nothing. -
Constructor Summary
ModifierConstructorDescriptionprotected
BucketedSort
(BigArrays bigArrays, SortOrder order, DocValueFormat format, int bucketSize, BucketedSort.ExtraData extra) -
Method Summary
Modifier and TypeMethodDescriptionprotected abstract boolean
betterThan
(long lhs, long rhs) true
if the entry at indexlhs
is "better" than the entry atrhs
.final void
close()
protected final String
Return a fairly human readable representation of the array backing the sort.abstract BucketedSort.Leaf
forLeaf
(org.apache.lucene.index.LeafReaderContext ctx) Get the BucketedSort.Leaf implementation that'll do that actual collecting.int
The number of values to store per bucket.final DocValueFormat
The format to use when presenting the values.protected abstract int
getNextGatherOffset
(long rootIndex) Get the next index that should be "gathered" for a bucket rooted atrootIndex
.final SortOrder
getOrder()
The order of the sort.protected abstract SortValue
getValue
(long index) Get the value at an index.getValues
(long bucket) Get the values for a bucket if it has been collected.final <T extends Comparable<T>>
List<T>getValues
(long bucket, BucketedSort.ResultBuilder<T> builder) Get the values for a bucket if it has been collected.protected abstract void
growValues
(long minSize) Grow the BigArray backing this sort to account for new buckets.boolean
inHeapMode
(long bucket) Is this bucket a min heaptrue
or in gathering modefalse
?protected final void
Initialize the gather offsets after setting up values.abstract boolean
Does this sort need scores? Most don't, but sorting on_score
does.protected abstract void
setNextGatherOffset
(long rootIndex, int offset) Set the next index that should be "gathered" for a bucket rooted atrootIndex
.protected abstract void
swap
(long lhs, long rhs) Swap the data at two indices.protected abstract BigArray
values()
The BigArray backing this sort.
-
Field Details
-
NOOP_EXTRA_DATA
An implementation of BucketedSort.ExtraData that does nothing. -
bigArrays
-
extra
-
-
Constructor Details
-
BucketedSort
protected BucketedSort(BigArrays bigArrays, SortOrder order, DocValueFormat format, int bucketSize, BucketedSort.ExtraData extra)
-
-
Method Details
-
getOrder
The order of the sort. -
getFormat
The format to use when presenting the values. -
getBucketSize
public int getBucketSize()The number of values to store per bucket. -
getValues
public final <T extends Comparable<T>> List<T> getValues(long bucket, BucketedSort.ResultBuilder<T> builder) throws IOException Get the values for a bucket if it has been collected. If it hasn't then returns an empty list.- Parameters:
builder
- builds results. SeeBucketedSort.ExtraData
for how to store data along side the sort for this to extract.- Throws:
IOException
-
getValues
Get the values for a bucket if it has been collected. If it hasn't then returns an empty array.- Throws:
IOException
-
inHeapMode
public boolean inHeapMode(long bucket) Is this bucket a min heaptrue
or in gathering modefalse
? -
forLeaf
public abstract BucketedSort.Leaf forLeaf(org.apache.lucene.index.LeafReaderContext ctx) throws IOException Get the BucketedSort.Leaf implementation that'll do that actual collecting.- Throws:
IOException
- most implementations need to perform IO to prepare for each leaf
-
needsScores
public abstract boolean needsScores()Does this sort need scores? Most don't, but sorting on_score
does. -
values
The BigArray backing this sort. -
growValues
protected abstract void growValues(long minSize) Grow the BigArray backing this sort to account for new buckets. This will only be called if the array is too small. -
getNextGatherOffset
protected abstract int getNextGatherOffset(long rootIndex) Get the next index that should be "gathered" for a bucket rooted atrootIndex
. -
setNextGatherOffset
protected abstract void setNextGatherOffset(long rootIndex, int offset) Set the next index that should be "gathered" for a bucket rooted atrootIndex
. -
getValue
Get the value at an index. -
betterThan
protected abstract boolean betterThan(long lhs, long rhs) true
if the entry at indexlhs
is "better" than the entry atrhs
. "Better" in this means "lower" forSortOrder.ASC
and "higher" forSortOrder.DESC
. -
swap
protected abstract void swap(long lhs, long rhs) Swap the data at two indices. -
debugFormat
Return a fairly human readable representation of the array backing the sort.This is intentionally not a
Object.toString()
implementation because it'll be quite slow. -
initGatherOffsets
protected final void initGatherOffsets()Initialize the gather offsets after setting up values. Subclasses should call this once, after setting up theirvalues()
. -
close
public final void close()- Specified by:
close
in interfaceAutoCloseable
- Specified by:
close
in interfaceCloseable
- Specified by:
close
in interfaceorg.elasticsearch.core.Releasable
-