org.elasticsearch.search.sort.BucketedSort

All Implemented Interfaces:: Closeable, AutoCloseable, org.elasticsearch.core.Releasable

Direct Known Subclasses:: BucketedSort.ForDoubles, BucketedSort.ForFloats, BucketedSort.ForLongs

public abstract class BucketedSort extends Object implements org.elasticsearch.core.Releasable

Type specialized sort implementations designed for use in aggregations. Aggregations have a couple of super interesting characteristics:

They can have many, many buckets so this implementation backs to BigArrays so it doesn't need to allocate any objects per bucket and the circuit breaker in BigArrays will automatically track memory usage and abort execution if it grows too large.
Its fairly common for a bucket to be collected but not returned so these implementations delay as much work as possible until collection

Every bucket is in one of two states: "gathering" or min/max "heap". While "gathering" the next empty slot is stored in the "root" offset of the bucket and collecting a value is just adding it in the next slot bumping the tracking value at the root. So collecting values is O(1). Extracting the results in sorted order is O(n * log n) because, well, sorting is O(n * log n). When a bucket has collected bucketSize entries it is converted into a min "heap" in O(n) time. Or into max heap, if order is ascending.

Once a "heap", collecting a document is the heap-standard O(log n) worst case. Critically, it is a very fast O(1) to check if a value is competitive at all which, so long as buckets aren't hit in reverse order, they mostly won't be. Extracting results in sorted order is still O(n * log n).

When we first collect a bucket we make sure that we've allocated enough slots to hold all sort values for the entire bucket. In other words: the storage is "dense" and we don't try to save space when storing partially filled buckets.

We actually *oversize* the allocations (like BigArrays.overSize(long)) to get amortized linear number of allocations and to play well with our paged arrays.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static interface

BucketedSort.ExtraData

Callbacks for storing extra data along with competitive sorts.

static class

BucketedSort.ForDoubles

Superclass for implementations of BucketedSort for double keys.

static class

BucketedSort.ForFloats

Superclass for implementations of BucketedSort for float keys.

static class

BucketedSort.ForLongs

Superclass for implementations of BucketedSort for long keys.

class

BucketedSort.Leaf

Performs the actual collection against a LeafReaderContext.

static interface

BucketedSort.ResultBuilder<T>

Used with getValues(long, ResultBuilder) to build results from the sorting operation.
Field Summary

Fields

Modifier and Type

Field

Description

protected final BigArrays

bigArrays

protected final BucketedSort.ExtraData

extra

static final BucketedSort.ExtraData

NOOP_EXTRA_DATA

An implementation of BucketedSort.ExtraData that does nothing.
Constructor Summary

Constructors

Modifier

Constructor

Description

protected

BucketedSort(BigArrays bigArrays, SortOrder order, DocValueFormat format, int bucketSize, BucketedSort.ExtraData extra)
Method Summary

Modifier and Type

Method

Description

protected abstract boolean

betterThan(long lhs, long rhs)

true if the entry at index lhs is "better" than the entry at rhs.

final void

close()

protected final String

debugFormat()

Return a fairly human readable representation of the array backing the sort.

abstract BucketedSort.Leaf

forLeaf(org.apache.lucene.index.LeafReaderContext ctx)

Get the BucketedSort.Leaf implementation that'll do that actual collecting.

int

getBucketSize()

The number of values to store per bucket.

final DocValueFormat

getFormat()

The format to use when presenting the values.

protected abstract int

getNextGatherOffset(long rootIndex)

Get the next index that should be "gathered" for a bucket rooted at rootIndex.

final SortOrder

getOrder()

The order of the sort.

protected abstract SortValue

getValue(long index)

Get the value at an index.

final List<SortValue>

getValues(long bucket)

Get the values for a bucket if it has been collected.

final <T extends Comparable<T>> List<T>

getValues(long bucket, BucketedSort.ResultBuilder<T> builder)

Get the values for a bucket if it has been collected.

protected abstract void

growValues(long minSize)

Grow the BigArray backing this sort to account for new buckets.

boolean

inHeapMode(long bucket)

Is this bucket a min heap true or in gathering mode false?

protected final void

initGatherOffsets()

Initialize the gather offsets after setting up values.

abstract boolean

needsScores()

Does this sort need scores? Most don't, but sorting on _score does.

protected abstract void

setNextGatherOffset(long rootIndex, int offset)

Set the next index that should be "gathered" for a bucket rooted at rootIndex.

protected abstract void

swap(long lhs, long rhs)

Swap the data at two indices.

protected abstract BigArray

values()

The BigArray backing this sort.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- NOOP_EXTRA_DATA
  
  public static final BucketedSort.ExtraData NOOP_EXTRA_DATA
  
  An implementation of BucketedSort.ExtraData that does nothing.
- bigArrays
  
  protected final BigArrays bigArrays
- extra
  
  protected final BucketedSort.ExtraData extra
Constructor Details
- BucketedSort
  
  protected BucketedSort(BigArrays bigArrays, SortOrder order, DocValueFormat format, int bucketSize, BucketedSort.ExtraData extra)
Method Details
- getOrder
  
  public final SortOrder getOrder()
  
  The order of the sort.
- getFormat
  
  public final DocValueFormat getFormat()
  
  The format to use when presenting the values.
- getBucketSize
  
  public int getBucketSize()
  
  The number of values to store per bucket.
- getValues
  
  public final <T extends Comparable<T>> List<T> getValues(long bucket, BucketedSort.ResultBuilder<T> builder) throws IOException
  
  Get the values for a bucket if it has been collected. If it hasn't then returns an empty list.
  
  Parameters:
  
  builder - builds results. See BucketedSort.ExtraData for how to store data along side the sort for this to extract.
  
  Throws:
  
  IOException
- getValues
  
  public final List<SortValue> getValues(long bucket) throws IOException
  
  Get the values for a bucket if it has been collected. If it hasn't then returns an empty array.
  
  Throws:
  
  IOException
- inHeapMode
  
  public boolean inHeapMode(long bucket)
  
  Is this bucket a min heap true or in gathering mode false?
- forLeaf
  
  public abstract BucketedSort.Leaf forLeaf(org.apache.lucene.index.LeafReaderContext ctx) throws IOException
  
  Get the BucketedSort.Leaf implementation that'll do that actual collecting.
  
  Throws:
  
  IOException - most implementations need to perform IO to prepare for each leaf
- needsScores
  
  public abstract boolean needsScores()
  
  Does this sort need scores? Most don't, but sorting on _score does.
- values
  
  protected abstract BigArray values()
  
  The BigArray backing this sort.
- growValues
  
  protected abstract void growValues(long minSize)
  
  Grow the BigArray backing this sort to account for new buckets. This will only be called if the array is too small.
- getNextGatherOffset
  
  protected abstract int getNextGatherOffset(long rootIndex)
  
  Get the next index that should be "gathered" for a bucket rooted at rootIndex.
- setNextGatherOffset
  
  protected abstract void setNextGatherOffset(long rootIndex, int offset)
  
  Set the next index that should be "gathered" for a bucket rooted at rootIndex.
- getValue
  
  protected abstract SortValue getValue(long index)
  
  Get the value at an index.
- betterThan
  
  protected abstract boolean betterThan(long lhs, long rhs)
  
  true if the entry at index lhs is "better" than the entry at rhs. "Better" in this means "lower" for SortOrder.ASC and "higher" for SortOrder.DESC.
- swap
  
  protected abstract void swap(long lhs, long rhs)
  
  Swap the data at two indices.
- debugFormat
  
  protected final String debugFormat()
  
  Return a fairly human readable representation of the array backing the sort.
  This is intentionally not a Object.toString() implementation because it'll be quite slow.
- initGatherOffsets
  
  protected final void initGatherOffsets()
  
  Initialize the gather offsets after setting up values. Subclasses should call this once, after setting up their values().
- close
  
  public final void close()
  
  Specified by:
  
  close in interface AutoCloseable
  
  Specified by:
  
  close in interface Closeable
  
  Specified by:
  
  close in interface org.elasticsearch.core.Releasable

Class BucketedSort

Nested Class Summary

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Details

NOOP_EXTRA_DATA

bigArrays

extra

Constructor Details

BucketedSort

Method Details

getOrder

getFormat

getBucketSize

getValues

getValues

inHeapMode

forLeaf

needsScores

values

growValues

getNextGatherOffset

setNextGatherOffset

getValue

betterThan

swap

debugFormat

initGatherOffsets

close