AggregatorFactory (druid-processing 27.0.0 API)

java.lang.Object
- org.apache.druid.query.aggregation.AggregatorFactory

All Implemented Interfaces:

Cacheable

Direct Known Subclasses:

CardinalityAggregatorFactory, CountAggregatorFactory, DoubleAnyAggregatorFactory, DoubleFirstAggregatorFactory, DoubleLastAggregatorFactory, DoubleMeanAggregatorFactory, ExpressionLambdaAggregatorFactory, FilteredAggregatorFactory, FloatAnyAggregatorFactory, FloatFirstAggregatorFactory, FloatLastAggregatorFactory, GroupingAggregatorFactory, HistogramAggregatorFactory, HyperUniquesAggregatorFactory, JavaScriptAggregatorFactory, LongAnyAggregatorFactory, LongFirstAggregatorFactory, LongLastAggregatorFactory, NullableNumericAggregatorFactory, StringAnyAggregatorFactory, StringFirstAggregatorFactory, StringLastAggregatorFactory, SuppressedAggregatorFactory
```
public abstract class AggregatorFactory
extends Object
implements Cacheable
```
AggregatorFactory is a strategy (in the terms of Design Patterns) that represents column aggregation, e.g. min, max, or sum of metric columns, or cardinality of dimension columns (see CardinalityAggregatorFactory). Implementations of AggregatorFactory that need to support nullable aggregations are encouraged to extend NullableNumericAggregatorFactory.

Implementations are also expected to handle single-value and multi-value string columns correctly, in whatever way makes sense for them. For example, the doubleSum aggregator tries to parse each string value as a double and treats it as zero if parsing fails. For a multi-value column, each individual value should be taken into account: if a row has the value ["1","1","1"], doubleSum takes each element and sums them to 3.
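The string-handling rule for doubleSum can be illustrated with a small standalone sketch. It does not use any Druid classes: `StringDoubleSumSketch`, `parseOrZero`, and `sumRow` are hypothetical names chosen for this example, and the real aggregator may differ in details such as null handling.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in, not a Druid class: mimics how a doubleSum-style
// aggregator could treat single-value and multi-value string columns.
final class StringDoubleSumSketch {
    // Parse one string as a double; null or unparseable input counts as zero.
    static double parseOrZero(String value) {
        if (value == null) {
            return 0.0;
        }
        try {
            return Double.parseDouble(value);
        } catch (NumberFormatException e) {
            return 0.0;
        }
    }

    // A multi-value row contributes every one of its values to the sum.
    static double sumRow(List<String> rowValues) {
        double sum = 0.0;
        for (String value : rowValues) {
            sum += parseOrZero(value);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumRow(Arrays.asList("1", "1", "1"))); // prints 3.0
        System.out.println(sumRow(Arrays.asList("2.5", "oops"))); // prints 2.5
    }
}
```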

Constructor Summary

Constructors
Constructor and Description

AggregatorFactory()

Method Summary

Modifier and Type	Method and Description
`boolean`	`canVectorize(ColumnInspector columnInspector)` Returns whether or not this aggregation class supports vectorization.
`abstract Object`	`combine(Object lhs, Object rhs)` A method that knows how to combine the outputs of `Aggregator.get()` produced via `factorize(org.apache.druid.segment.ColumnSelectorFactory)` or `BufferAggregator.get(java.nio.ByteBuffer, int)` produced via `factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory)`.
`abstract Object`	`deserialize(Object object)` A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
`abstract Aggregator`	`factorize(ColumnSelectorFactory metricFactory)`
`abstract BufferAggregator`	`factorizeBuffered(ColumnSelectorFactory metricFactory)`
`VectorAggregator`	`factorizeVector(VectorColumnSelectorFactory selectorFactory)` Create a VectorAggregator based on the provided column selector factory.
`AggregatorAndSize`	`factorizeWithSize(ColumnSelectorFactory metricFactory)` Creates an `Aggregator` based on the provided column selector factory.
`abstract Object`	`finalizeComputation(Object object)` "Finalizes" the computation of an object.
`abstract AggregatorFactory`	`getCombiningFactory()` Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory.
`abstract Comparator`	`getComparator()`
`String`	`getComplexTypeName()` Deprecated.
`ValueType`	`getFinalizedType()` Deprecated.
`ColumnType`	`getIntermediateType()` Get the "intermediate" `ColumnType` for this aggregator.
`abstract int`	`getMaxIntermediateSize()` Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
`int`	`getMaxIntermediateSizeWithNulls()` Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
`AggregatorFactory`	`getMergingFactory(AggregatorFactory other)` Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory and another factory.
`abstract String`	`getName()`
`abstract List<AggregatorFactory>`	`getRequiredColumns()` Used by `GroupByStrategyV1` when running nested groupBys, to "transfer" values from this aggregator to an incremental index that the outer query will run on.
`ColumnType`	`getResultType()` Get the `ColumnType` for the final form of this aggregator, i.e. the type of the value returned by `finalizeComputation(Object)`.
`ValueType`	`getType()` Deprecated.
`int`	`guessAggregatorHeapFootprint(long rows)` Returns a best guess as to how much memory the on-heap `Aggregator` returned by `factorize(org.apache.druid.segment.ColumnSelectorFactory)` will require when a certain number of rows have been aggregated into it.
`AggregateCombiner`	`makeAggregateCombiner()` Creates an AggregateCombiner to fold rollup aggregation results from several "rows" of different indexes during index merging.
`AggregateCombiner`	`makeNullableAggregateCombiner()` Creates an `AggregateCombiner` which supports nullability.
`static AggregatorFactory[]`	`mergeAggregators(List<AggregatorFactory[]> aggregatorsList)` Merges the list of AggregatorFactory[] (presumably from the metadata of the segments being merged) and returns the merged AggregatorFactory[] (for the metadata of the merged segment).
`AggregatorFactory`	`optimizeForSegment(PerSegmentQueryOptimizationContext optimizationContext)` Return a potentially optimized form of this AggregatorFactory for per-segment queries.
`abstract List<String>`	`requiredFields()` Get a list of fields that aggregators built by this factory will need to read.
`AggregatorFactory`	`withName(String newName)` Used in cases where we want to change the output name of the aggregator to something else.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.druid.java.util.common.Cacheable
getCacheKey

- Constructor Detail
  - AggregatorFactory
```
public AggregatorFactory()
```
- Method Detail
  - factorize
```
public abstract Aggregator factorize(ColumnSelectorFactory metricFactory)
```
  - factorizeBuffered
```
public abstract BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
```
  - factorizeVector
```
public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
```
    Creates a VectorAggregator based on the provided column selector factory. Throws an exception if this aggregation class does not support vectorization, so check canVectorize(ColumnInspector) first.
  - factorizeWithSize
```
public AggregatorAndSize factorizeWithSize(ColumnSelectorFactory metricFactory)
```
    Creates an Aggregator based on the provided column selector factory. The returned value is a holder object which contains both the aggregator and its initial size in bytes. The callers can then invoke Aggregator.aggregateWithSize() to perform aggregation and get back the incremental memory required in each aggregate call. Combined with the initial size, this gives the total on-heap memory required by the aggregator.
    This method must include JVM object overheads in the estimated size and must ensure not to underestimate required memory as that might lead to OOM errors.
    This flow does not require invoking guessAggregatorHeapFootprint(long) which tends to over-estimate the required memory.
    
    Returns:
    
    AggregatorAndSize which contains the actual aggregator and its initial size.
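
    The size-tracking flow described for factorizeWithSize can be sketched as follows. This is a minimal illustration under stated assumptions, not Druid code: `SizeTrackingSketch` and `DistinctAggregator` are hypothetical stand-ins for the holder and aggregator types, and the byte constants are invented for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-ins for illustration only; these are not the Druid
// Aggregator or AggregatorAndSize classes, and the byte figures are invented.
final class SizeTrackingSketch {
    static final long INITIAL_BYTES = 64;   // assumed footprint of the empty aggregator
    static final long BYTES_PER_ENTRY = 48; // assumed cost of each newly stored value

    // Collects distinct values and reports how much each call grew the footprint.
    static final class DistinctAggregator {
        private final Set<Long> seen = new HashSet<>();

        // Returns the incremental bytes needed by this call (0 if nothing new was stored).
        long aggregateWithSize(long value) {
            return seen.add(value) ? BYTES_PER_ENTRY : 0L;
        }
    }

    // Caller side: start from the initial size, then add every increment.
    static long totalFootprint(long[] values) {
        DistinctAggregator aggregator = new DistinctAggregator();
        long total = INITIAL_BYTES;
        for (long value : values) {
            total += aggregator.aggregateWithSize(value);
        }
        return total;
    }

    public static void main(String[] args) {
        // Three distinct values in four rows: 64 + 3 * 48 = 208
        System.out.println(totalFootprint(new long[] {7, 7, 8, 9})); // prints 208
    }
}
```

    The caller never needs an up-front guess of the final size; it only accumulates what each aggregate call reports on top of the initial size.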
  - canVectorize
```
public boolean canVectorize(ColumnInspector columnInspector)
```
    Returns whether or not this aggregation class supports vectorization. The default implementation returns false.
  - getComparator
```
public abstract Comparator getComparator()
```
  - combine
```
@Nullable
public abstract Object combine(@Nullable Object lhs,
                               @Nullable Object rhs)
```
    A method that knows how to combine the outputs of Aggregator.get() produced via factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory). Note, even though this method is called "combine", this method's contract *does* allow for mutation of the input objects. Thus, any use of lhs or rhs after calling this method is highly discouraged.
    
    Parameters:
    
    lhs - The left hand side of the combine
    
    rhs - The right hand side of the combine
    
    Returns:
    
    an object representing the combination of lhs and rhs, this can be a new object or a mutation of the inputs
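
    The mutation clause of the combine contract can be illustrated with a standalone sketch. `CombineContractSketch` is a hypothetical class, the set-union logic is only an example of an intermediate type, and the way nulls are handled here is an assumption rather than documented behavior.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of a combine implementation that mutates its input.
final class CombineContractSketch {
    // Union-style combine that reuses lhs as the result, which the contract permits.
    // Null handling (return the other side) is an assumption made for this sketch.
    static Set<String> combine(Set<String> lhs, Set<String> rhs) {
        if (lhs == null) {
            return rhs;
        }
        if (rhs == null) {
            return lhs;
        }
        lhs.addAll(rhs); // lhs is modified in place
        return lhs;
    }

    public static void main(String[] args) {
        Set<String> left = new HashSet<>(Arrays.asList("a", "b"));
        Set<String> right = new HashSet<>(Arrays.asList("b", "c"));

        // Keep only the returned reference; treat left and right as consumed.
        Set<String> combined = combine(left, right);

        System.out.println(combined.size());  // prints 3
        System.out.println(combined == left); // prints true: the input was mutated
    }
}
```

    Because the result may be the same object as one of the inputs, callers that still need the original values should copy them before calling combine.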
  - makeAggregateCombiner
```
public AggregateCombiner makeAggregateCombiner()
```
    Creates an AggregateCombiner to fold rollup aggregation results from several "rows" of different indexes during index merging. AggregateCombiner implements the same logic as combine(java.lang.Object, java.lang.Object), with the difference that it uses ColumnValueSelector and its subinterfaces to get inputs and implements ColumnValueSelector to provide output.
    
    See Also:
    
    AggregateCombiner, IndexMerger
  - makeNullableAggregateCombiner
```
public AggregateCombiner makeNullableAggregateCombiner()
```
    Creates an AggregateCombiner which supports nullability. Implementations of AggregatorFactory that need to support nullable aggregations are encouraged to extend NullableNumericAggregatorFactory instead of overriding this method. The default implementation calls makeAggregateCombiner() for backwards compatibility.
    
    See Also:
    
    AggregateCombiner, NullableNumericAggregatorFactory
  - getCombiningFactory
```
public abstract AggregatorFactory getCombiningFactory()
```
    Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory. It is used when we know we have some values that were produced with this aggregator factory, and want to do some additional combining of them. This happens, for example, when merging query results from two different segments, or two different servers.

    For simple aggregators, the combining factory may be computed by simply creating a new factory that is the same as the current one, except with its input column renamed to the same as the output column. For example, this aggregator:

    `{"type": "longSum", "fieldName": "foo", "name": "bar"}`

    would become:

    `{"type": "longSum", "fieldName": "bar", "name": "bar"}`

    Sometimes, the type or other parameters of the combining aggregator will be different from the original aggregator. For example, the CountAggregatorFactory getCombiningFactory method will return a LongSumAggregatorFactory, because counts are combined by summing.

    No matter what, `foo.getCombiningFactory()` and `foo.getCombiningFactory().getCombiningFactory()` should return the same result.
    
    Returns:
    
    a new Factory that can be used for operations on top of data output from the current factory.
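
    The renaming rule and the idempotence requirement can be sketched with a small value class. `CombiningFactorySketch` and `Spec` are hypothetical and only model the three JSON fields from the example; they are not Druid types, and real factories carry more state.

```java
import java.util.Objects;

// Hypothetical model of an aggregator spec with just type, fieldName and name.
final class CombiningFactorySketch {
    static final class Spec {
        final String type;
        final String fieldName;
        final String name;

        Spec(String type, String fieldName, String name) {
            this.type = type;
            this.fieldName = fieldName;
            this.name = name;
        }

        // Read from the output column; counts are combined by summing.
        Spec combiningSpec() {
            String combinedType = "count".equals(type) ? "longSum" : type;
            return new Spec(combinedType, name, name);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Spec)) {
                return false;
            }
            Spec other = (Spec) o;
            return Objects.equals(type, other.type)
                && Objects.equals(fieldName, other.fieldName)
                && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, fieldName, name);
        }
    }

    public static void main(String[] args) {
        Spec once = new Spec("longSum", "foo", "bar").combiningSpec();
        Spec twice = once.combiningSpec();
        System.out.println(once.fieldName);     // prints bar
        System.out.println(once.equals(twice)); // prints true: combining is idempotent
        System.out.println(new Spec("count", null, "rows").combiningSpec().type); // prints longSum
    }
}
```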
  - getMergingFactory
```
public AggregatorFactory getMergingFactory(AggregatorFactory other)
                                    throws AggregatorFactoryNotMergeableException
```
    Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory and another factory. It is used when we have some values produced by this aggregator factory, and some values produced by the "other" aggregator factory, and we want to do some additional combining of them. This happens, for example, when compacting two segments together that both have a metric column with the same name. (Even though the name of the column is the same, the aggregator factory used to create it may be different from segment to segment.) This method may throw AggregatorFactoryNotMergeableException, meaning that "this" and "other" are not compatible and values from one cannot sensibly be combined with values from the other.
    
    Returns:
    
    a new Factory that can be used for merging the output of aggregators from this factory and other.
    
    Throws:
    
    AggregatorFactoryNotMergeableException
    
    See Also:
    
    getCombiningFactory(), which is equivalent to `foo.getMergingFactory(foo)` (when "this" and "other" are the same instance).
  - getRequiredColumns
```
public abstract List<AggregatorFactory> getRequiredColumns()
```
    Used by GroupByStrategyV1 when running nested groupBys, to "transfer" values from this aggregator to an incremental index that the outer query will run on. This method only exists due to the design of GroupByStrategyV1, and should probably not be used for anything else. If you are here because you are looking for a way to get the input fields required by this aggregator, and thought "getRequiredColumns" sounded right, please use requiredFields() instead.
    
    Returns:
    
    AggregatorFactories that can be used to "transfer" values from this aggregator into an incremental index
    
    See Also:
    
    requiredFields(), a similarly-named method that is perhaps the one you want instead.
  - deserialize
```
public abstract Object deserialize(Object object)
```
    A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
    
    Parameters:
    
    object - the object to deserialize
    
    Returns:
    
    the deserialized object
  - finalizeComputation
```
@Nullable
public abstract Object finalizeComputation(@Nullable Object object)
```
    "Finalizes" the computation of an object. Primarily useful for complex types that have a different mergeable intermediate format than their final resultant output.
    
    Parameters:
    
    object - the object to be finalized
    
    Returns:
    
    the finalized value that should be returned for the initial query
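
    A mean is a typical case where the mergeable intermediate format differs from the final output. The sketch below is an assumption-laden illustration, not the Druid implementation: `FinalizeSketch` and `MeanHolder` are hypothetical names, and returning null for an empty holder is a choice made for this example.

```java
// Hypothetical illustration: the intermediate form is a (sum, count) pair,
// while the finalized form is a single double.
final class FinalizeSketch {
    static final class MeanHolder {
        final double sum;
        final long count;

        MeanHolder(double sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Intermediate values stay mergeable: sums and counts simply add up.
    static MeanHolder combine(MeanHolder lhs, MeanHolder rhs) {
        return new MeanHolder(lhs.sum + rhs.sum, lhs.count + rhs.count);
    }

    // Finalization collapses the pair into the value a query would return.
    static Double finalizeComputation(MeanHolder holder) {
        if (holder == null || holder.count == 0) {
            return null; // assumption for this sketch: no rows means no mean
        }
        return holder.sum / holder.count;
    }

    public static void main(String[] args) {
        MeanHolder segmentA = new MeanHolder(6.0, 3); // rows 1, 2, 3 (mean 2.0)
        MeanHolder segmentB = new MeanHolder(4.0, 1); // row 4 (mean 4.0)

        // Combining first and finalizing last gives 10.0 / 4 = 2.5.
        System.out.println(finalizeComputation(combine(segmentA, segmentB))); // prints 2.5
    }
}
```

    Averaging the two finalized means instead would give 3.0, which is why results are merged in intermediate form and finalized only once at the end.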
  - getName
```
public abstract String getName()
```
    Returns:
    
    output name of the aggregator column.
  - requiredFields
```
public abstract List<String> requiredFields()
```
    Get a list of fields that aggregators built by this factory will need to read.
  - getIntermediateType
```
public ColumnType getIntermediateType()
```
    Get the "intermediate" ColumnType for this aggregator. This is the same as the type returned by deserialize(java.lang.Object) and the type accepted by combine(java.lang.Object, java.lang.Object). However, it is *not* necessarily the same type returned by finalizeComputation(java.lang.Object). Refer to the ColumnType javadocs for details on the implications of choosing a type.
  - getResultType
```
public ColumnType getResultType()
```
    Get the ColumnType for the final form of this aggregator, i.e. the type of the value returned by finalizeComputation(java.lang.Object). This may be the same as or different than the types expected in deserialize(java.lang.Object) and combine(java.lang.Object, java.lang.Object). Refer to the ColumnType javadocs for details on the implications of choosing a type.
  - getType
```
@Deprecated
public ValueType getType()
```
    Deprecated.
    
    This method is deprecated and will be removed soon. Use getIntermediateType() instead. Do not call this method: it will likely produce incorrect results and exists only for backwards compatibility.
  - getFinalizedType
```
@Deprecated
public ValueType getFinalizedType()
```
    Deprecated.
    
    This method is deprecated and will be removed soon. Use getResultType() instead. Do not call this method: it will likely produce incorrect results and exists only for backwards compatibility.
  - getComplexTypeName
```
@Nullable
@Deprecated
public String getComplexTypeName()
```
    Deprecated.
    
    This method is deprecated and will be removed soon. Use getIntermediateType() instead. Do not call this method: it will likely produce incorrect results and exists only for backwards compatibility.
  - getMaxIntermediateSize
```
public abstract int getMaxIntermediateSize()
```
    Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
    
    Returns:
    
    the maximum number of bytes that an aggregator of this type will require for intermediate result storage.
  - getMaxIntermediateSizeWithNulls
```
public int getMaxIntermediateSizeWithNulls()
```
    Returns the maximum size that this aggregator will require in bytes for intermediate storage of results. Implementations of AggregatorFactory that need to support nullable aggregations are encouraged to extend NullableNumericAggregatorFactory instead of overriding this method. The default implementation calls getMaxIntermediateSize() for backwards compatibility.
    
    Returns:
    
    the maximum number of bytes that an aggregator of this type will require for intermediate result storage.
  - guessAggregatorHeapFootprint
```
public int guessAggregatorHeapFootprint(long rows)
```
    Returns a best guess as to how much memory the on-heap Aggregator returned by factorize(org.apache.druid.segment.ColumnSelectorFactory) will require when a certain number of rows have been aggregated into it. The main user of this method is OnheapIncrementalIndex, which uses it to determine when to persist the current in-memory data to disk. Important note for callers! In nearly all cases, callers that wish to constrain memory would be better off using factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory) or factorizeVector(org.apache.druid.segment.vector.VectorColumnSelectorFactory), which offer precise control over how much memory is being used.
  - optimizeForSegment
```
public AggregatorFactory optimizeForSegment(PerSegmentQueryOptimizationContext optimizationContext)
```
    Return a potentially optimized form of this AggregatorFactory for per-segment queries.
  - withName
```
public AggregatorFactory withName(String newName)
```
    Used in cases where we want to change the output name of the aggregator to something else. For example, if we have a query `select a, sum(b) as total from table group by a`, the aggregator returned from the native group by query is "a0", set in org.apache.druid.sql.calcite.rel.DruidQuery#computeAggregations. We can use withName("total") to set the output name of the aggregator to "total".
    Not every implementation may support this method, so callers are advised to handle that case.
    
    Parameters:
    
    newName - the new output name for the aggregator factory
    
    Returns:
    
    AggregatorFactory with the output name set as the input param.
  - mergeAggregators
```
@Nullable
public static AggregatorFactory[] mergeAggregators(List<AggregatorFactory[]> aggregatorsList)
```
    Merges the list of AggregatorFactory[] (presumably from the metadata of the segments being merged) and returns the merged AggregatorFactory[] (for the metadata of the merged segment). Null is returned if merging is not possible for either of the following reasons:

    - one of the elements in the input list is null, i.e. the aggregators for one of the segments being merged are unknown
    - AggregatorFactory instances of the same name cannot be merged because they are not compatible

    Parameters:

    aggregatorsList - the list of AggregatorFactory[] to merge, one array per segment

    Returns:

    the merged AggregatorFactory[], or null if merging is not possible.
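
    The two null-returning cases can be sketched with plain collections. This is a simplified model and not the Druid implementation: `MergeAggregatorsSketch` is hypothetical, each map stands in for one segment's AggregatorFactory[] (output name to aggregator type), and "compatible" is reduced to the assumption that the types are equal, whereas the real method delegates to getMergingFactory.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of merging per-segment aggregator metadata by output name.
final class MergeAggregatorsSketch {
    // Returns the merged name-to-type map, or null if merging is not possible.
    static Map<String, String> merge(List<Map<String, String>> perSegment) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> segment : perSegment) {
            if (segment == null) {
                return null; // aggregators of one segment are unknown
            }
            for (Map.Entry<String, String> entry : segment.entrySet()) {
                String existing = merged.putIfAbsent(entry.getKey(), entry.getValue());
                if (existing != null && !existing.equals(entry.getValue())) {
                    return null; // same name, incompatible aggregators
                }
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, String> first = Collections.singletonMap("bar", "longSum");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("bar", "longSum");
        second.put("baz", "doubleSum");
        Map<String, String> clash = Collections.singletonMap("bar", "doubleSum");

        System.out.println(merge(Arrays.asList(first, second))); // prints {bar=longSum, baz=doubleSum}
        System.out.println(merge(Arrays.asList(first, null)));   // prints null
        System.out.println(merge(Arrays.asList(first, clash)));  // prints null
    }
}
```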
