Class AggregatorFactory
- java.lang.Object
-
- org.apache.druid.query.aggregation.AggregatorFactory
-
- All Implemented Interfaces:
Cacheable
- Direct Known Subclasses:
CardinalityAggregatorFactory, CountAggregatorFactory, DoubleAnyAggregatorFactory, DoubleFirstAggregatorFactory, DoubleLastAggregatorFactory, DoubleMeanAggregatorFactory, ExpressionLambdaAggregatorFactory, FilteredAggregatorFactory, FloatAnyAggregatorFactory, FloatFirstAggregatorFactory, FloatLastAggregatorFactory, GroupingAggregatorFactory, HistogramAggregatorFactory, HyperUniquesAggregatorFactory, JavaScriptAggregatorFactory, LongAnyAggregatorFactory, LongFirstAggregatorFactory, LongLastAggregatorFactory, NullableNumericAggregatorFactory, SingleValueAggregatorFactory, StringAnyAggregatorFactory, StringFirstAggregatorFactory, StringLastAggregatorFactory, SuppressedAggregatorFactory
public abstract class AggregatorFactory extends Object implements Cacheable
AggregatorFactory is a strategy (in the terms of Design Patterns) that represents column aggregation, e.g. min, max, sum of metric columns, or cardinality of dimension columns (see CardinalityAggregatorFactory). Implementations of AggregatorFactory which need to support nullable aggregations are encouraged to extend NullableNumericAggregatorFactory. Implementations are also expected to correctly handle single-value and multi-value string columns in whatever way makes sense for them; e.g. the doubleSum aggregator tries to parse each string value as a double and treats it as zero if parsing fails. For a multi-value column, each individual value should be taken into account for aggregation; e.g. if a row has the value ["1","1","1"], doubleSum takes each of them and sums them to 3.
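The string-handling rule described above can be sketched in a few lines. This is only an illustration of the described behavior, not Druid's doubleSum implementation; the class and method names (DoubleSumSketch, parseOrZero, sumRow) are hypothetical.

```java
import java.util.List;

// Hypothetical stand-in; it only mirrors the behavior described in the class documentation.
final class DoubleSumSketch {
    // Parse one string value as a double; null or unparseable values count as zero.
    static double parseOrZero(String value) {
        if (value == null) {
            return 0.0;
        }
        try {
            return Double.parseDouble(value);
        } catch (NumberFormatException e) {
            return 0.0;
        }
    }

    // Sum every value of a (possibly multi-value) string row.
    static double sumRow(List<String> rowValues) {
        double sum = 0.0;
        for (String value : rowValues) {
            sum += parseOrZero(value);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumRow(List.of("1", "1", "1")));  // prints 3.0
        System.out.println(sumRow(List.of("2.5", "oops")));  // prints 2.5
    }
}
```

A single-value column is just the one-element case of the same loop.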
-
-
Constructor Summary
Constructor | Description
AggregatorFactory() |
-
Method Summary
Modifier and Type | Method | Description
boolean | canVectorize(ColumnInspector columnInspector) | Returns whether or not this aggregation class supports vectorization.
abstract Object | combine(Object lhs, Object rhs) | A method that knows how to combine the outputs of Aggregator.get() produced via factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory).
abstract Object | deserialize(Object object) | A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
abstract Aggregator | factorize(ColumnSelectorFactory metricFactory) |
abstract BufferAggregator | factorizeBuffered(ColumnSelectorFactory metricFactory) |
VectorAggregator | factorizeVector(VectorColumnSelectorFactory selectorFactory) | Create a VectorAggregator based on the provided column selector factory.
AggregatorAndSize | factorizeWithSize(ColumnSelectorFactory metricFactory) | Creates an Aggregator based on the provided column selector factory.
abstract Object | finalizeComputation(Object object) | "Finalizes" the computation of an object.
abstract AggregatorFactory | getCombiningFactory() | Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory.
abstract Comparator | getComparator() |
String | getComplexTypeName() | Deprecated.
ValueType | getFinalizedType() | Deprecated.
ColumnType | getIntermediateType() | Get the "intermediate" ColumnType for this aggregator.
abstract int | getMaxIntermediateSize() | Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
int | getMaxIntermediateSizeWithNulls() | Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
AggregatorFactory | getMergingFactory(AggregatorFactory other) | Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory and another factory.
abstract String | getName() |
List<AggregatorFactory> | getRequiredColumns() | Deprecated.
ColumnType | getResultType() | Get the ColumnType for the final form of this aggregator, i.e. the type of the value returned by finalizeComputation(java.lang.Object).
ValueType | getType() | Deprecated.
int | guessAggregatorHeapFootprint(long rows) | Returns a best guess as to how much memory the on-heap Aggregator returned by factorize(org.apache.druid.segment.ColumnSelectorFactory) will require when a certain number of rows have been aggregated into it.
AggregateCombiner | makeAggregateCombiner() | Creates an AggregateCombiner to fold rollup aggregation results from several "rows" of different indexes during index merging.
AggregateCombiner | makeNullableAggregateCombiner() | Creates an AggregateCombiner which supports nullability.
static AggregatorFactory[] | mergeAggregators(List<AggregatorFactory[]> aggregatorsList) | Merges the list of AggregatorFactory[] (presumably from the metadata of some segments being merged) and returns the merged AggregatorFactory[] (for the metadata of the merged segment).
AggregatorFactory | optimizeForSegment(PerSegmentQueryOptimizationContext optimizationContext) | Return a potentially optimized form of this AggregatorFactory for per-segment queries.
abstract List<String> | requiredFields() | Get a list of fields that aggregators built by this factory will need to read.
AggregatorFactory | withName(String newName) | Used in cases where we want to change the output name of the aggregator to something else.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.druid.java.util.common.Cacheable
getCacheKey
-
Method Detail
-
factorize
public abstract Aggregator factorize(ColumnSelectorFactory metricFactory)
-
factorizeBuffered
public abstract BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
-
factorizeVector
public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
Create a VectorAggregator based on the provided column selector factory. Will throw an exception if this aggregation class does not support vectorization: check "canVectorize" first.
-
factorizeWithSize
public AggregatorAndSize factorizeWithSize(ColumnSelectorFactory metricFactory)
Creates an Aggregator based on the provided column selector factory. The returned value is a holder object which contains both the aggregator and its initial size in bytes. Callers can then invoke Aggregator.aggregateWithSize() to perform aggregation and get back the incremental memory required by each aggregate call. Combined with the initial size, this gives the total on-heap memory required by the aggregator.
This method must include JVM object overheads in the estimated size and must not underestimate the required memory, as that might lead to OOM errors.
This flow does not require invoking guessAggregatorHeapFootprint(long), which tends to over-estimate the required memory.
- Returns:
- AggregatorAndSize which contains the actual aggregator and its initial size.
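The accounting flow described for factorizeWithSize (initial size plus the increment reported by each aggregate call) might look roughly like the sketch below. All types here are hypothetical stand-ins rather than Druid classes, and the byte figures are made-up estimates chosen only to make the arithmetic visible.

```java
import java.util.ArrayList;
import java.util.List;

final class SizeTrackingSketch {
    // Hypothetical stand-in for an aggregator that reports incremental memory per call.
    interface SizedAggregator {
        long aggregateWithSize(String value); // returns the extra bytes retained by this call
    }

    // Hypothetical holder mirroring the "aggregator plus initial size" idea.
    record AggregatorAndInitialSize(SizedAggregator aggregator, long initialSizeBytes) {}

    // Toy aggregator that retains every value it sees.
    static AggregatorAndInitialSize factorizeWithSize() {
        List<String> retained = new ArrayList<>();
        SizedAggregator aggregator = value -> {
            retained.add(value);
            return 40L + 2L * value.length(); // assumed per-string cost, deliberately generous
        };
        return new AggregatorAndInitialSize(aggregator, 64L); // assumed size of the empty aggregator
    }

    // Total estimate = initial size + sum of the increments returned by each aggregate call.
    static long estimateBytes(List<String> values) {
        AggregatorAndInitialSize holder = factorizeWithSize();
        long total = holder.initialSizeBytes();
        for (String value : values) {
            total += holder.aggregator().aggregateWithSize(value);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(estimateBytes(List.of("a", "bb", "ccc"))); // prints 196
    }
}
```

Erring on the generous side in the per-call estimate follows the requirement above that the size must not be underestimated.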
-
canVectorize
public boolean canVectorize(ColumnInspector columnInspector)
Returns whether or not this aggregation class supports vectorization. The default implementation returns false.
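Since factorizeVector throws when vectorization is unsupported, a caller would typically branch on canVectorize first. The sketch below shows that guard with a hypothetical, heavily simplified Factory interface (aggregators are represented by plain strings, and the exception type is an assumption).

```java
final class VectorizeSketch {
    // Hypothetical stand-in exposing only the methods discussed here.
    interface Factory {
        String factorize();

        // Mirrors the documented default: no vectorization unless overridden.
        default boolean canVectorize() {
            return false;
        }

        // Assumed exception type; the documentation only says "will throw an exception".
        default String factorizeVector() {
            throw new UnsupportedOperationException("vectorization not supported");
        }
    }

    static final class VectorizedFactory implements Factory {
        @Override
        public String factorize() {
            return "row-at-a-time";
        }

        @Override
        public boolean canVectorize() {
            return true;
        }

        @Override
        public String factorizeVector() {
            return "vectorized";
        }
    }

    // Check canVectorize() before calling factorizeVector(); otherwise fall back.
    static String chooseAggregator(Factory factory) {
        return factory.canVectorize() ? factory.factorizeVector() : factory.factorize();
    }

    public static void main(String[] args) {
        Factory plain = () -> "row-at-a-time";
        System.out.println(chooseAggregator(plain));                   // prints row-at-a-time
        System.out.println(chooseAggregator(new VectorizedFactory())); // prints vectorized
    }
}
```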
-
getComparator
public abstract Comparator getComparator()
-
combine
@Nullable public abstract Object combine(@Nullable Object lhs, @Nullable Object rhs)
A method that knows how to combine the outputs of Aggregator.get() produced via factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory). Note that even though this method is called "combine", its contract *does* allow for mutation of the input objects. Thus, any use of lhs or rhs after calling this method is highly discouraged.
- Parameters:
lhs
- The left hand side of the combine
rhs
- The right hand side of the combine
- Returns:
- an object representing the combination of lhs and rhs; this can be a new object or a mutation of the inputs
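To make the mutation caveat concrete, here is a minimal sketch of a combine over a set-valued intermediate that reuses lhs instead of allocating a new object. The class name and the choice of Set<String> as the intermediate type are assumptions for illustration only.

```java
import java.util.HashSet;
import java.util.Set;

final class CombineSketch {
    // Hypothetical combine for a set-valued intermediate; null means "no value yet".
    // It mutates and returns lhs when both sides are present, which the contract permits.
    static Set<String> combine(Set<String> lhs, Set<String> rhs) {
        if (lhs == null) {
            return rhs;
        }
        if (rhs == null) {
            return lhs;
        }
        lhs.addAll(rhs); // lhs is modified in place
        return lhs;
    }

    public static void main(String[] args) {
        Set<String> left = new HashSet<>(Set.of("a", "b"));
        Set<String> right = new HashSet<>(Set.of("b", "c"));
        Set<String> merged = combine(left, right);
        // From here on only 'merged' should be used; 'left' is now the same object.
        System.out.println(merged.size());   // prints 3
        System.out.println(merged == left);  // prints true
    }
}
```

Because the result may alias an input, code that still needs the original lhs or rhs would have to copy it before calling combine.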
-
makeAggregateCombiner
public AggregateCombiner makeAggregateCombiner()
Creates an AggregateCombiner to fold rollup aggregation results from several "rows" of different indexes during index merging. AggregateCombiner implements the same logic as combine(java.lang.Object, java.lang.Object), with the difference that it uses ColumnValueSelector and its subinterfaces to get inputs and implements ColumnValueSelector to provide output.
- See Also:
AggregateCombiner, IndexMerger
-
makeNullableAggregateCombiner
public AggregateCombiner makeNullableAggregateCombiner()
Creates an AggregateCombiner which supports nullability. Implementations of AggregatorFactory which need to support nullable aggregations are encouraged to extend NullableNumericAggregatorFactory instead of overriding this method. The default implementation calls makeAggregateCombiner() for backwards compatibility.
-
getCombiningFactory
public abstract AggregatorFactory getCombiningFactory()
Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory. It is used when we know we have some values that were produced with this aggregator factory, and want to do some additional combining of them. This happens, for example, when merging query results from two different segments, or two different servers.
For simple aggregators, the combining factory may be computed by simply creating a new factory that is the same as the current one, except with its input column renamed to the same as the output column. For example, this aggregator:
{"type": "longSum", "fieldName": "foo", "name": "bar"}
would become:
{"type": "longSum", "fieldName": "bar", "name": "bar"}
Sometimes, the type or other parameters of the combining aggregator will be different from the original aggregator. For example, the CountAggregatorFactory getCombiningFactory method will return a LongSumAggregatorFactory, because counts are combined by summing.
No matter what, `foo.getCombiningFactory()` and `foo.getCombiningFactory().getCombiningFactory()` should return the same result.
- Returns:
- a new Factory that can be used for operations on top of data output from the current factory.
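The renaming rule and the count-to-longSum special case can be modeled with a small record. This is a simplified sketch, not Druid code: the Factory record and its three fields are hypothetical, and the rule that every non-count type keeps its own type is an assumption made for the example.

```java
final class CombiningFactorySketch {
    // Hypothetical, simplified model of a factory: aggregation type, input column, output name.
    record Factory(String type, String fieldName, String name) {
        Factory getCombiningFactory() {
            // Counts are combined by summing; in this sketch every other type keeps its type.
            String combiningType = type.equals("count") ? "longSum" : type;
            // The combining factory reads the column that this factory wrote.
            return new Factory(combiningType, name, name);
        }
    }

    public static void main(String[] args) {
        Factory longSum = new Factory("longSum", "foo", "bar");
        Factory count = new Factory("count", null, "rows");

        System.out.println(longSum.getCombiningFactory().fieldName()); // prints bar
        System.out.println(count.getCombiningFactory().type());        // prints longSum

        // Applying getCombiningFactory() twice gives the same factory as applying it once.
        Factory once = count.getCombiningFactory();
        System.out.println(once.equals(once.getCombiningFactory()));   // prints true
    }
}
```

The last check corresponds to the requirement that `foo.getCombiningFactory()` and `foo.getCombiningFactory().getCombiningFactory()` return the same result.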
-
getMergingFactory
public AggregatorFactory getMergingFactory(AggregatorFactory other) throws AggregatorFactoryNotMergeableException
Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory and another factory. It is used when we have some values produced by this aggregator factory, and some values produced by the "other" aggregator factory, and we want to do some additional combining of them. This happens, for example, when compacting two segments together that both have a metric column with the same name. (Even though the name of the column is the same, the aggregator factory used to create it may be different from segment to segment.)
This method may throw AggregatorFactoryNotMergeableException, meaning that "this" and "other" are not compatible and values from one cannot sensibly be combined with values from the other.
- Returns:
- a new Factory that can be used for merging the output of aggregators from this factory and other.
- Throws:
AggregatorFactoryNotMergeableException
- See Also:
getCombiningFactory(), which is equivalent to foo.getMergingFactory(foo) (when "this" and "other" are the same instance).
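One plausible way to decide mergeability is to compare output names and combining factories, throwing when they differ. The sketch below assumes exactly that rule; the Factory record, NotMergeableException, and canMerge helper are hypothetical stand-ins, not the Druid classes.

```java
final class MergingFactorySketch {
    // Hypothetical stand-in for AggregatorFactoryNotMergeableException.
    static final class NotMergeableException extends Exception {
        NotMergeableException(String message) {
            super(message);
        }
    }

    record Factory(String type, String fieldName, String name) {
        Factory getCombiningFactory() {
            return new Factory(type, name, name);
        }

        // Assumed rule: mergeable only if both share an output name and a combining factory.
        Factory getMergingFactory(Factory other) throws NotMergeableException {
            Factory combining = getCombiningFactory();
            if (name.equals(other.name()) && combining.equals(other.getCombiningFactory())) {
                return combining;
            }
            throw new NotMergeableException("cannot merge " + this + " with " + other);
        }
    }

    // Convenience wrapper for callers that prefer a boolean over an exception.
    static boolean canMerge(Factory a, Factory b) {
        try {
            a.getMergingFactory(b);
            return true;
        } catch (NotMergeableException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Factory a = new Factory("longSum", "foo", "bar");
        Factory b = new Factory("longSum", "baz", "bar");   // different input, same output
        Factory c = new Factory("doubleSum", "foo", "bar"); // same name, different type
        System.out.println(canMerge(a, b)); // prints true
        System.out.println(canMerge(a, c)); // prints false
    }
}
```

Under this rule, merging a factory with itself always succeeds and yields its combining factory.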
-
getRequiredColumns
@Deprecated public List<AggregatorFactory> getRequiredColumns()
Deprecated. This was previously used by group-by v1 and will be removed in a future release.
-
deserialize
public abstract Object deserialize(Object object)
A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
- Parameters:
object
- the object to deserialize
- Returns:
- the deserialized object
-
finalizeComputation
@Nullable public abstract Object finalizeComputation(@Nullable Object object)
"Finalizes" the computation of an object. Primarily useful for complex types that have a different mergeable intermediate format than their final resultant output.
- Parameters:
object
- the object to be finalized
- Returns:
- the finalized value that should be returned for the initial query
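A mean is a convenient example of an aggregation whose mergeable intermediate differs from its final output. The sketch below uses a hypothetical SumAndCount intermediate; it is not how any particular Druid aggregator is implemented.

```java
final class FinalizeSketch {
    // Hypothetical mergeable intermediate for a mean: running sum and row count.
    record SumAndCount(double sum, long count) {}

    // Intermediates from different segments combine by adding both components.
    static SumAndCount combine(SumAndCount lhs, SumAndCount rhs) {
        return new SumAndCount(lhs.sum() + rhs.sum(), lhs.count() + rhs.count());
    }

    // Finalization turns the intermediate into the value the query returns.
    // A null or empty intermediate yields null in this sketch.
    static Double finalizeComputation(SumAndCount intermediate) {
        if (intermediate == null || intermediate.count() == 0) {
            return null;
        }
        return intermediate.sum() / intermediate.count();
    }

    public static void main(String[] args) {
        SumAndCount segmentA = new SumAndCount(10.0, 4); // mean 2.5
        SumAndCount segmentB = new SumAndCount(2.0, 2);  // mean 1.0
        System.out.println(finalizeComputation(combine(segmentA, segmentB))); // prints 2.0
    }
}
```

Finalizing each segment first and averaging the two means would give 1.75 instead of 2.0, which is why combining has to happen on the intermediate form and finalization only once at the end.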
-
getName
public abstract String getName()
- Returns:
- output name of the aggregator column.
-
requiredFields
public abstract List<String> requiredFields()
Get a list of fields that aggregators built by this factory will need to read.
-
getIntermediateType
public ColumnType getIntermediateType()
Get the "intermediate" ColumnType for this aggregator. This is the same as the type returned by deserialize(java.lang.Object) and the type accepted by combine(java.lang.Object, java.lang.Object). However, it is *not* necessarily the same type returned by finalizeComputation(java.lang.Object). Refer to the ColumnType javadocs for details on the implications of choosing a type.
-
getResultType
public ColumnType getResultType()
Get the ColumnType for the final form of this aggregator, i.e. the type of the value returned by finalizeComputation(java.lang.Object). This may be the same as or different from the types expected in deserialize(java.lang.Object) and combine(java.lang.Object, java.lang.Object). Refer to the ColumnType javadocs for details on the implications of choosing a type.
-
getType
@Deprecated public ValueType getType()
Deprecated. This method is deprecated and will be removed soon. Use getIntermediateType() instead. Do not call this method; it will likely produce incorrect results and exists only for backwards compatibility.
-
getFinalizedType
@Deprecated public ValueType getFinalizedType()
Deprecated. This method is deprecated and will be removed soon. Use getResultType() instead. Do not call this method; it will likely produce incorrect results and exists only for backwards compatibility.
-
getComplexTypeName
@Nullable @Deprecated public String getComplexTypeName()
Deprecated. This method is deprecated and will be removed soon. Use getIntermediateType() instead. Do not call this method; it will likely produce incorrect results and exists only for backwards compatibility.
-
getMaxIntermediateSize
public abstract int getMaxIntermediateSize()
Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
- Returns:
- the maximum number of bytes that an aggregator of this type will require for intermediate result storage.
-
getMaxIntermediateSizeWithNulls
public int getMaxIntermediateSizeWithNulls()
Returns the maximum size that this aggregator will require in bytes for intermediate storage of results. Implementations of AggregatorFactory which need to support nullable aggregations are encouraged to extend NullableNumericAggregatorFactory instead of overriding this method. The default implementation calls getMaxIntermediateSize() for backwards compatibility.
- Returns:
- the maximum number of bytes that an aggregator of this type will require for intermediate result storage.
-
guessAggregatorHeapFootprint
public int guessAggregatorHeapFootprint(long rows)
Returns a best guess as to how much memory the on-heap Aggregator returned by factorize(org.apache.druid.segment.ColumnSelectorFactory) will require when a certain number of rows have been aggregated into it. The main user of this method is OnheapIncrementalIndex, which uses it to determine when to persist the current in-memory data to disk.
Important note for callers! In nearly all cases, callers that wish to constrain memory would be better off using factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory) or factorizeVector(org.apache.druid.segment.vector.VectorColumnSelectorFactory), which offer precise control over how much memory is being used.
-
optimizeForSegment
public AggregatorFactory optimizeForSegment(PerSegmentQueryOptimizationContext optimizationContext)
Return a potentially optimized form of this AggregatorFactory for per-segment queries.
-
withName
public AggregatorFactory withName(String newName)
Used in cases where we want to change the output name of the aggregator to something else. For example, if we have a query `select a, sum(b) as total from table group by a`, the aggregator returned from the native group by query is "a0", set in org.apache.druid.sql.calcite.rel.DruidQuery#computeAggregations. We can use withName("total") to set the output name of the aggregator to "total".
Because an implementation of this method may not exist for every AggregatorFactory, callers are advised to handle that case.
- Parameters:
newName
- new name of the output for the aggregator factory
- Returns:
- AggregatorFactory with the output name set to the input param.
-
mergeAggregators
@Nullable public static AggregatorFactory[] mergeAggregators(List<AggregatorFactory[]> aggregatorsList)
Merges the list of AggregatorFactory[] (presumably from the metadata of some segments being merged) and returns the merged AggregatorFactory[] (for the metadata of the merged segment). Null is returned if it is not possible to do the merging for any of the following reasons:
- one of the elements in the input list is null, i.e. the aggregators for one of the segments being merged are unknown
- AggregatorFactory instances of the same name cannot be merged because they are not compatible
- Parameters:
aggregatorsList
- Returns:
- merged AggregatorFactory[], or null if merging is not possible.
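The two null-returning conditions can be illustrated with a simplified merge keyed on the output name. This is a sketch under stated assumptions: the Factory record is hypothetical, "compatible" is reduced to plain equality, and returning null for an empty input list is a choice made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MergeAggregatorsSketch {
    // Hypothetical, simplified factory: aggregation type and output name.
    record Factory(String type, String name) {}

    // Returns the merged array, or null if any per-segment array is null
    // or two same-named factories are not compatible (here: not equal).
    static Factory[] mergeAggregators(List<Factory[]> aggregatorsList) {
        if (aggregatorsList == null || aggregatorsList.isEmpty()) {
            return null; // assumption for this sketch
        }
        Map<String, Factory> merged = new LinkedHashMap<>();
        for (Factory[] perSegment : aggregatorsList) {
            if (perSegment == null) {
                return null; // aggregators unknown for this segment
            }
            for (Factory factory : perSegment) {
                Factory existing = merged.putIfAbsent(factory.name(), factory);
                if (existing != null && !existing.equals(factory)) {
                    return null; // same name, incompatible definitions
                }
            }
        }
        return merged.values().toArray(new Factory[0]);
    }

    public static void main(String[] args) {
        Factory[] seg1 = {new Factory("longSum", "bar"), new Factory("count", "rows")};
        Factory[] seg2 = {new Factory("longSum", "bar")};
        Factory[] seg3 = {new Factory("doubleSum", "bar")};
        System.out.println(mergeAggregators(List.of(seg1, seg2)).length);  // prints 2
        System.out.println(mergeAggregators(List.of(seg1, seg3)) == null); // prints true
    }
}
```

Callers of such a method need an explicit null check, since a null result signals "could not merge" rather than "no aggregators".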
-
-