Class GroupingAggregatorFactory
- java.lang.Object
  - org.apache.druid.query.aggregation.AggregatorFactory
    - org.apache.druid.query.aggregation.GroupingAggregatorFactory
- All Implemented Interfaces:
Cacheable
@EverythingIsNonnullByDefault public class GroupingAggregatorFactory extends AggregatorFactory
This class implements the grouping function to determine the grouping that a row is part of. Different result rows for a query can have different grouping columns when subtotals are used.

This aggregator factory takes the following arguments:
- name - Name of the aggregator.
- groupings - List of dimensions that the user is interested in tracking.
- keyDimensions - The list of grouping dimensions included in the result row. This list is a subset of groupings. This argument cannot be passed by the user; it is set by the Druid engine when a particular subtotal spec is being processed. Whenever the Druid engine processes a new subtotal spec, the engine sets that subtotal spec as the new keyDimensions.

When the key dimensions are updated, value is updated as well. How the value is determined is captured at groupingId(List, Set).

Since grouping has to be calculated only once, it could have been implemented as a virtual function, a post-aggregator, etc. We modelled it as an aggregation operator so that its output can be used in a post-aggregator. Calcite also models the grouping function as an aggregation operator.

Since it is a non-trivial special aggregation, implementing it required changes in the core Druid engine. There were a few approaches; we chose the one that required the fewest changes in core Druid. Refer to https://github.com/apache/druid/pull/10518#discussion_r532941216 for more details.

Currently, it works in the following way:
- On data servers (no change):
  - This factory generates LongConstantAggregator / LongConstantBufferAggregator / LongConstantVectorAggregator with keyDimensions as null.
  - The aggregators don't actually aggregate anything, and their result is not actually used. We could have removed these aggregators on data servers, but that would result in a signature mismatch between broker and data nodes, which requires extra handling and is error-prone.
- On brokers:
  - Results from data nodes are already re-processed for each subtotal spec. We made modifications in this path to update the grouping id for each row.
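The relationship between groupings, keyDimensions and the resulting value can be illustrated with a small standalone sketch. It is not Druid's groupingId(List, Set) implementation; it assumes the usual SQL GROUPING convention, where each dimension in groupings contributes one bit (first dimension in the most significant position) and the bit is 1 when the dimension is absent from keyDimensions. Treating a null keyDimensions as "all dimensions included" is also an assumption, and the class name GroupingIdSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, standalone illustration; not part of Druid.
final class GroupingIdSketch {
    private GroupingIdSketch() {}

    /**
     * Builds a bitmask with one bit per entry of groupings.
     * Assumption: a bit is 1 when the dimension is NOT in keyDimensions,
     * and a null keyDimensions means every dimension is included.
     */
    static long groupingId(List<String> groupings, Set<String> keyDimensions) {
        if (groupings == null || groupings.isEmpty()) {
            throw new IllegalArgumentException("groupings must be non-empty");
        }
        if (groupings.size() >= Long.SIZE) {
            // Keeps the result non-negative and within a single long.
            throw new IllegalArgumentException("too many grouping dimensions");
        }
        long id = 0L;
        for (String dimension : groupings) {
            id <<= 1;
            boolean included = keyDimensions == null || keyDimensions.contains(dimension);
            if (!included) {
                id |= 1L;
            }
        }
        return id;
    }

    public static void main(String[] args) {
        List<String> groupings = Arrays.asList("country", "city");
        // Both dimensions are part of the subtotal spec.
        System.out.println(groupingId(groupings, new HashSet<>(groupings)));
        // Only "country" is part of the subtotal spec.
        System.out.println(groupingId(groupings, Collections.singleton("country")));
        // Grand total: no dimension is part of the subtotal spec.
        System.out.println(groupingId(groupings, Collections.<String>emptySet()));
    }
}
```

Under these assumptions, a row grouped by both dimensions gets 0, a row grouped only by country gets 1, and the grand-total row gets 3, so a post-aggregator or client can tell subtotal rows apart by this value.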
-
-
Constructor Summary
Constructors
- GroupingAggregatorFactory(String name, List<String> groupings)
-
Method Summary
Modifier and Type, Method, Description
- boolean canVectorize(ColumnInspector columnInspector) - Returns whether or not this aggregation class supports vectorization.
- Object combine(Object lhs, Object rhs) - A method that knows how to combine the outputs of Aggregator.get() produced via AggregatorFactory.factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via AggregatorFactory.factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory).
- Object deserialize(Object object) - A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
- boolean equals(Object o)
- Aggregator factorize(ColumnSelectorFactory metricFactory)
- BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
- VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory) - Create a VectorAggregator based on the provided column selector factory.
- Object finalizeComputation(Object object) - "Finalizes" the computation of an object.
- byte[] getCacheKey() - Get a byte array used as a cache key.
- AggregatorFactory getCombiningFactory() - Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory.
- Comparator getComparator()
- List<String> getGroupings()
- ColumnType getIntermediateType() - Get the "intermediate" ColumnType for this aggregator.
- int getMaxIntermediateSize() - Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
- String getName()
- ColumnType getResultType() - Get the ColumnType for the final form of this aggregator.
- long getValue()
- int hashCode()
- List<String> requiredFields() - Get a list of fields that aggregators built by this factory will need to read.
- String toString()
- GroupingAggregatorFactory withKeyDimensions(Set<String> newKeyDimensions) - Replace the param keyDimensions with the new set of key dimensions.
- AggregatorFactory withName(String newName) - Used in cases where we want to change the output name of the aggregator to something else.
-
Methods inherited from class org.apache.druid.query.aggregation.AggregatorFactory
factorizeWithSize, getComplexTypeName, getFinalizedType, getMaxIntermediateSizeWithNulls, getMergingFactory, getRequiredColumns, getType, guessAggregatorHeapFootprint, makeAggregateCombiner, makeNullableAggregateCombiner, mergeAggregators, optimizeForSegment, substituteCombiningFactory
-
-
-
-
Method Detail
-
factorize
public Aggregator factorize(ColumnSelectorFactory metricFactory)
- Specified by:
factorize in class AggregatorFactory
-
factorizeBuffered
public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
- Specified by:
factorizeBuffered in class AggregatorFactory
-
factorizeVector
public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
Description copied from class: AggregatorFactory
Create a VectorAggregator based on the provided column selector factory. Will throw an exception if this aggregation class does not support vectorization: check "canVectorize" first.
- Overrides:
factorizeVector in class AggregatorFactory
-
canVectorize
public boolean canVectorize(ColumnInspector columnInspector)
Description copied from class: AggregatorFactory
Returns whether or not this aggregation class supports vectorization. The default implementation returns false.
- Overrides:
canVectorize in class AggregatorFactory
-
withKeyDimensions
public GroupingAggregatorFactory withKeyDimensions(Set<String> newKeyDimensions)
Replaces the keyDimensions parameter with the new set of key dimensions.
-
getComparator
public Comparator getComparator()
- Specified by:
getComparator in class AggregatorFactory
-
getName
public String getName()
- Specified by:
getName in class AggregatorFactory
- Returns:
output name of the aggregator column.
-
getValue
public long getValue()
-
combine
@Nullable public Object combine(@Nullable Object lhs, @Nullable Object rhs)
Description copied from class: AggregatorFactory
A method that knows how to combine the outputs of Aggregator.get() produced via AggregatorFactory.factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via AggregatorFactory.factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory). Note, even though this method is called "combine", this method's contract *does* allow for mutation of the input objects. Thus, any use of lhs or rhs after calling this method is highly discouraged.
- Specified by:
combine in class AggregatorFactory
- Parameters:
lhs - The left hand side of the combine
rhs - The right hand side of the combine
- Returns:
an object representing the combination of lhs and rhs; this can be a new object or a mutation of the inputs
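To make the mutation caveat in the combine contract concrete, here is a minimal sketch with a hypothetical set-valued intermediate type. It does not show what GroupingAggregatorFactory.combine does; the class name MutatingCombineSketch and the use of Set<String> are assumptions chosen only to demonstrate why callers should not reuse lhs or rhs after the call.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of a combine that mutates its input; not Druid code.
final class MutatingCombineSketch {
    private MutatingCombineSketch() {}

    /** Merges rhs into lhs in place and returns lhs, as the contract permits. */
    static Set<String> combine(Set<String> lhs, Set<String> rhs) {
        if (lhs == null) {
            return rhs;
        }
        if (rhs == null) {
            return lhs;
        }
        lhs.addAll(rhs); // lhs no longer holds its pre-combine contents
        return lhs;
    }

    public static void main(String[] args) {
        Set<String> left = new HashSet<>(Collections.singleton("a"));
        Set<String> right = new HashSet<>(Collections.singleton("b"));
        Set<String> combined = combine(left, right);
        // The returned object is the same instance as the left input.
        System.out.println(combined == left);
        // Reading 'left' now observes the merged state, not the original one.
        System.out.println(left.size());
    }
}
```

A caller that needs the pre-combine value of an input would have to copy it before calling combine.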
-
getCombiningFactory
public AggregatorFactory getCombiningFactory()
Description copied from class: AggregatorFactory
Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory. It is used when we know we have some values that were produced with this aggregator factory, and want to do some additional combining of them. This happens, for example, when merging query results from two different segments, or two different servers.

For simple aggregators, the combining factory may be computed by simply creating a new factory that is the same as the current, except with its input column renamed to the same as the output column. For example, this aggregator:

    {"type": "longSum", "fieldName": "foo", "name": "bar"}

Would become:

    {"type": "longSum", "fieldName": "bar", "name": "bar"}

Sometimes, the type or other parameters of the combining aggregator will be different from the original aggregator. For example, the CountAggregatorFactory getCombiningFactory method will return a LongSumAggregatorFactory, because counts are combined by summing.

No matter what, `foo.getCombiningFactory()` and `foo.getCombiningFactory().getCombiningFactory()` should return the same result.
- Specified by:
getCombiningFactory in class AggregatorFactory
- Returns:
a new Factory that can be used for operations on top of data output from the current factory.
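The renaming rule and the idempotence requirement described for getCombiningFactory can be checked on a toy model. The sketch below uses a hypothetical Spec class that mirrors the three JSON fields from the example above; it is not Druid's AggregatorFactory, and the "count becomes longSum" branch simply restates the CountAggregatorFactory remark in simplified form.

```java
import java.util.Objects;

// Hypothetical toy model of an aggregator spec; not Druid code.
final class CombiningFactorySketch {
    private CombiningFactorySketch() {}

    static final class Spec {
        final String type;
        final String fieldName; // may be null, e.g. for a count
        final String name;

        Spec(String type, String fieldName, String name) {
            this.type = type;
            this.fieldName = fieldName;
            this.name = name;
        }

        /** Reads from the output column; counts are combined by summing. */
        Spec combiningFactory() {
            String combinedType = "count".equals(type) ? "longSum" : type;
            return new Spec(combinedType, name, name);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Spec)) {
                return false;
            }
            Spec other = (Spec) o;
            return type.equals(other.type)
                && Objects.equals(fieldName, other.fieldName)
                && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, fieldName, name);
        }
    }

    public static void main(String[] args) {
        Spec original = new Spec("longSum", "foo", "bar");
        Spec once = original.combiningFactory();
        Spec twice = once.combiningFactory();
        // The input column is renamed to the output column.
        System.out.println(once.equals(new Spec("longSum", "bar", "bar")));
        // Applying the transformation again changes nothing.
        System.out.println(once.equals(twice));
    }
}
```

In this model the transformation is a fixed point after one application, which is the property the contract asks real implementations to preserve.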
-
deserialize
public Object deserialize(Object object)
Description copied from class: AggregatorFactory
A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
- Specified by:
deserialize in class AggregatorFactory
- Parameters:
object - the object to deserialize
- Returns:
the deserialized object
-
finalizeComputation
@Nullable public Object finalizeComputation(@Nullable Object object)
Description copied from class: AggregatorFactory
"Finalizes" the computation of an object. Primarily useful for complex types that have a different mergeable intermediate format than their final resultant output.
- Specified by:
finalizeComputation in class AggregatorFactory
- Parameters:
object - the object to be finalized
- Returns:
the finalized value that should be returned for the initial query
-
requiredFields
public List<String> requiredFields()
Description copied from class: AggregatorFactory
Get a list of fields that aggregators built by this factory will need to read.
- Specified by:
requiredFields in class AggregatorFactory
-
getIntermediateType
public ColumnType getIntermediateType()
Description copied from class: AggregatorFactory
Get the "intermediate" ColumnType for this aggregator. This is the same as the type returned by AggregatorFactory.deserialize(java.lang.Object) and the type accepted by AggregatorFactory.combine(java.lang.Object, java.lang.Object). However, it is *not* necessarily the same type returned by AggregatorFactory.finalizeComputation(java.lang.Object). Refer to the ColumnType javadocs for details on the implications of choosing a type.
- Overrides:
getIntermediateType in class AggregatorFactory
-
getResultType
public ColumnType getResultType()
Description copied from class: AggregatorFactory
Get the ColumnType for the final form of this aggregator, i.e. the type of the value returned by AggregatorFactory.finalizeComputation(java.lang.Object). This may be the same as or different than the types expected in AggregatorFactory.deserialize(java.lang.Object) and AggregatorFactory.combine(java.lang.Object, java.lang.Object). Refer to the ColumnType javadocs for details on the implications of choosing a type.
- Overrides:
getResultType in class AggregatorFactory
-
getMaxIntermediateSize
public int getMaxIntermediateSize()
Description copied from class: AggregatorFactory
Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
- Specified by:
getMaxIntermediateSize in class AggregatorFactory
- Returns:
the maximum number of bytes that an aggregator of this type will require for intermediate result storage.
-
withName
public AggregatorFactory withName(String newName)
Description copied from class: AggregatorFactory
Used in cases where we want to change the output name of the aggregator to something else. For example, if we have a query `select a, sum(b) as total from table group by a`, the aggregator returned from the native group by query is "a0", set in org.apache.druid.sql.calcite.rel.DruidQuery#computeAggregations. We can use withName("total") to set the output name of the aggregator to "total".

As all implementations of this interface method may not exist, callers of this method are advised to handle such a case.
- Overrides:
withName in class AggregatorFactory
- Parameters:
newName - new name of the output for the aggregator factory
- Returns:
AggregatorFactory with the output name set as the input param.
-
getCacheKey
public byte[] getCacheKey()
Description copied from interface: Cacheable
Get a byte array used as a cache key.
- Returns:
a cache key
-
-