Class GroupingAggregatorFactory
- java.lang.Object
-
- org.apache.druid.query.aggregation.AggregatorFactory
-
- org.apache.druid.query.aggregation.GroupingAggregatorFactory
-
- All Implemented Interfaces:
Cacheable
@EverythingIsNonnullByDefault public class GroupingAggregatorFactory extends AggregatorFactory
This class implements the grouping function, which determines the grouping that a row is part of. Different result rows for a query could have different grouping columns when subtotals are used.

This aggregator factory takes the following arguments:
- name - Name of the aggregator.
- groupings - List of dimensions that the user is interested in tracking.
- keyDimensions - The list of grouping dimensions being included in the result row. This list is a subset of groupings. This argument cannot be passed by the user. It is set by the Druid engine when a particular subtotal spec is being processed. Whenever the Druid engine processes a new subtotal spec, it sets that subtotal spec as the new keyDimensions.

When key dimensions are updated, value is updated as well. How the value is determined is captured at groupingId(List, Set).

Since the grouping has to be calculated only once, it could have been implemented as a virtual function, a post-aggregator, etc. We modelled it as an aggregation operator so that its output can be used in a post-aggregator. Calcite too models the grouping function as an aggregation operator.

Since it is a non-trivial special aggregation, implementing it required changes in the core Druid engine. There were a few approaches. We chose the approach that required the fewest changes in core Druid. Refer to https://github.com/apache/druid/pull/10518#discussion_r532941216 for more details.

Currently, it works in the following way:
- On data servers (no change):
  - This factory generates LongConstantAggregator / LongConstantBufferAggregator / LongConstantVectorAggregator with keyDimensions as null.
  - The aggregators don't actually aggregate anything and their result is not actually used. We could have removed these aggregators on data servers, but that would result in a signature mismatch between broker and data nodes. That requires extra handling and is error-prone.
- On brokers:
  - Results from data nodes are already being re-processed for each subtotal spec. We made modifications in this path to update the grouping id for each row.
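The description above says the value depends on groupings and keyDimensions but leaves the exact rule to groupingId(List, Set). The following is a minimal sketch of one plausible rule, assuming the bitmask convention of the SQL GROUPING function: one bit per entry of groupings, first entry most significant, and a bit set to 1 when that dimension is not part of the current keyDimensions. The class name GroupingIdSketch, the treatment of a null keyDimensions as "all dimensions included", and the dimension names are assumptions made for illustration, not taken from this page.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GroupingIdSketch {
    /**
     * Hypothetical stand-in for groupingId(List, Set).
     * Assumption: bit = 1 means the dimension is NOT in keyDimensions;
     * a null keyDimensions is treated as "every dimension is included".
     */
    static long groupingId(List<String> groupings, Set<String> keyDimensions) {
        if (groupings == null || groupings.isEmpty()) {
            throw new IllegalArgumentException("groupings must be non-empty");
        }
        if (groupings.size() >= Long.SIZE) {
            // Keeps the result non-negative by never touching the sign bit.
            throw new IllegalArgumentException("too many grouping dimensions");
        }
        long id = 0L;
        for (String dimension : groupings) {
            id <<= 1;
            boolean included = keyDimensions == null || keyDimensions.contains(dimension);
            if (!included) {
                id |= 1L;
            }
        }
        return id;
    }

    public static void main(String[] args) {
        List<String> groupings = Arrays.asList("country", "city");
        // Each subtotal spec plays the role of keyDimensions for the rows it produces.
        List<Set<String>> subtotalSpecs = Arrays.asList(
            new HashSet<>(Arrays.asList("country", "city")),
            new HashSet<>(Collections.singletonList("country")),
            new HashSet<>(Collections.singletonList("city")),
            Collections.<String>emptySet()
        );
        for (Set<String> keyDimensions : subtotalSpecs) {
            System.out.println(keyDimensions + " -> " + groupingId(groupings, keyDimensions));
        }
    }
}
```

Under this assumed convention, a row grouped by every tracked dimension gets 0, and a grand-total row (no key dimensions) gets a value with all bits set, so rows from different subtotal specs can be told apart by this one column.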
-
-
Constructor Summary
Constructor: Description
- GroupingAggregatorFactory(String name, List<String> groupings)
-
Method Summary
Modifier and Type, Method: Description
- boolean canVectorize(ColumnInspector columnInspector): Returns whether or not this aggregation class supports vectorization.
- Object combine(Object lhs, Object rhs): A method that knows how to combine the outputs of Aggregator.get() produced via AggregatorFactory.factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via AggregatorFactory.factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory).
- Object deserialize(Object object): A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
- boolean equals(Object o)
- Aggregator factorize(ColumnSelectorFactory metricFactory)
- BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
- VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory): Create a VectorAggregator based on the provided column selector factory.
- Object finalizeComputation(Object object): "Finalizes" the computation of an object.
- byte[] getCacheKey(): Get a byte array used as a cache key.
- AggregatorFactory getCombiningFactory(): Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory.
- Comparator getComparator()
- List<String> getGroupings()
- ColumnType getIntermediateType(): Get the "intermediate" ColumnType for this aggregator.
- int getMaxIntermediateSize(): Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
- String getName()
- ColumnType getResultType(): Get the ColumnType for the final form of this aggregator, i.e. the type of the value returned by AggregatorFactory.finalizeComputation(java.lang.Object).
- long getValue()
- int hashCode()
- List<String> requiredFields(): Get a list of fields that aggregators built by this factory will need to read.
- String toString()
- GroupingAggregatorFactory withKeyDimensions(Set<String> newKeyDimensions): Replace the param keyDimensions with the new set of key dimensions.
- AggregatorFactory withName(String newName): Used in cases where we want to change the output name of the aggregator to something else.
Methods inherited from class org.apache.druid.query.aggregation.AggregatorFactory
factorizeWithSize, getComplexTypeName, getFinalizedType, getMaxIntermediateSizeWithNulls, getMergingFactory, getRequiredColumns, getType, guessAggregatorHeapFootprint, makeAggregateCombiner, makeNullableAggregateCombiner, mergeAggregators, optimizeForSegment
-
-
-
-
Method Detail
-
factorize
public Aggregator factorize(ColumnSelectorFactory metricFactory)
- Specified by: factorize in class AggregatorFactory
-
factorizeBuffered
public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
- Specified by: factorizeBuffered in class AggregatorFactory
-
factorizeVector
public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
Description copied from class: AggregatorFactory
Create a VectorAggregator based on the provided column selector factory. Will throw an exception if this aggregation class does not support vectorization: check "canVectorize" first.
- Overrides: factorizeVector in class AggregatorFactory
-
canVectorize
public boolean canVectorize(ColumnInspector columnInspector)
Description copied from class: AggregatorFactory
Returns whether or not this aggregation class supports vectorization. The default implementation returns false.
- Overrides: canVectorize in class AggregatorFactory
-
withKeyDimensions
public GroupingAggregatorFactory withKeyDimensions(Set<String> newKeyDimensions)
Replace the param keyDimensions with the new set of key dimensions.
-
getComparator
public Comparator getComparator()
- Specified by: getComparator in class AggregatorFactory
-
getName
public String getName()
- Specified by: getName in class AggregatorFactory
- Returns: output name of the aggregator column.
-
getValue
public long getValue()
-
combine
@Nullable public Object combine(@Nullable Object lhs, @Nullable Object rhs)
Description copied from class: AggregatorFactory
A method that knows how to combine the outputs of Aggregator.get() produced via AggregatorFactory.factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via AggregatorFactory.factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory). Note, even though this method is called "combine", this method's contract *does* allow for mutation of the input objects. Thus, any use of lhs or rhs after calling this method is highly discouraged.
- Specified by: combine in class AggregatorFactory
- Parameters:
  lhs - The left hand side of the combine
  rhs - The right hand side of the combine
- Returns: an object representing the combination of lhs and rhs; this can be a new object or a mutation of the inputs
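The mutation clause in the combine contract is easy to overlook, so here is a minimal sketch of a caller that respects it. The Sum holder, the combine body, and the fold helper are toy stand-ins invented for illustration; they do not reproduce how GroupingAggregatorFactory or any other Druid factory implements combine.

```java
import java.util.Arrays;
import java.util.List;

public class CombineContractSketch {
    /** Toy mutable intermediate value; not a Druid class. */
    static final class Sum {
        long total;

        Sum(long total) {
            this.total = total;
        }
    }

    /** Toy combine that, as the contract permits, mutates and returns lhs. */
    static Sum combine(Sum lhs, Sum rhs) {
        if (lhs == null) {
            return rhs;
        }
        if (rhs == null) {
            return lhs;
        }
        lhs.total += rhs.total;
        return lhs;
    }

    /** Folds partial results, only ever reading the value returned by combine. */
    static long fold(List<Sum> partials) {
        Sum accumulated = null;
        for (Sum partial : partials) {
            // After this call, neither the old accumulator nor partial is read again.
            accumulated = combine(accumulated, partial);
        }
        return accumulated == null ? 0L : accumulated.total;
    }

    public static void main(String[] args) {
        System.out.println(fold(Arrays.asList(new Sum(1), new Sum(2), new Sum(3))));
    }
}
```

The point of the pattern is that the caller rebinds its accumulator to the return value on every step, so it stays correct whether combine returns a fresh object or a mutated input.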
-
getCombiningFactory
public AggregatorFactory getCombiningFactory()
Description copied from class: AggregatorFactory
Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory. It is used when we know we have some values that were produced with this aggregator factory, and want to do some additional combining of them. This happens, for example, when merging query results from two different segments, or two different servers.

For simple aggregators, the combining factory may be computed by simply creating a new factory that is the same as the current, except with its input column renamed to the same as the output column. For example, this aggregator:
  {"type": "longSum", "fieldName": "foo", "name": "bar"}
Would become:
  {"type": "longSum", "fieldName": "bar", "name": "bar"}

Sometimes, the type or other parameters of the combining aggregator will be different from the original aggregator. For example, the CountAggregatorFactory getCombiningFactory method will return a LongSumAggregatorFactory, because counts are combined by summing.

No matter what, `foo.getCombiningFactory()` and `foo.getCombiningFactory().getCombiningFactory()` should return the same result.
- Specified by: getCombiningFactory in class AggregatorFactory
- Returns: a new Factory that can be used for operations on top of data output from the current factory.
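The renaming rule and the "same result when applied twice" requirement can be sketched with a toy model. Spec below is a hypothetical value class mirroring the three JSON fields in the example (type, fieldName, name); it is not a Druid class, and its combining() method only imitates the two cases the description mentions (a simple aggregator such as longSum, and count turning into longSum).

```java
import java.util.Objects;

public class CombiningFactorySketch {
    /** Toy stand-in for an aggregator spec; not a Druid class. */
    static final class Spec {
        final String type;
        final String fieldName; // null for aggregators without an input column, e.g. count
        final String name;

        Spec(String type, String fieldName, String name) {
            this.type = type;
            this.fieldName = fieldName;
            this.name = name;
        }

        /** Reads the previous output column; counts are combined by summing. */
        Spec combining() {
            String combinedType = "count".equals(type) ? "longSum" : type;
            return new Spec(combinedType, name, name);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Spec)) {
                return false;
            }
            Spec other = (Spec) o;
            return type.equals(other.type)
                && Objects.equals(fieldName, other.fieldName)
                && name.equals(other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, fieldName, name);
        }

        @Override
        public String toString() {
            return "{type=" + type + ", fieldName=" + fieldName + ", name=" + name + "}";
        }
    }

    public static void main(String[] args) {
        Spec longSum = new Spec("longSum", "foo", "bar");
        Spec count = new Spec("count", null, "rows");
        System.out.println(longSum.combining());
        System.out.println(count.combining());
        // Applying combining() a second time should change nothing.
        System.out.println(longSum.combining().equals(longSum.combining().combining()));
        System.out.println(count.combining().equals(count.combining().combining()));
    }
}
```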
-
deserialize
public Object deserialize(Object object)
Description copied from class: AggregatorFactory
A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
- Specified by: deserialize in class AggregatorFactory
- Parameters:
  object - the object to deserialize
- Returns: the deserialized object
-
finalizeComputation
@Nullable public Object finalizeComputation(@Nullable Object object)
Description copied from class: AggregatorFactory
"Finalizes" the computation of an object. Primarily useful for complex types that have a different mergeable intermediate format than their final resultant output.
- Specified by: finalizeComputation in class AggregatorFactory
- Parameters:
  object - the object to be finalized
- Returns: the finalized value that should be returned for the initial query
-
requiredFields
public List<String> requiredFields()
Description copied from class: AggregatorFactory
Get a list of fields that aggregators built by this factory will need to read.
- Specified by: requiredFields in class AggregatorFactory
-
getIntermediateType
public ColumnType getIntermediateType()
Description copied from class: AggregatorFactory
Get the "intermediate" ColumnType for this aggregator. This is the same as the type returned by AggregatorFactory.deserialize(java.lang.Object) and the type accepted by AggregatorFactory.combine(java.lang.Object, java.lang.Object). However, it is *not* necessarily the same type returned by AggregatorFactory.finalizeComputation(java.lang.Object). Refer to the ColumnType javadocs for details on the implications of choosing a type.
- Overrides: getIntermediateType in class AggregatorFactory
-
getResultType
public ColumnType getResultType()
Description copied from class: AggregatorFactory
Get the ColumnType for the final form of this aggregator, i.e. the type of the value returned by AggregatorFactory.finalizeComputation(java.lang.Object). This may be the same as or different than the types expected in AggregatorFactory.deserialize(java.lang.Object) and AggregatorFactory.combine(java.lang.Object, java.lang.Object). Refer to the ColumnType javadocs for details on the implications of choosing a type.
- Overrides: getResultType in class AggregatorFactory
-
getMaxIntermediateSize
public int getMaxIntermediateSize()
Description copied from class: AggregatorFactory
Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
- Specified by: getMaxIntermediateSize in class AggregatorFactory
- Returns: the maximum number of bytes that an aggregator of this type will require for intermediate result storage.
-
withName
public AggregatorFactory withName(String newName)
Description copied from class: AggregatorFactory
Used in cases where we want to change the output name of the aggregator to something else. For example, if we have a query `select a, sum(b) as total from table group by a`, the aggregator returned from the native group by query is "a0", set in org.apache.druid.sql.calcite.rel.DruidQuery#computeAggregations. We can use withName("total") to set the output name of the aggregator to "total". Because not every implementation of this interface method may exist, callers of this method are advised to handle such a case.
- Overrides: withName in class AggregatorFactory
- Parameters:
  newName - new name of the output for the aggregator factory
- Returns: AggregatorFactory with the output name set as the input param.
-
getCacheKey
public byte[] getCacheKey()
Description copied from interface: Cacheable
Get a byte array used as a cache key.
- Returns: a cache key
-
-