@EverythingIsNonnullByDefault public class GroupingAggregatorFactory extends AggregatorFactory
Implements the grouping function, which determines the grouping that a row is part of. Different result rows
for a query can have different grouping columns when subtotals are used.
This aggregator factory takes the following arguments:
- name - Name of the aggregator
- groupings - List of dimensions that the user is interested in tracking
- keyDimensions - The list of grouping dimensions being included in the result row. This list is a subset of groupings. This argument cannot be passed by the user. It is set by the Druid engine when a particular subtotal spec is being processed. Whenever the Druid engine processes a new subtotal spec, it sets that subtotal spec as the new keyDimensions.
When the key dimensions are updated, value is updated as well. How the value is determined is captured at groupingId(List, Set).
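To make the relationship between groupings, keyDimensions and value concrete, here is a minimal sketch. It assumes the usual SQL GROUPING convention: one bit per dimension in groupings, the first dimension in the most significant position, and a bit set to 1 when that dimension is not part of keyDimensions. It also assumes that a null keyDimensions means "all dimensions included". GroupingIdSketch and its groupingId method are illustrative stand-ins, not Druid's actual code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for illustration; not part of Druid.
class GroupingIdSketch {
    // Assumed convention: one bit per tracked dimension, first dimension is the
    // most significant bit, and a bit is 1 when the dimension is NOT in keyDimensions.
    static long groupingId(List<String> groupings, Set<String> keyDimensions) {
        if (groupings.isEmpty() || groupings.size() >= Long.SIZE) {
            throw new IllegalArgumentException("need between 1 and 63 grouping dimensions");
        }
        long id = 0L;
        for (String dimension : groupings) {
            id <<= 1;
            // Assumption: null keyDimensions means every dimension is included.
            boolean included = keyDimensions == null || keyDimensions.contains(dimension);
            if (!included) {
                id |= 1L;
            }
        }
        return id;
    }

    public static void main(String[] args) {
        List<String> groupings = Arrays.asList("country", "city");
        Set<String> countryOnly = new HashSet<String>(Arrays.asList("country"));
        System.out.println(groupingId(groupings, null));
        System.out.println(groupingId(groupings, countryOnly));
        System.out.println(groupingId(groupings, new HashSet<String>()));
    }
}
```

Under these assumptions, grouping on both dimensions gives 0, grouping on country alone gives 1 (binary 01, city rolled up), and the grand total gives 3 (binary 11).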
Since grouping has to be calculated only once, it could have been implemented as a virtual function, a
post-aggregator, etc. We modelled it as an aggregation operator so that its output can be used in a post-aggregator.
Calcite also models the grouping function as an aggregation operator.
Since it is a non-trivial special aggregation, implementing it required changes in the core Druid engine. There
were a few possible approaches. We chose the approach that required the fewest changes in core Druid.
Refer to https://github.com/apache/druid/pull/10518#discussion_r532941216 for more details.
Currently, it works in the following way:
- On data servers (no change):
  - This factory generates LongConstantAggregator / LongConstantBufferAggregator / LongConstantVectorAggregator with keyDimensions as null.
  - The aggregators don't actually aggregate anything and their result is not used. We could have removed these aggregators on data servers, but that would result in a signature mismatch between broker and data nodes, which requires extra handling and is error-prone.
- On brokers:
  - Results from data nodes are already re-processed for each subtotal spec. We made modifications in this path to update the grouping id for each row.

Constructor and Description |
---|
GroupingAggregatorFactory(String name, List<String> groupings) |
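
The broker-side flow described in the class notes can be pictured with a small stand-in. The sketch below is not Druid code: GroupingFactorySketch is a hypothetical, simplified class that only borrows the constructor shape and the withKeyDimensions and getValue names from this page. Its value rule (one bit per dimension, set when the dimension is absent from the key dimensions) and the choice to return a fresh instance from withKeyDimensions are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified stand-in for illustration only; not Druid's class.
final class GroupingFactorySketch {
    private final String name;
    private final List<String> groupings;
    private final Set<String> keyDimensions; // null until a subtotal spec is applied

    GroupingFactorySketch(String name, List<String> groupings) {
        this(name, groupings, null);
    }

    private GroupingFactorySketch(String name, List<String> groupings, Set<String> keyDimensions) {
        this.name = name;
        this.groupings = groupings;
        this.keyDimensions = keyDimensions;
    }

    // Sketch choice: return a new instance and leave this one unchanged.
    GroupingFactorySketch withKeyDimensions(Set<String> newKeyDimensions) {
        return new GroupingFactorySketch(name, groupings, newKeyDimensions);
    }

    // Assumed rule: one bit per dimension, 1 when the dimension is not a key dimension.
    long getValue() {
        long value = 0L;
        for (String dimension : groupings) {
            value <<= 1;
            if (keyDimensions != null && !keyDimensions.contains(dimension)) {
                value |= 1L;
            }
        }
        return value;
    }

    // Mimics re-processing the results once per subtotal spec.
    static Map<List<String>, Long> valuesPerSubtotal(GroupingFactorySketch base, List<List<String>> subtotals) {
        Map<List<String>, Long> result = new LinkedHashMap<List<String>, Long>();
        for (List<String> subtotal : subtotals) {
            result.put(subtotal, base.withKeyDimensions(new HashSet<String>(subtotal)).getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        GroupingFactorySketch base = new GroupingFactorySketch("g", Arrays.asList("country", "city"));
        List<List<String>> subtotals = Arrays.asList(
            Arrays.asList("country", "city"),
            Arrays.asList("country"),
            Arrays.<String>asList());
        System.out.println(valuesPerSubtotal(base, subtotals));
    }
}
```

With keyDimensions left as null, as on data servers, the stand-in always reports 0. Only after a subtotal spec is applied does the value distinguish one grouping from another.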
Modifier and Type | Method and Description |
---|---|
boolean | canVectorize(ColumnInspector columnInspector): Returns whether or not this aggregation class supports vectorization. |
Object | combine(Object lhs, Object rhs): A method that knows how to combine the outputs of Aggregator.get() produced via AggregatorFactory.factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via AggregatorFactory.factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory). |
Object | deserialize(Object object): A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON. |
boolean | equals(Object o) |
Aggregator | factorize(ColumnSelectorFactory metricFactory) |
BufferAggregator | factorizeBuffered(ColumnSelectorFactory metricFactory) |
VectorAggregator | factorizeVector(VectorColumnSelectorFactory selectorFactory): Create a VectorAggregator based on the provided column selector factory. |
Object | finalizeComputation(Object object): "Finalizes" the computation of an object. |
byte[] | getCacheKey(): Get a byte array used as a cache key. |
AggregatorFactory | getCombiningFactory(): Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory. |
Comparator | getComparator() |
List<String> | getGroupings() |
ColumnType | getIntermediateType(): Get the "intermediate" ColumnType for this aggregator. |
int | getMaxIntermediateSize(): Returns the maximum size that this aggregator will require in bytes for intermediate storage of results. |
String | getName() |
List<AggregatorFactory> | getRequiredColumns(): Used by GroupByStrategyV1 when running nested groupBys, to "transfer" values from this aggregator to an incremental index that the outer query will run on. |
ColumnType | getResultType(): Get the ColumnType for the final form of this aggregator, i.e. the type of the value returned by AggregatorFactory.finalizeComputation(java.lang.Object). |
long | getValue() |
int | hashCode() |
List<String> | requiredFields(): Get a list of fields that aggregators built by this factory will need to read. |
String | toString() |
GroupingAggregatorFactory | withKeyDimensions(Set<String> newKeyDimensions): Replace the param keyDimensions with the new set of key dimensions. |
AggregatorFactory | withName(String newName): Used in cases where we want to change the output name of the aggregator to something else. |
Methods inherited from class AggregatorFactory: factorizeWithSize, getComplexTypeName, getFinalizedType, getMaxIntermediateSizeWithNulls, getMergingFactory, getType, guessAggregatorHeapFootprint, makeAggregateCombiner, makeNullableAggregateCombiner, mergeAggregators, optimizeForSegment
public Aggregator factorize(ColumnSelectorFactory metricFactory)
factorize in class AggregatorFactory
public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
factorizeBuffered in class AggregatorFactory
public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
Description copied from class: AggregatorFactory
Create a VectorAggregator based on the provided column selector factory.
factorizeVector in class AggregatorFactory
public boolean canVectorize(ColumnInspector columnInspector)
Description copied from class: AggregatorFactory
Returns whether or not this aggregation class supports vectorization.
canVectorize in class AggregatorFactory
public GroupingAggregatorFactory withKeyDimensions(Set<String> newKeyDimensions)
Replace the param keyDimensions with the new set of key dimensions.
public Comparator getComparator()
getComparator in class AggregatorFactory
public String getName()
getName in class AggregatorFactory
public long getValue()
@Nullable public Object combine(@Nullable Object lhs, @Nullable Object rhs)
Description copied from class: AggregatorFactory
A method that knows how to combine the outputs of Aggregator.get() produced via AggregatorFactory.factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via AggregatorFactory.factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory). Note, even though this method is called "combine", this method's contract *does* allow for mutation of the input objects. Thus, any use of lhs or rhs after calling this method is highly discouraged.
combine in class AggregatorFactory
lhs - The left hand side of the combine
rhs - The right hand side of the combine
public AggregatorFactory getCombiningFactory()
Description copied from class: AggregatorFactory
Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory. For example, the CountAggregatorFactory getCombiningFactory method will return a LongSumAggregatorFactory, because counts are combined by summing.
No matter what, `foo.getCombiningFactory()` and `foo.getCombiningFactory().getCombiningFactory()` should return the same result.
getCombiningFactory in class AggregatorFactory
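The reason counts are combined by summing, and the rule that a combining factory's own combining factory should be equivalent to it, can be seen in a small sketch. FactorySketch, LongSumSketch, CountSketch and CombiningDemo below are hypothetical stand-ins, not Druid classes. They model only the combine step on plain long values.

```java
// Hypothetical stand-ins for illustration; not Druid's AggregatorFactory API.
interface FactorySketch {
    long combine(long lhs, long rhs);

    FactorySketch getCombiningFactory();
}

final class LongSumSketch implements FactorySketch {
    public long combine(long lhs, long rhs) {
        return lhs + rhs;
    }

    // Summing partial sums is still a sum, so this factory is its own combining factory.
    public FactorySketch getCombiningFactory() {
        return this;
    }
}

final class CountSketch implements FactorySketch {
    public long combine(long lhs, long rhs) {
        return lhs + rhs;
    }

    // Partial counts must be added together, not counted again.
    public FactorySketch getCombiningFactory() {
        return new LongSumSketch();
    }
}

class CombiningDemo {
    // Merges partial results using the combining factory of the given factory.
    static long merge(FactorySketch factory, long[] partials) {
        FactorySketch combining = factory.getCombiningFactory();
        long total = 0L;
        for (long partial : partials) {
            total = combining.combine(total, partial);
        }
        return total;
    }

    public static void main(String[] args) {
        // Row counts reported by three segments: 3, 5 and 2 rows.
        System.out.println(merge(new CountSketch(), new long[] {3L, 5L, 2L}));
    }
}
```

Merging the three partial counts yields 10 rows. Counting the partial results themselves would wrongly yield 3, which is why the combining step for a count is a sum.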
public List<AggregatorFactory> getRequiredColumns()
Description copied from class: AggregatorFactory
Used by GroupByStrategyV1 when running nested groupBys, to "transfer" values from this aggregator to an incremental index that the outer query will run on. This method only exists due to the design of GroupByStrategyV1, and should probably not be used for anything else. If you are here because you are looking for a way to get the input fields required by this aggregator, and thought "getRequiredColumns" sounded right, please use AggregatorFactory.requiredFields() instead.
getRequiredColumns in class AggregatorFactory
See also: AggregatorFactory.requiredFields(), a similarly-named method that is perhaps the one you want instead.
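public Object deserialize(Object object)
Description copied from class: AggregatorFactory
A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
deserialize in class AggregatorFactory
object - the object to deserialize
@Nullable public Object finalizeComputation(@Nullable Object object)
Description copied from class: AggregatorFactory
"Finalizes" the computation of an object.
finalizeComputation in class AggregatorFactory
object - the object to be finalized
public List<String> requiredFields()
Description copied from class: AggregatorFactory
Get a list of fields that aggregators built by this factory will need to read.
requiredFields in class AggregatorFactory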
public ColumnType getIntermediateType()
Description copied from class: AggregatorFactory
Get the "intermediate" ColumnType for this aggregator. This is the same as the type returned by AggregatorFactory.deserialize(java.lang.Object) and the type accepted by AggregatorFactory.combine(java.lang.Object, java.lang.Object). However, it is *not* necessarily the same type returned by AggregatorFactory.finalizeComputation(java.lang.Object).
Refer to the ColumnType javadocs for details on the implications of choosing a type.
getIntermediateType in class AggregatorFactory
public ColumnType getResultType()
Description copied from class: AggregatorFactory
Get the ColumnType for the final form of this aggregator, i.e. the type of the value returned by AggregatorFactory.finalizeComputation(java.lang.Object). This may be the same as or different than the types expected in AggregatorFactory.deserialize(java.lang.Object) and AggregatorFactory.combine(java.lang.Object, java.lang.Object).
Refer to the ColumnType javadocs for details on the implications of choosing a type.
getResultType in class AggregatorFactory
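public int getMaxIntermediateSize()
Description copied from class: AggregatorFactory
Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
getMaxIntermediateSize in class AggregatorFactory
public AggregatorFactory withName(String newName)
Description copied from class: AggregatorFactory
Used in cases where we want to change the output name of the aggregator to something else (see org.apache.druid.sql.calcite.rel.DruidQuery#computeAggregations). We can use withName("total") to set the output name of the aggregator to "total".
Because not every implementation may provide this method, callers are advised to handle that case.
withName in class AggregatorFactory
newName - new name of the output for the aggregator factory
public byte[] getCacheKey()
Description copied from interface: Cacheable
Get a byte array used as a cache key.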
Copyright © 2011–2023 The Apache Software Foundation. All rights reserved.