GroupingAggregatorFactory (druid-processing 27.0.0 API)

java.lang.Object
- org.apache.druid.query.aggregation.AggregatorFactory
- - org.apache.druid.query.aggregation.GroupingAggregatorFactory

All Implemented Interfaces:

Cacheable
```
@EverythingIsNonnullByDefault
public class GroupingAggregatorFactory
extends AggregatorFactory
```
This class implements the grouping function, which determines the grouping that a row is part of. Different result rows for a query can have different grouping columns when subtotals are used.

This aggregator factory takes the following arguments:

- name - Name of the aggregator.
- groupings - List of dimensions that the user is interested in tracking.
- keyDimensions - The list of grouping dimensions included in the result row. This list is a subset of groupings. This argument cannot be passed by the user; it is set by the Druid engine when a particular subtotal spec is being processed. Whenever the Druid engine processes a new subtotal spec, it sets that subtotal spec as the new keyDimensions.

When the key dimensions are updated, the value is updated as well. How the value is determined is captured in groupingId(List, Set).

Since the grouping has to be calculated only once, it could have been implemented as a virtual function, a post-aggregator, etc. We modelled it as an aggregation operator so that its output can be used in a post-aggregator. Calcite also models the grouping function as an aggregation operator.

Since it is a non-trivial special aggregation, implementing it required changes in the core Druid engine. There were a few approaches; we chose the one that required the fewest changes in core Druid. Refer to https://github.com/apache/druid/pull/10518#discussion_r532941216 for more details.

Currently, it works in the following way:

- On data servers (no change):
  - This factory generates LongConstantAggregator / LongConstantBufferAggregator / LongConstantVectorAggregator with keyDimensions as null.
  - The aggregators don't actually aggregate anything, and their result is not actually used. We could have removed these aggregators on data servers, but that would result in a signature mismatch between broker and data nodes, which requires extra handling and is error-prone.
- On brokers:
  - Results from data nodes are already re-processed for each subtotal spec. We made modifications in this path to update the grouping id for each row.
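The description above defers the value computation to groupingId(List, Set). As a rough illustration only, the following is a minimal, self-contained sketch of one way such an id could be derived. It assumes the SQL GROUPING convention (one bit per entry in groupings, most significant bit first, set to 1 when that dimension is absent from keyDimensions) and treats a null keyDimensions as "all dimensions included". The class name GroupingIdSketch and the dimension names are hypothetical, and this is not the Druid source.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class GroupingIdSketch {
    // Hypothetical stand-in for groupingId(List, Set); the bit convention is an assumption.
    public static long groupingId(List<String> groupings, Set<String> keyDimensions) {
        if (groupings.isEmpty() || groupings.size() >= Long.SIZE) {
            throw new IllegalArgumentException("expected between 1 and 63 grouping dimensions");
        }
        long id = 0L;
        for (String dimension : groupings) {
            id <<= 1;
            // A null keyDimensions is treated as "every dimension is included".
            boolean included = keyDimensions == null || keyDimensions.contains(dimension);
            if (!included) {
                id |= 1L; // mark this dimension as rolled up
            }
        }
        return id;
    }

    public static void main(String[] args) {
        List<String> groupings = Arrays.asList("country", "city");
        Set<String> countryOnly = new HashSet<>(Collections.singletonList("country"));
        Set<String> none = Collections.emptySet();

        System.out.println(groupingId(groupings, null));        // 0: nothing rolled up
        System.out.println(groupingId(groupings, countryOnly)); // 1: city rolled up
        System.out.println(groupingId(groupings, none));        // 3: both rolled up
    }
}
```

Capping the list at 63 dimensions keeps the result a non-negative long in this sketch.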

Constructor Summary

Constructors
Constructor and Description

GroupingAggregatorFactory(String name, List<String> groupings)


Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method and Description
`boolean`	`canVectorize(ColumnInspector columnInspector)` Returns whether or not this aggregation class supports vectorization.
`Object`	`combine(Object lhs, Object rhs)` A method that knows how to combine the outputs of `Aggregator.get()` produced via `AggregatorFactory.factorize(org.apache.druid.segment.ColumnSelectorFactory)` or `BufferAggregator.get(java.nio.ByteBuffer, int)` produced via `AggregatorFactory.factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory)`.
`Object`	`deserialize(Object object)` A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
`boolean`	`equals(Object o)`
`Aggregator`	`factorize(ColumnSelectorFactory metricFactory)`
`BufferAggregator`	`factorizeBuffered(ColumnSelectorFactory metricFactory)`
`VectorAggregator`	`factorizeVector(VectorColumnSelectorFactory selectorFactory)` Create a VectorAggregator based on the provided column selector factory.
`Object`	`finalizeComputation(Object object)` "Finalizes" the computation of an object.
`byte[]`	`getCacheKey()` Get a byte array used as a cache key.
`AggregatorFactory`	`getCombiningFactory()` Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory.
`Comparator`	`getComparator()`
`List<String>`	`getGroupings()`
`ColumnType`	`getIntermediateType()` Get the "intermediate" `ColumnType` for this aggregator.
`int`	`getMaxIntermediateSize()` Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
`String`	`getName()`
`List<AggregatorFactory>`	`getRequiredColumns()` Used by `GroupByStrategyV1` when running nested groupBys, to "transfer" values from this aggregator to an incremental index that the outer query will run on.
`ColumnType`	`getResultType()` Get the `ColumnType` for the final form of this aggregator, i.e.
`long`	`getValue()`
`int`	`hashCode()`
`List<String>`	`requiredFields()` Get a list of fields that aggregators built by this factory will need to read.
`String`	`toString()`
`GroupingAggregatorFactory`	`withKeyDimensions(Set<String> newKeyDimensions)` Replace the param `keyDimensions` with the new set of key dimensions
`AggregatorFactory`	`withName(String newName)` Used in cases where we want to change the output name of the aggregator to something else.

Methods inherited from class org.apache.druid.query.aggregation.AggregatorFactory
factorizeWithSize, getComplexTypeName, getFinalizedType, getMaxIntermediateSizeWithNulls, getMergingFactory, getType, guessAggregatorHeapFootprint, makeAggregateCombiner, makeNullableAggregateCombiner, mergeAggregators, optimizeForSegment

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - GroupingAggregatorFactory
```
public GroupingAggregatorFactory(String name,
                                 List<String> groupings)
```
- Method Detail
  - factorize
```
public Aggregator factorize(ColumnSelectorFactory metricFactory)
```
    Specified by:
    
    factorize in class AggregatorFactory
  - factorizeBuffered
```
public BufferAggregator factorizeBuffered(ColumnSelectorFactory metricFactory)
```
    Specified by:
    
    factorizeBuffered in class AggregatorFactory
  - factorizeVector
```
public VectorAggregator factorizeVector(VectorColumnSelectorFactory selectorFactory)
```
    Description copied from class: AggregatorFactory
    
    Create a VectorAggregator based on the provided column selector factory. Will throw an exception if this aggregation class does not support vectorization: check "canVectorize" first.
    
    Overrides:
    
    factorizeVector in class AggregatorFactory
  - canVectorize
```
public boolean canVectorize(ColumnInspector columnInspector)
```
    Description copied from class: AggregatorFactory
    
    Returns whether or not this aggregation class supports vectorization. The default implementation returns false.
    
    Overrides:
    
    canVectorize in class AggregatorFactory
  - withKeyDimensions
```
public GroupingAggregatorFactory withKeyDimensions(Set<String> newKeyDimensions)
```
    Replace the param keyDimensions with the new set of key dimensions
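    To make the per-subtotal update concrete, here is a small sketch of how a broker-side loop might derive one factory copy per subtotal spec. It is a simplified model under stated assumptions: GroupingSpec is a hypothetical stand-in for this class, withKeyDimensions is assumed to return a new instance rather than mutate the receiver, and getValue is assumed to use one bit per grouping, set when the dimension is missing from the key dimensions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeyDimensionsSketch {
    // Hypothetical, simplified stand-in for the factory; not the Druid class.
    public static final class GroupingSpec {
        private final String name;
        private final List<String> groupings;
        private final Set<String> keyDimensions; // null means "all groupings included"

        public GroupingSpec(String name, List<String> groupings, Set<String> keyDimensions) {
            this.name = name;
            this.groupings = groupings;
            this.keyDimensions = keyDimensions;
        }

        // Returns a copy with the new key dimensions; the receiver is left untouched.
        public GroupingSpec withKeyDimensions(Set<String> newKeyDimensions) {
            return new GroupingSpec(name, groupings, newKeyDimensions);
        }

        // Assumed convention: one bit per grouping, set when the dimension is absent.
        public long getValue() {
            long value = 0L;
            for (String dimension : groupings) {
                value <<= 1;
                if (keyDimensions != null && !keyDimensions.contains(dimension)) {
                    value |= 1L;
                }
            }
            return value;
        }
    }

    public static void main(String[] args) {
        GroupingSpec base = new GroupingSpec("g", Arrays.asList("country", "city"), null);
        List<List<String>> subtotals = Arrays.asList(
            Arrays.asList("country", "city"),
            Arrays.asList("country"),
            Collections.<String>emptyList());
        // One derived spec per subtotal spec, as a broker-side pass might do.
        for (List<String> subtotal : subtotals) {
            GroupingSpec perSubtotal = base.withKeyDimensions(new HashSet<>(subtotal));
            System.out.println(subtotal + " -> " + perSubtotal.getValue());
        }
    }
}
```

    Returning a copy keeps the original spec reusable across subtotal specs, which is the reason the sketch avoids a setter.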
  - getComparator
```
public Comparator getComparator()
```
    Specified by:
    
    getComparator in class AggregatorFactory
  - getGroupings
```
public List<String> getGroupings()
```
  - getName
```
public String getName()
```
    Specified by:
    
    getName in class AggregatorFactory
    
    Returns:
    
    output name of the aggregator column.
  - getValue
```
public long getValue()
```
  - combine
```
@Nullable
public Object combine(@Nullable Object lhs,
                      @Nullable Object rhs)
```
    Description copied from class: AggregatorFactory
    
    A method that knows how to combine the outputs of Aggregator.get() produced via AggregatorFactory.factorize(org.apache.druid.segment.ColumnSelectorFactory) or BufferAggregator.get(java.nio.ByteBuffer, int) produced via AggregatorFactory.factorizeBuffered(org.apache.druid.segment.ColumnSelectorFactory). Note, even though this method is called "combine", this method's contract *does* allow for mutation of the input objects. Thus, any use of lhs or rhs after calling this method is highly discouraged.
    
    Specified by:
    
    combine in class AggregatorFactory
    
    Parameters:
    
    lhs - The left hand side of the combine
    
    rhs - The right hand side of the combine
    
    Returns:
    
    an object representing the combination of lhs and rhs, this can be a new object or a mutation of the inputs
  - getCombiningFactory
```
public AggregatorFactory getCombiningFactory()
```
    Description copied from class: AggregatorFactory
    
    Returns an AggregatorFactory that can be used to combine the output of aggregators from this factory. It is used when we know we have some values that were produced with this aggregator factory, and want to do some additional combining of them. This happens, for example, when merging query results from two different segments, or two different servers.
    
    For simple aggregators, the combining factory may be computed by simply creating a new factory that is the same as the current, except with its input column renamed to the same as the output column. For example, this aggregator:
    
    {"type": "longSum", "fieldName": "foo", "name": "bar"}
    
    would become:
    
    {"type": "longSum", "fieldName": "bar", "name": "bar"}
    
    Sometimes, the type or other parameters of the combining aggregator will be different from the original aggregator. For example, the CountAggregatorFactory getCombiningFactory method will return a LongSumAggregatorFactory, because counts are combined by summing.
    
    No matter what, `foo.getCombiningFactory()` and `foo.getCombiningFactory().getCombiningFactory()` should return the same result.
    
    Specified by:
    
    getCombiningFactory in class AggregatorFactory
    
    Returns:
    
    a new Factory that can be used for operations on top of data output from the current factory.
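    The renaming rule and the idempotence requirement described above can be illustrated with a toy model. This is only a sketch: Spec is a hypothetical value class mirroring the JSON fields type, fieldName, and name, and its combining() method is an assumed simplification covering just the longSum and count cases mentioned in the text. It is not a Druid API.

```java
import java.util.Objects;

public class CombiningFactorySketch {
    // Hypothetical, simplified model of an aggregator spec; not a Druid class.
    public static final class Spec {
        final String type;
        final String fieldName; // may be null, e.g. for "count"
        final String name;

        public Spec(String type, String fieldName, String name) {
            this.type = type;
            this.fieldName = fieldName;
            this.name = name;
        }

        // Combining reads the previous output column; counts are combined by summing.
        public Spec combining() {
            String combinedType = "count".equals(type) ? "longSum" : type;
            return new Spec(combinedType, name, name);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Spec)) {
                return false;
            }
            Spec other = (Spec) o;
            return Objects.equals(type, other.type)
                && Objects.equals(fieldName, other.fieldName)
                && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(type, fieldName, name);
        }

        @Override
        public String toString() {
            return "{type=" + type + ", fieldName=" + fieldName + ", name=" + name + "}";
        }
    }

    public static void main(String[] args) {
        Spec sum = new Spec("longSum", "foo", "bar");
        Spec count = new Spec("count", null, "rows");

        System.out.println(sum.combining());   // {type=longSum, fieldName=bar, name=bar}
        System.out.println(count.combining()); // {type=longSum, fieldName=rows, name=rows}

        // Applying combining() twice gives the same spec as applying it once.
        System.out.println(sum.combining().equals(sum.combining().combining()));     // true
        System.out.println(count.combining().equals(count.combining().combining())); // true
    }
}
```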
  - getRequiredColumns
```
public List<AggregatorFactory> getRequiredColumns()
```
    Description copied from class: AggregatorFactory
    
    Used by GroupByStrategyV1 when running nested groupBys, to "transfer" values from this aggregator to an incremental index that the outer query will run on. This method only exists due to the design of GroupByStrategyV1, and should probably not be used for anything else. If you are here because you are looking for a way to get the input fields required by this aggregator, and thought "getRequiredColumns" sounded right, please use AggregatorFactory.requiredFields() instead.
    
    Specified by:
    
    getRequiredColumns in class AggregatorFactory
    
    Returns:
    
    AggregatorFactories that can be used to "transfer" values from this aggregator into an incremental index
    
    See Also:
    
    AggregatorFactory.requiredFields(), a similarly-named method that is perhaps the one you want instead.
  - deserialize
```
public Object deserialize(Object object)
```
    Description copied from class: AggregatorFactory
    
    A method that knows how to "deserialize" the object from whatever form it might have been put into in order to transfer via JSON.
    
    Specified by:
    
    deserialize in class AggregatorFactory
    
    Parameters:
    
    object - the object to deserialize
    
    Returns:
    
    the deserialized object
  - finalizeComputation
```
@Nullable
public Object finalizeComputation(@Nullable Object object)
```
    Description copied from class: AggregatorFactory
    
    "Finalizes" the computation of an object. Primarily useful for complex types that have a different mergeable intermediate format than their final resultant output.
    
    Specified by:
    
    finalizeComputation in class AggregatorFactory
    
    Parameters:
    
    object - the object to be finalized
    
    Returns:
    
    the finalized value that should be returned for the initial query
  - requiredFields
```
public List<String> requiredFields()
```
    Description copied from class: AggregatorFactory
    
    Get a list of fields that aggregators built by this factory will need to read.
    
    Specified by:
    
    requiredFields in class AggregatorFactory
  - getIntermediateType
```
public ColumnType getIntermediateType()
```
    Description copied from class: AggregatorFactory
    
    Get the "intermediate" ColumnType for this aggregator. This is the same as the type returned by AggregatorFactory.deserialize(java.lang.Object) and the type accepted by AggregatorFactory.combine(java.lang.Object, java.lang.Object). However, it is *not* necessarily the same type returned by AggregatorFactory.finalizeComputation(java.lang.Object). Refer to the ColumnType javadocs for details on the implications of choosing a type.
    
    Overrides:
    
    getIntermediateType in class AggregatorFactory
  - getResultType
```
public ColumnType getResultType()
```
    Description copied from class: AggregatorFactory
    
    Get the ColumnType for the final form of this aggregator, i.e. the type of the value returned by AggregatorFactory.finalizeComputation(java.lang.Object). This may be the same as or different than the types expected in AggregatorFactory.deserialize(java.lang.Object) and AggregatorFactory.combine(java.lang.Object, java.lang.Object). Refer to the ColumnType javadocs for details on the implications of choosing a type.
    
    Overrides:
    
    getResultType in class AggregatorFactory
  - getMaxIntermediateSize
```
public int getMaxIntermediateSize()
```
    Description copied from class: AggregatorFactory
    
    Returns the maximum size that this aggregator will require in bytes for intermediate storage of results.
    
    Specified by:
    
    getMaxIntermediateSize in class AggregatorFactory
    
    Returns:
    
    the maximum number of bytes that an aggregator of this type will require for intermediate result storage.
  - withName
```
public AggregatorFactory withName(String newName)
```
    Description copied from class: AggregatorFactory
    
    Used in cases where we want to change the output name of the aggregator to something else. For example, if we have a query `select a, sum(b) as total from table group by a`, the aggregator returned from the native group by query is "a0", set in org.apache.druid.sql.calcite.rel.DruidQuery#computeAggregations. We can use withName("total") to set the output name of the aggregator to "total".
    Because not every AggregatorFactory may implement this method, callers are advised to handle such a case.
    
    Overrides:
    
    withName in class AggregatorFactory
    
    Parameters:
    
    newName - newName of the output for aggregator factory
    
    Returns:
    
    AggregatorFactory with the output name set as the input param.
  - getCacheKey
```
public byte[] getCacheKey()
```
    Description copied from interface: Cacheable
    
    Get a byte array used as a cache key.
    
    Returns:
    
    a cache key
  - equals
```
public boolean equals(Object o)
```
    Overrides:
    
    equals in class Object
  - hashCode
```
public int hashCode()
```
    Overrides:
    
    hashCode in class Object
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class Object
