Class GroupingAggregatorFactory

  • All Implemented Interfaces:
    Cacheable

    @EverythingIsNonnullByDefault
    public class GroupingAggregatorFactory
    extends AggregatorFactory
    This class implements the grouping function, which determines the grouping that a row is part of. When subtotals are used, different result rows of a query can have different grouping columns.

    This aggregator factory takes the following arguments:
    - name - Name of the aggregator
    - groupings - List of dimensions that the user is interested in tracking
    - keyDimensions - The list of grouping dimensions included in the result row. This list is a subset of groupings. This argument cannot be passed by the user. It is set by the Druid engine while a particular subtotal spec is being processed. Whenever the Druid engine processes a new subtotal spec, the engine sets that subtotal spec as the new keyDimensions.

    When the key dimensions are updated, the value is updated as well. How the value is determined is captured in groupingId(List, Set).

    Since the grouping has to be calculated only once, it could have been implemented as a virtual function, a post-aggregator, etc. We modelled it as an aggregation operator so that its output can be used in a post-aggregator. Calcite also models the grouping function as an aggregation operator.

    Since it is a non-trivial special aggregation, implementing it required changes in the core Druid engine. There were a few approaches. We chose the approach that required the fewest changes in core Druid. Refer to https://github.com/apache/druid/pull/10518#discussion_r532941216 for more details.

    Currently, it works in the following way:
    - On data servers (no change):
      - This factory generates LongConstantAggregator / LongConstantBufferAggregator / LongConstantVectorAggregator with keyDimensions as null.
      - The aggregators don't actually aggregate anything, and their result is not used. We could have removed these aggregators on data servers, but that would result in a signature mismatch between broker and data nodes, which requires extra handling and is error-prone.
    - On brokers:
      - Results from data nodes are already re-processed for each subtotal spec. We made modifications in this path to update the grouping id for each row.
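
    The relationship between groupings, keyDimensions and the resulting value can be illustrated with a small, self-contained sketch. It is not the Druid source. It assumes the bitmask convention used by the SQL GROUPING function: one bit per entry in groupings, most significant bit first, with the bit set to 1 when that dimension is absent from keyDimensions. The class name GroupingIdSketch and the treatment of a null keyDimensions as "all dimensions included" are assumptions made for illustration; the authoritative logic is in groupingId(List, Set).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch only; hypothetical class, not part of Druid. */
final class GroupingIdSketch
{
  private GroupingIdSketch() {}

  /**
   * Encodes which tracked dimensions are missing from the current subtotal spec.
   * Assumed convention: one bit per entry in groupings, first entry in the
   * highest used bit; a bit is 1 when the dimension is NOT in keyDimensions.
   * A null keyDimensions is treated as "every dimension is included" (assumption).
   */
  static long groupingId(List<String> groupings, Set<String> keyDimensions)
  {
    if (groupings == null || groupings.isEmpty()) {
      throw new IllegalArgumentException("groupings must be non-empty");
    }
    // Keep the sign bit free so the result is never negative.
    if (groupings.size() >= Long.SIZE) {
      throw new IllegalArgumentException("too many grouping dimensions");
    }
    long id = 0L;
    for (String dimension : groupings) {
      id <<= 1;
      boolean included = keyDimensions == null || keyDimensions.contains(dimension);
      if (!included) {
        id |= 1L;
      }
    }
    return id;
  }

  public static void main(String[] args)
  {
    List<String> groupings = Arrays.asList("country", "city", "device");

    // No key dimensions set (as on data servers): nothing is marked missing.
    System.out.println(groupingId(groupings, null)); // 0

    // Subtotal spec [country, city]: only "device" is missing -> binary 001.
    System.out.println(groupingId(groupings, new HashSet<>(Arrays.asList("country", "city")))); // 1

    // Subtotal spec [country]: "city" and "device" are missing -> binary 011.
    System.out.println(groupingId(groupings, new HashSet<>(Arrays.asList("country")))); // 3

    // Grand total (empty subtotal spec): all three are missing -> binary 111.
    System.out.println(groupingId(groupings, new HashSet<>())); // 7
  }
}
```

    Under this assumed convention, each subtotal spec processed on the broker yields a different constant, which is why the value only needs to be computed once per spec rather than once per row.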