Class BufferArrayGrouper

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Grouper<IntKey>, IntGrouper, VectorGrouper

    public class BufferArrayGrouper
    extends Object
    implements VectorGrouper, IntGrouper
    A buffer grouper for array-based aggregation. This grouper stores aggregated values in the buffer using the grouping key as the index.

    The buffer is divided into two separate regions: a used-flag buffer and a value buffer. The used-flag buffer is a bit set that records which keys are valid: if the bit at an index is set, the key for that index is valid. The value buffer stores the aggregated values. The first index is reserved for GroupByColumnSelectorStrategy.GROUP_BY_MISSING_VALUE.

    This grouper is available only when the grouping key is a single indexed dimension of known cardinality, because it uses the dimension value directly as the index for array access. Since the cardinality of the grouping key across different segments cannot currently be retrieved, this grouper can be used only for per-segment query execution.
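
    The two-region layout described above can be sketched in a few lines. This is a minimal illustration, not the BufferArrayGrouper source: the class ArrayGroupingSketch and its methods are hypothetical, the only "aggregator" is a running long sum, and the sketch assumes the missing value is represented by id -1 so that ids are shifted by one to keep the first slot reserved.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; this is not the BufferArrayGrouper source.
final class ArrayGroupingSketch {
    private final ByteBuffer buffer;     // one buffer, split into two regions
    private final int slots;             // cardinality + 1 reserved slot
    private final int valueRegionStart;  // byte offset where the value region begins

    ArrayGroupingSketch(int cardinality) {
        this.slots = cardinality + 1;
        // Used-flag region: one bit per slot, rounded up to whole bytes.
        this.valueRegionStart = (slots + Byte.SIZE - 1) / Byte.SIZE;
        // Value region: one long (a running sum) per slot.
        this.buffer = ByteBuffer.allocate(valueRegionStart + slots * Long.BYTES);
    }

    // Assumption: id -1 stands for the missing value, so ids are shifted by one.
    private int slotOf(int dimensionId) {
        int slot = dimensionId + 1;
        if (slot < 0 || slot >= slots) {
            throw new IllegalArgumentException("id out of range: " + dimensionId);
        }
        return slot;
    }

    /** Adds delta to the sum stored for dimensionId and marks its slot as used. */
    void add(int dimensionId, long delta) {
        int slot = slotOf(dimensionId);
        int flagByte = slot / Byte.SIZE;
        int mask = 1 << (slot % Byte.SIZE);
        buffer.put(flagByte, (byte) (buffer.get(flagByte) | mask));
        int offset = valueRegionStart + slot * Long.BYTES;
        buffer.putLong(offset, buffer.getLong(offset) + delta);
    }

    /** Returns true if any value has been aggregated for dimensionId. */
    boolean isUsed(int dimensionId) {
        int slot = slotOf(dimensionId);
        return (buffer.get(slot / Byte.SIZE) & (1 << (slot % Byte.SIZE))) != 0;
    }

    /** Returns the sum stored for dimensionId (zero if the slot was never used). */
    long sum(int dimensionId) {
        return buffer.getLong(valueRegionStart + slotOf(dimensionId) * Long.BYTES);
    }
}
```

    Because the key itself is the array index, no hashing or probing is needed; the cost is that the buffer must be sized for every possible key up front, which is why the cardinality has to be known.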

    • Constructor Detail

      • BufferArrayGrouper

        public BufferArrayGrouper​(com.google.common.base.Supplier<ByteBuffer> bufferSupplier,
                                  AggregatorAdapters aggregators,
                                  int cardinality)
    • Method Detail

      • requiredBufferCapacity

        public static long requiredBufferCapacity​(int cardinality,
                                                  AggregatorFactory[] aggregatorFactories)
        Computes the required buffer capacity for a grouping key of the given cardinality and the given aggregatorFactories. This method assumes that the given cardinality does not count nulls. Returns -1 if cardinality + 1 (for null) > Integer.MAX_VALUE; otherwise returns the computed required buffer capacity.
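
        The arithmetic behind such a capacity check can be sketched as follows. Only the -1 rule comes from the description above; the rest is an assumption derived from the class-level description (one used-flag bit plus one fixed-size record per key). The class CapacitySketch and the aggregatorSizes parameter (per-aggregator byte sizes) are hypothetical stand-ins for whatever the real method reads from the AggregatorFactory array.

```java
// Hypothetical illustration only; the formula is an assumption, not the Druid source.
final class CapacitySketch {
    /**
     * Returns -1 if cardinality + 1 exceeds Integer.MAX_VALUE, otherwise the
     * number of bytes needed for the used flags plus one record per slot.
     */
    static long requiredCapacity(int cardinality, int[] aggregatorSizes) {
        // Widen to long before adding so that Integer.MAX_VALUE + 1 does not overflow.
        long slots = cardinality + 1L;
        if (slots > Integer.MAX_VALUE) {
            return -1;
        }
        long recordSize = 0;
        for (int size : aggregatorSizes) {
            recordSize += size;
        }
        // One bit per slot, rounded up to whole bytes.
        long usedFlagBytes = (slots + Byte.SIZE - 1) / Byte.SIZE;
        return usedFlagBytes + slots * recordSize;
    }
}
```

        Under these assumptions, a cardinality of 9 with two aggregators of 8 and 4 bytes gives 10 slots, 2 bytes of used flags, and 120 bytes of values, for a total of 122 bytes.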
      • isInitialized

        public boolean isInitialized()
        Description copied from interface: Grouper
        Checks whether this grouper is initialized.
        Specified by:
        isInitialized in interface Grouper<IntKey>
        Returns:
        true if the grouper is already initialized, otherwise false.
      • aggregateVector

        public AggregateResult aggregateVector​(org.apache.datasketches.memory.Memory keySpace,
                                               int startRow,
                                               int endRow)
        Description copied from interface: VectorGrouper
        Aggregate the current vector of rows from "startRow" to "endRow" using the provided keys.
        Specified by:
        aggregateVector in interface VectorGrouper
        Parameters:
        keySpace - array holding keys, chunked into ints. First (endRow - startRow) keys must be valid.
        startRow - row to start at (inclusive).
        endRow - row to end at (exclusive).
        Returns:
        result that indicates how many keys were aggregated (may be partial due to resource limits)
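
        The relationship between the row range and the key positions can be sketched as below. This is an illustration under stated assumptions, not the actual implementation: a java.nio.ByteBuffer stands in for the Memory key space, the "aggregation" is a plain row count per key, and the class VectorRangeSketch is hypothetical. It assumes the key for row startRow + i sits at int position i of the key space, which is how the "first (endRow - startRow) keys must be valid" requirement reads.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; ByteBuffer stands in for the Memory key space.
final class VectorRangeSketch {
    /**
     * Counts rows per key for the half-open row range [startRow, endRow).
     * Returns the number of rows aggregated.
     */
    static int countRows(ByteBuffer keySpace, int startRow, int endRow, long[] counts) {
        int numRows = endRow - startRow;
        // Check once, up front, that the key space holds enough int-sized keys.
        if (keySpace.capacity() < (long) numRows * Integer.BYTES) {
            throw new IllegalArgumentException("keySpace too small for the requested rows");
        }
        for (int i = 0; i < numRows; i++) {
            // Keys are packed from position 0, regardless of startRow.
            int key = keySpace.getInt(i * Integer.BYTES);
            counts[key]++;
        }
        return numRows;
    }
}
```

        Note that startRow and endRow only determine how many keys are read; the sketch never uses startRow as an offset into the key space.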
      • iterator

        public CloseableIterator<Grouper.Entry<IntKey>> iterator​(boolean sorted)
        Description copied from interface: Grouper
        Iterate through entries.

        Some implementations allow writes even after this method is called. After you are done with the iterator returned by this method, you should call either Grouper.close() (if you are done with the Grouper) or Grouper.reset() (if you want to reuse it). Some implementations allow calling Grouper.iterator(boolean) again if you want another iterator. However, this method must not be called by multiple threads concurrently.

        If "sorted" is true then the iterator will return sorted results. It will use KeyType's natural ordering on deserialized objects, and will use the Grouper.KeySerde.bufferComparator() on serialized objects. Woe be unto you if these comparators are not equivalent.

        Callers must process and discard the returned Grouper.Entry objects immediately, because some implementations can reuse the key objects.

        Specified by:
        iterator in interface Grouper<IntKey>
        Parameters:
        sorted - return sorted results
        Returns:
        entry iterator
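
        The reason entries must be processed immediately can be shown with a small stand-alone sketch. Nothing here is part of the Grouper API: ReusedEntrySketch and MutableEntry are hypothetical types, and the sketch reuses the whole entry object (not only the key) to make the effect easy to see.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical illustration only; these types are not part of the Grouper API.
final class ReusedEntrySketch {
    static final class MutableEntry {
        int key;
        long value;
    }

    /** Returns an iterator that hands back the same MutableEntry on every next() call. */
    static Iterator<MutableEntry> reusingIterator(final long[] values) {
        final MutableEntry shared = new MutableEntry();
        return new Iterator<MutableEntry>() {
            private int position = 0;

            @Override
            public boolean hasNext() {
                return position < values.length;
            }

            @Override
            public MutableEntry next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                shared.key = position;
                shared.value = values[position];
                position++;
                return shared; // same object each time, overwritten in place
            }
        };
    }

    /** Safe pattern: copy the fields out of each entry before advancing. */
    static List<long[]> drain(Iterator<MutableEntry> entries) {
        List<long[]> copies = new ArrayList<>();
        while (entries.hasNext()) {
            MutableEntry entry = entries.next();
            copies.add(new long[] {entry.key, entry.value});
        }
        return copies;
    }
}
```

        A caller that stores the entry references instead of copying them would end up with a list in which every element shows the last key and value, since all references point at the same object.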