Class StreamingMergeSortedGrouper<KeyType>

  • All Implemented Interfaces:
    Closeable, AutoCloseable, Grouper<KeyType>

    public class StreamingMergeSortedGrouper<KeyType>
    extends Object
    implements Grouper<KeyType>
    A streaming grouper that aggregates sorted inputs. It can aggregate while its iterator is being consumed, and the aggregating thread and the iterating thread can be different. The grouper is backed by an off-heap circular array. The reading thread can read an array slot only after aggregation for the grouping key corresponding to that slot has finished. Because the reading and writing threads never access the same array slot at the same time, they read and write data without contention. This class spins (busy-waits) until at least one slot becomes available when the array is empty or full. If the array is empty, the reading thread waits until aggregation for an array slot has finished. If the array is full, the writing thread waits until the reading thread has read at least one aggregate from the array.
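The hand-off described above can be illustrated with a minimal single-writer, single-reader sketch. It is not this class's implementation: `SortedRingSketch` and all of its members are hypothetical, plain on-heap `long` arrays stand in for the off-heap array, the only aggregation is a sum, and monotonically increasing counters replace the wrapped read/write indexes, so the extra-slot requirement of the real grouper does not appear here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the actual grouper implementation.
final class SortedRingSketch {
    private final long[] keys;  // on-heap stand-in for the off-heap slots
    private final long[] sums;
    private final int slots;

    private long writeSeq = -1;                 // slot being aggregated; writer thread only
    private volatile long published = 0;        // number of slots whose aggregation is finished
    private volatile long consumed = 0;         // number of slots the reader has released
    private volatile boolean finished = false;

    SortedRingSketch(int slots) {
        this.slots = slots;
        this.keys = new long[slots];
        this.sums = new long[slots];
    }

    private int slot(long seq) {
        return (int) (seq % slots);
    }

    // Writer thread. Keys must arrive in sorted order.
    void aggregate(long key, long value) {
        if (writeSeq >= 0 && keys[slot(writeSeq)] == key) {
            sums[slot(writeSeq)] += value;      // same key: keep aggregating in place
            return;
        }
        published = writeSeq + 1;               // key changed: previous slot is complete
        long next = writeSeq + 1;
        while (next - consumed >= slots) {      // array full: spin until the reader frees a slot
            Thread.onSpinWait();                // Java 9+
        }
        keys[slot(next)] = key;
        sums[slot(next)] = value;
        writeSeq = next;
    }

    // Writer thread, after the last aggregate() call.
    void finish() {
        published = writeSeq + 1;
        finished = true;
    }

    // Reader thread. Returns "key=sum", or null once all results are drained.
    String next() {
        long seq = consumed;
        while (seq >= published) {              // array empty: spin until a slot is complete
            if (finished && seq >= published) {
                return null;
            }
            Thread.onSpinWait();
        }
        String entry = keys[slot(seq)] + "=" + sums[slot(seq)];
        consumed = seq + 1;                     // release the slot to the writer
        return entry;
    }

    // Aggregates sorted {key, value} rows on a separate thread while the caller drains results.
    static List<String> run(long[][] sortedRows, int slots) {
        SortedRingSketch ring = new SortedRingSketch(slots);
        Thread writer = new Thread(() -> {
            for (long[] row : sortedRows) {
                ring.aggregate(row[0], row[1]);
            }
            ring.finish();
        });
        writer.start();
        List<String> out = new ArrayList<>();
        for (String e = ring.next(); e != null; e = ring.next()) {
            out.add(e);
        }
        return out;
    }

    public static void main(String[] args) {
        long[][] rows = {{1, 1}, {1, 2}, {2, 5}, {4, 3}, {4, 4}};
        System.out.println(run(rows, 3)); // [1=3, 2=5, 4=7]
    }
}
```

The two `volatile` counters are the only shared state the threads poll. A slot's contents are written before `published` advances and read before `consumed` advances, which is what lets each side touch its slot without a lock.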
    • Method Detail

      • requiredBufferCapacity

        public static <KeyType> int requiredBufferCapacity​(Grouper.KeySerde<KeyType> keySerde,
                                                           AggregatorFactory[] aggregatorFactories)
        Returns the minimum buffer capacity required for this grouper. This grouper keeps track of read and write indexes, which cannot point to the same array slot at the same time. Because the indexes move circularly, one extra slot is needed in addition to the read and write slots. The required minimum buffer capacity is therefore 3 * record size.
        Returns:
        required minimum buffer capacity
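As a worked example of the 3 * record size rule, the sketch below assumes that one record holds the serialized key followed by each aggregator's intermediate state. That layout is an assumption and is not stated in this documentation. `BufferCapacitySketch`, `requiredCapacity`, and the byte sizes are hypothetical.

```java
// Hypothetical illustration of the "3 * record size" rule; the record layout is assumed.
final class BufferCapacitySketch {
    static int requiredCapacity(int keySizeBytes, int[] aggregatorSizesBytes) {
        int recordSize = keySizeBytes;
        for (int size : aggregatorSizesBytes) {
            recordSize = Math.addExact(recordSize, size);  // throws on int overflow
        }
        // One slot being written, one being read, and one extra slot for the circular indexes.
        return Math.multiplyExact(3, recordSize);
    }

    public static void main(String[] args) {
        // 12-byte key plus two 8-byte aggregator states: record size 28, capacity 84.
        System.out.println(requiredCapacity(12, new int[] {8, 8}));
    }
}
```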
      • isInitialized

        public boolean isInitialized()
        Description copied from interface: Grouper
        Checks whether this grouper is initialized.
        Specified by:
        isInitialized in interface Grouper<KeyType>
        Returns:
        true if the grouper is already initialized, otherwise false.
      • aggregate

        public AggregateResult aggregate​(KeyType key,
                                         int notUsed)
        Description copied from interface: Grouper
        Aggregate the current row with the provided key. Some implementations are thread-safe and some are not.
        Specified by:
        aggregate in interface Grouper<KeyType>
        Parameters:
        key - key object
        notUsed - result of Grouper.hashFunction() on the key; as the parameter name indicates, this grouper does not use it
        Returns:
        result that is ok if the row was aggregated, not ok if a resource limit was hit
      • aggregate

        public AggregateResult aggregate​(KeyType key)
        Description copied from interface: Grouper
        Aggregate the current row with the provided key. Some implementations are thread-safe and some are not.
        Specified by:
        aggregate in interface Grouper<KeyType>
        Parameters:
        key - key
        Returns:
        result that is ok if the row was aggregated, not ok if a resource limit was hit
      • reset

        public void reset()
        Description copied from interface: Grouper
        Reset the grouper to its initial state.
        Specified by:
        reset in interface Grouper<KeyType>
      • finish

        public void finish()
        Signals that no more inputs will be added. Must be called after aggregate(Object) has been called for the last input.
      • iterator

        public CloseableIterator<Grouper.Entry<KeyType>> iterator()
        Returns a sorted iterator. This method can be called safely while writing, and the iterating thread and the writing thread can be different. The returned iterator always yields sorted results. This method should be called only once per grouper.
        Returns:
        a sorted iterator
      • iterator

        public CloseableIterator<Grouper.Entry<KeyType>> iterator​(boolean sorted)
        Returns a sorted iterator. This method can be called safely while writing, and the iterating thread and the writing thread can be different. The returned iterator always yields sorted results. This method should be called only once per grouper.
        Specified by:
        iterator in interface Grouper<KeyType>
        Parameters:
        sorted - not used
        Returns:
        a sorted iterator