Class Combine.GroupedValues<K,​InputT,​OutputT>

  • Type Parameters:
    K - type of input and output keys
    InputT - type of input values
    OutputT - type of output values
    All Implemented Interfaces:
    java.io.Serializable, HasDisplayData
    Enclosing class:
    Combine

    public static class Combine.GroupedValues<K,​InputT,​OutputT>
    extends PTransform<PCollection<? extends KV<K,​? extends java.lang.Iterable<InputT>>>,​PCollection<KV<K,​OutputT>>>
    GroupedValues<K, InputT, OutputT> takes a PCollection<KV<K, Iterable<InputT>>>, such as the result of GroupByKey, applies a specified CombineFn<InputT, AccumT, OutputT> to each of the input KV<K, Iterable<InputT>> elements to produce a combined output KV<K, OutputT> element, and returns a PCollection<KV<K, OutputT>> containing all the combined output elements. It is common for InputT == OutputT, but not required. Common combining functions include sums, mins, maxes, and averages of numbers, conjunctions and disjunctions of booleans, statistical aggregations, etc.

    Example of use:

    
     PCollection<KV<String, Integer>> pc = ...;
     PCollection<KV<String, Iterable<Integer>>> groupedByKey = pc.apply(
         GroupByKey.create());
     PCollection<KV<String, Integer>> sumByKey =
         groupedByKey.apply(Combine.groupedValues(
             Sum.ofIntegers()));
     

    See also Combine.perKey(org.apache.beam.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>)/Combine.PerKey, which captures the common pattern of "combining by key" in a single easy-to-use PTransform.

    Combining for different keys can happen in parallel. Moreover, combining of the Iterable<InputT> values associated a single key can happen in parallel, with different subsets of the values being combined separately, and their intermediate results combined further, in an arbitrary tree reduction pattern, until a single result value is produced for each key.

    By default, the Coder of the keys of the output PCollection<KV<K, OutputT>> is that of the keys of the input PCollection<KV<K, InputT>>, and the Coder of the values of the output PCollection<KV<K, OutputT>> is inferred from the concrete type of the CombineFn<InputT, AccumT, OutputT>'s output type OutputT.

    Each output element has the same timestamp and is in the same window as its corresponding input element, and the output PCollection has the same WindowFn associated with it as the input.

    See also Combine.globally(org.apache.beam.sdk.transforms.SerializableFunction<java.lang.Iterable<V>, V>)/Combine.Globally, which combines all the values in a PCollection into a single value in a PCollection.

    See Also:
    Serialized Form