Type Parameters:
K - type of input and output keys
InputT - type of input values
OutputT - type of output values

public static class Combine.GroupedValues<K,InputT,OutputT> extends PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>

GroupedValues<K, InputT, OutputT> takes a
PCollection<KV<K, Iterable<InputT>>>, such as the result of
GroupByKey, applies a specified
KeyedCombineFn<K, InputT, AccumT, OutputT>
to each of the input KV<K, Iterable<InputT>> elements to
produce a combined output KV<K, OutputT> element, and returns a
PCollection<KV<K, OutputT>> containing all the combined output
elements. It is common for InputT == OutputT, but not required.
Common combining functions include sums, mins, maxes, and averages
of numbers, conjunctions and disjunctions of booleans, statistical
aggregations, etc.
Example of use:
    PCollection<KV<String, Integer>> pc = ...;
    // Group the values of each key into a single Iterable.
    PCollection<KV<String, Iterable<Integer>>> groupedByKey = pc.apply(
        GroupByKey.<String, Integer>create());
    // Sum the grouped values of each key.
    PCollection<KV<String, Integer>> sumByKey = groupedByKey.apply(
        Combine.<String, Integer, Integer>groupedValues(
            new Sum.SumIntegerFn()));
See also Combine.perKey(SerializableFunction<Iterable<V>, V>) and Combine.PerKey, which
captures the common pattern of "combining by key" in a
single easy-to-use PTransform.
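
To make the relationship between grouping and combining concrete, here is a minimal plain-Java sketch that models the two steps with ordinary collections. It does not use the SDK: GroupedValuesSketch, groupByKey, combineGroupedValues, and sum are hypothetical names chosen for illustration, and a Map stands in for a PCollection of KV elements.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model (no SDK types) of "group by key, then combine the grouped values".
final class GroupedValuesSketch {

    // Models the grouping step: collects the values of each key into one list.
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> pairs) {
        Map<K, List<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Models the combining step: applies the combining function to each key's values
    // and emits one (key, combined value) entry per input entry.
    static <K, InputT, OutputT> Map<K, OutputT> combineGroupedValues(
            Map<K, ? extends Iterable<InputT>> grouped,
            Function<Iterable<InputT>, OutputT> combiner) {
        Map<K, OutputT> combined = new LinkedHashMap<>();
        for (Map.Entry<K, ? extends Iterable<InputT>> entry : grouped.entrySet()) {
            combined.put(entry.getKey(), combiner.apply(entry.getValue()));
        }
        return combined;
    }

    // Hypothetical stand-in for an integer-sum combining function.
    static Integer sum(Iterable<Integer> values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pc = List.of(
                Map.entry("a", 1), Map.entry("b", 10), Map.entry("a", 2), Map.entry("b", 20));
        Map<String, List<Integer>> groupedByKey = groupByKey(pc);
        Map<String, Integer> sumByKey = combineGroupedValues(groupedByKey, GroupedValuesSketch::sum);
        System.out.println(sumByKey); // prints {a=3, b=30}
    }
}
```

Calling the two helpers back to back is the "combining by key" pattern in this model; the sketch says nothing about how a runner distributes or optimizes the work.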
Combining for different keys can happen in parallel. Moreover,
combining of the Iterable<InputT> values associated with a single
key can happen in parallel, with different subsets of the values
being combined separately, and their intermediate results combined
further, in an arbitrary tree reduction pattern, until a single
result value is produced for each key.
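
The tree reduction described above can be illustrated with a small plain-Java sketch. It is a model, not SDK code: TreeCombineSketch and MeanAccum are hypothetical, and the four helper methods are local static functions that only mirror the accumulator lifecycle (create, add input, merge, extract output). The point is that combining subsets separately and then merging their accumulators gives the same result as a single pass, which is what permits an arbitrary tree shape.

```java
import java.util.List;

// Plain-Java model of combining one key's values in a tree of partial results.
final class TreeCombineSketch {

    // Hypothetical accumulator for an average: running sum and element count.
    static final class MeanAccum {
        long sum;
        long count;
    }

    static MeanAccum createAccumulator() {
        return new MeanAccum();
    }

    static MeanAccum addInput(MeanAccum acc, int value) {
        acc.sum += value;
        acc.count++;
        return acc;
    }

    static MeanAccum mergeAccumulators(List<MeanAccum> accs) {
        MeanAccum merged = createAccumulator();
        for (MeanAccum a : accs) {
            merged.sum += a.sum;
            merged.count += a.count;
        }
        return merged;
    }

    // Assumption of this sketch: the mean of no values is reported as 0.0.
    static double extractOutput(MeanAccum acc) {
        return acc.count == 0 ? 0.0 : (double) acc.sum / acc.count;
    }

    // Baseline: one accumulator sees every value in order.
    static double combineSequentially(List<Integer> values) {
        MeanAccum acc = createAccumulator();
        for (int v : values) {
            addInput(acc, v);
        }
        return extractOutput(acc);
    }

    // Tree reduction: split until a chunk has at most leafSize values, combine each
    // leaf independently, then merge the partial accumulators on the way back up.
    static MeanAccum combineTree(List<Integer> values, int leafSize) {
        if (leafSize < 1) {
            throw new IllegalArgumentException("leafSize must be at least 1");
        }
        if (values.size() <= leafSize) {
            MeanAccum acc = createAccumulator();
            for (int v : values) {
                addInput(acc, v);
            }
            return acc;
        }
        int mid = values.size() / 2;
        MeanAccum left = combineTree(values.subList(0, mid), leafSize);
        MeanAccum right = combineTree(values.subList(mid, values.size()), leafSize);
        return mergeAccumulators(List.of(left, right));
    }

    public static void main(String[] args) {
        List<Integer> values = List.of(1, 2, 3, 4, 5, 6, 7);
        System.out.println(combineSequentially(values));                // prints 4.0
        System.out.println(extractOutput(combineTree(values, 2)));      // prints 4.0
    }
}
```

The two results agree because merging is associative and commutative for this accumulator; a combining function without that property would not be safe to evaluate in an arbitrary tree.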
By default, the Coder of the keys of the output
PCollection<KV<K, OutputT>> is that of the keys of the input
PCollection<KV<K, Iterable<InputT>>>, and the Coder of the values
of the output PCollection<KV<K, OutputT>> is inferred from the
concrete type of the KeyedCombineFn<K, InputT, AccumT, OutputT>'s output
type OutputT.
Each output element has the same timestamp and is in the same window
as its corresponding input element, and the output PCollection has
the same WindowFn associated with it as the input.
See also Combine.globally(SerializableFunction<Iterable<V>, V>) and Combine.Globally, which
combines all the values in a PCollection into a
single value in a PCollection.
Fields inherited from class PTransform: name

| Modifier and Type | Method and Description |
|---|---|
| PCollection<KV<K,OutputT>> | apply(PCollection<? extends KV<K,? extends Iterable<InputT>>> input) Applies this PTransform on the given InputT, and returns its Output. |
| AppliedCombineFn<? super K,? super InputT,?,OutputT> | getAppliedFn(CoderRegistry registry, Coder<? extends KV<K,? extends Iterable<InputT>>> inputCoder) |
| Coder<KV<K,OutputT>> | getDefaultOutputCoder(PCollection<? extends KV<K,? extends Iterable<InputT>>> input) Returns the default Coder to use for the output of this single-output PTransform when applied to the given input. |
| Combine.KeyedCombineFn<? super K,? super InputT,?,OutputT> | getFn() Returns the KeyedCombineFn used by this Combine operation. |

Methods inherited from class PTransform: getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, toString, validate

public Combine.KeyedCombineFn<? super K,? super InputT,?,OutputT> getFn()
Returns the KeyedCombineFn used by this Combine operation.

public PCollection<KV<K,OutputT>> apply(PCollection<? extends KV<K,? extends Iterable<InputT>>> input)
Description copied from class: PTransform
Applies this PTransform on the given InputT, and returns its Output.
Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
The default implementation throws an exception. A derived class must
either implement apply, or else each runner must supply a custom
implementation via
PipelineRunner.apply(com.google.cloud.dataflow.sdk.transforms.PTransform<InputT, OutputT>, InputT).
Overrides: apply in class PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>

public AppliedCombineFn<? super K,? super InputT,?,OutputT> getAppliedFn(CoderRegistry registry, Coder<? extends KV<K,? extends Iterable<InputT>>> inputCoder)

public Coder<KV<K,OutputT>> getDefaultOutputCoder(PCollection<? extends KV<K,? extends Iterable<InputT>>> input) throws CannotProvideCoderException
Description copied from class: PTransform
Returns the default Coder to use for the output of this single-output PTransform when applied to the given input.
Overrides: getDefaultOutputCoder in class PTransform<PCollection<? extends KV<K,? extends Iterable<InputT>>>,PCollection<KV<K,OutputT>>>
Throws: CannotProvideCoderException - if none can be inferred. By default, always throws.