Class CoGroupByKey<K>

  • Type Parameters:
    K - the type of the keys in the input and output PCollections
    All Implemented Interfaces:
    java.io.Serializable, HasDisplayData

    public class CoGroupByKey<K>
    extends PTransform<KeyedPCollectionTuple<K>, PCollection<KV<K, CoGbkResult>>>
    A PTransform that performs a CoGroupByKey on a tuple of tables. A CoGroupByKey groups the values from all tables by key into CoGbkResults. The values that any particular table contributed for a key can then be retrieved from that key's CoGbkResult using the TupleTag supplied with that table.

    Example of performing a CoGroupByKey followed by a ParDo that consumes the results:

    
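     import org.apache.beam.sdk.transforms.DoFn;
     import org.apache.beam.sdk.transforms.ParDo;
     import org.apache.beam.sdk.transforms.join.CoGbkResult;
     import org.apache.beam.sdk.transforms.join.CoGroupByKey;
     import org.apache.beam.sdk.transforms.join.KeyedPCollectionTuple;
     import org.apache.beam.sdk.values.KV;
     import org.apache.beam.sdk.values.PCollection;
     import org.apache.beam.sdk.values.TupleTag;

     PCollection<KV<K, V1>> pt1 = ...;
     PCollection<KV<K, V2>> pt2 = ...;

     final TupleTag<V1> t1 = new TupleTag<>();
     final TupleTag<V2> t2 = new TupleTag<>();
     PCollection<KV<K, CoGbkResult>> coGbkResultCollection =
       KeyedPCollectionTuple.of(t1, pt1)
                            .and(t2, pt2)
                            .apply(CoGroupByKey.<K>create());

     PCollection<T> finalResultCollection =
       coGbkResultCollection.apply(ParDo.of(
         new DoFn<KV<K, CoGbkResult>, T>() {
           @ProcessElement
           public void processElement(ProcessContext c) {
             KV<K, CoGbkResult> e = c.element();
             // Every value from pt1 with this key (may be empty).
             Iterable<V1> pt1Vals = e.getValue().getAll(t1);
             // Exactly one value from pt2 is expected for this key;
             // getOnly throws an IllegalArgumentException otherwise.
             V2 pt2Val = e.getValue().getOnly(t2);
             T result = ...; // combine pt1Vals and pt2Val into a T
             c.output(result);
           }
         }));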
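The grouping described above can be modelled in plain Java without a Beam runner. In the sketch below, CoGroupModel and its nested Result class are hypothetical names, String tags stand in for TupleTag, and Map.Entry stands in for KV. Result.getAll and Result.getOnly imitate what, as we understand it, CoGbkResult.getAll and getOnly are documented to do: a table with no values for a key yields an empty result, an unknown tag is rejected, and getOnly throws unless there is exactly one value. This is an in-memory illustration only, not how Beam executes a CoGroupByKey.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical in-memory model of CoGroupByKey semantics; not part of Beam.
 * String tags stand in for TupleTag, and Map.Entry stands in for KV.
 */
final class CoGroupModel<K> {

  /** Stand-in for CoGbkResult: all values for one key, bucketed by tag. */
  static final class Result {
    private final Map<String, List<Object>> byTag = new LinkedHashMap<>();

    Result(Set<String> tags) {
      // Every registered tag gets a bucket, so a table with no values for
      // this key yields an empty list rather than an error.
      for (String tag : tags) {
        byTag.put(tag, new ArrayList<>());
      }
    }

    /** Like getAll: every value for the tag (possibly empty); unknown tags throw. */
    @SuppressWarnings("unchecked")
    <V> List<V> getAll(String tag) {
      List<?> values = byTag.get(tag);
      if (values == null) {
        throw new IllegalArgumentException("tag " + tag + " was not part of the co-group");
      }
      return (List<V>) values;
    }

    /** Like getOnly: returns the single value for the tag, otherwise throws. */
    <V> V getOnly(String tag) {
      List<V> values = getAll(tag);
      if (values.size() != 1) {
        throw new IllegalArgumentException(
            "tag " + tag + " has " + values.size() + " values, expected exactly 1");
      }
      return values.get(0);
    }
  }

  // Tables in registration order, loosely mirroring KeyedPCollectionTuple.of(...).and(...).
  private final Map<String, List<Map.Entry<K, ?>>> tables = new LinkedHashMap<>();

  /** Registers one keyed table under a tag. */
  CoGroupModel<K> and(String tag, List<? extends Map.Entry<K, ?>> table) {
    tables.put(tag, new ArrayList<>(table));
    return this;
  }

  /** Groups the values of every table by key: one Result per distinct key. */
  Map<K, Result> coGroup() {
    Map<K, Result> out = new LinkedHashMap<>();
    for (Map.Entry<String, List<Map.Entry<K, ?>>> table : tables.entrySet()) {
      for (Map.Entry<K, ?> kv : table.getValue()) {
        out.computeIfAbsent(kv.getKey(), k -> new Result(tables.keySet()))
            .byTag.get(table.getKey())
            .add(kv.getValue());
      }
    }
    return out;
  }
}
```

For instance, co-grouping an "orders" table holding ("alice", 10), ("alice", 25), ("bob", 7) with an "emails" table holding ("alice", "a@example.com") and ("carol", "c@example.com") yields results for alice, bob, and carol. Alice's getAll("orders") is [10, 25] and her getOnly("emails") is "a@example.com", while bob's getAll("emails") and carol's getAll("orders") are empty lists rather than errors.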
     
    • Method Detail

      • create

        public static <K> CoGroupByKey<K> create()
        Returns a CoGroupByKey<K> PTransform.
        Type Parameters:
        K - the type of the keys in the input and output PCollections
      • expand

        public PCollection<KV<K, CoGbkResult>> expand(KeyedPCollectionTuple<K> input)
        Description copied from class: PTransform
        Override this method to specify how this PTransform should be expanded on the given InputT.

        NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.

        Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
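The composite rule above can be illustrated without Beam. Every type in the toy below (ToyTransform, Tagged, TagAndFlatten, GroupByKeyStep, ToyCoGroup) is hypothetical and is not Beam's API. ToyCoGroup is a composite: its expand only chains two primitive steps and returns the output of the last one. Splitting a co-group into a tag-and-flatten step followed by a group-by-key step is an assumption chosen for illustration, not a description of how CoGroupByKey.expand is implemented. It needs Java 9 or later for Map.entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a transform; not Beam's PTransform. */
interface ToyTransform<InputT, OutputT> {
  OutputT expand(InputT input);
}

/** A value labelled with the index of the table it came from. */
final class Tagged {
  final int table;
  final Object value;

  Tagged(int table, Object value) {
    this.table = table;
    this.value = value;
  }
}

/** Primitive step: labels every value with its table index and flattens all tables. */
final class TagAndFlatten<K>
    implements ToyTransform<List<List<Map.Entry<K, ?>>>, List<Map.Entry<K, Tagged>>> {
  @Override
  public List<Map.Entry<K, Tagged>> expand(List<List<Map.Entry<K, ?>>> tables) {
    List<Map.Entry<K, Tagged>> out = new ArrayList<>();
    for (int i = 0; i < tables.size(); i++) {
      for (Map.Entry<K, ?> kv : tables.get(i)) {
        out.add(Map.entry(kv.getKey(), new Tagged(i, kv.getValue())));
      }
    }
    return out;
  }
}

/** Primitive step: groups values by key, keeping first-seen key order. */
final class GroupByKeyStep<K, V> implements ToyTransform<List<Map.Entry<K, V>>, Map<K, List<V>>> {
  @Override
  public Map<K, List<V>> expand(List<Map.Entry<K, V>> input) {
    Map<K, List<V>> out = new LinkedHashMap<>();
    for (Map.Entry<K, V> kv : input) {
      out.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    return out;
  }
}

/** Composite: defined only in terms of the two steps above; returns the last step's output. */
final class ToyCoGroup<K>
    implements ToyTransform<List<List<Map.Entry<K, ?>>>, Map<K, List<Tagged>>> {
  private final TagAndFlatten<K> tagAndFlatten = new TagAndFlatten<>();
  private final GroupByKeyStep<K, Tagged> groupByKey = new GroupByKeyStep<>();

  @Override
  public Map<K, List<Tagged>> expand(List<List<Map.Entry<K, ?>>> tables) {
    List<Map.Entry<K, Tagged>> flattened = tagAndFlatten.expand(tables);
    return groupByKey.expand(flattened);
  }
}
```

Calling new ToyCoGroup<String>().expand(...) on an orders table [("alice", 10), ("bob", 7)] and an emails table [("alice", "a@example.com")] maps alice to two tagged values (table 0 holding 10, table 1 holding "a@example.com") and bob to one (table 0 holding 7). Callers see only the composite's input and output types, not the steps inside it.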

        Specified by:
        expand in class PTransform<KeyedPCollectionTuple<K>, PCollection<KV<K, CoGbkResult>>>