Class Partition<T>

  • Type Parameters:
    T - the type of the elements of the input and output PCollections
    All Implemented Interfaces:
    java.io.Serializable, HasDisplayData

    public class Partition<T>
    extends PTransform<PCollection<T>,PCollectionList<T>>
    Partition takes a PCollection<T> and a PartitionFn, uses the PartitionFn to split the elements of the input PCollection into N partitions, and returns a PCollectionList<T> that bundles N PCollection<T>s containing the split elements.

    Example of use:

    
     PCollection<Student> students = ...;
     // Split students up into 10 partitions, by percentile:
     PCollectionList<Student> studentsByPercentile =
         students.apply(Partition.of(10, new PartitionFn<Student>() {
             public int partitionFor(Student student, int numPartitions) {
                 return student.getPercentile()  // 0..99
                      * numPartitions / 100;
             }}));
     for (int i = 0; i < 10; i++) {
       PCollection<Student> partition = studentsByPercentile.get(i);
       ...
     }
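
    The index arithmetic in the example above can be checked on its own. The following is a minimal standalone sketch in plain Java with no Beam dependency; the class name PercentilePartitionSketch is hypothetical, and the method only mirrors the body of partitionFor shown above. It assumes, as the comment in the example states, that percentiles fall in 0..99; a value of 100 would produce the index numPartitions, which is outside the valid range 0..numPartitions-1.

```java
// Standalone sketch: plain Java, not the Beam API.
// PercentilePartitionSketch is a hypothetical name used only for illustration.
class PercentilePartitionSketch {

    // Same integer arithmetic as the partitionFor body in the example above.
    // Integer division truncates, so each partition covers 100 / numPartitions percentiles
    // when numPartitions divides 100 evenly.
    static int partitionFor(int percentile, int numPartitions) {
        return percentile * numPartitions / 100;
    }

    public static void main(String[] args) {
        int numPartitions = 10;
        // Print the partition index chosen for a few sample percentiles.
        for (int percentile : new int[] {0, 9, 10, 55, 99}) {
            System.out.println(percentile + " -> " + partitionFor(percentile, numPartitions));
        }
    }
}
```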
     
    
     PCollection<Student> students = ...;
     // Split percentages into 2 partitions, using a threshold read from a side input:
     PCollectionView<Integer> gradesView =
             pipeline.apply("grades", Create.of(50)).apply(View.asSingleton());
     PCollectionList<Integer> studentsByGrades =
             pipeline.apply(studentsPercentage)
                 .apply(Partition.of(2, ((elem, numPartitions, ctx) -> {
                   Integer grades = ctx.sideInput(gradesView);
                   return elem < grades ? 0 : 1;
                 }), Requirements.requiresSideInputs(gradesView)));

     PCollection<Integer> below = studentsByGrades.get(0); // all values below 50
     PCollection<Integer> above = studentsByGrades.get(1); // all values of 50 or above
     ...

    By default, the Coder of each of the PCollections in the output PCollectionList is the same as the Coder of the input PCollection.

    Each output element has the same timestamp and is in the same windows as its corresponding input element, and each output PCollection has the same WindowFn associated with it as the input.

    See Also:
    Serialized Form
    • Method Detail

      • of

        public static <T> Partition<T> of(int numPartitions,
                                          Partition.PartitionWithSideInputsFn<? super T> partitionFn,
                                          Requirements requirements)
        Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function.
        Parameters:
        numPartitions - the number of partitions to divide the input PCollection into
        partitionFn - the function to invoke on each element to choose its output partition
        requirements - the Requirements needed to run it.
        Throws:
        java.lang.IllegalArgumentException - if numPartitions <= 0
      • of

        public static <T> Partition<T> of(int numPartitions,
                                          Partition.PartitionFn<? super T> partitionFn)
        Returns a new Partition PTransform that divides its input PCollection into the given number of partitions, using the given partitioning function.
        Parameters:
        numPartitions - the number of partitions to divide the input PCollection into
        partitionFn - the function to invoke on each element to choose its output partition
        Throws:
        java.lang.IllegalArgumentException - if numPartitions <= 0
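
        The contract described for of (split into numPartitions outputs, choose the output per element with a function, reject numPartitions <= 0) can be illustrated with an in-memory analogue. This is only a sketch in plain Java over lists, not Beam's implementation; the class InMemoryPartitionSketch, the interface IndexFn, and the method split are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory analogue of the partitioning contract; NOT the Beam implementation.
// All names here (InMemoryPartitionSketch, IndexFn, split) are hypothetical.
class InMemoryPartitionSketch {

    // Mirrors the shape of a partitioning function: element plus partition count in, index out.
    interface IndexFn<T> {
        int partitionFor(T elem, int numPartitions);
    }

    static <T> List<List<T>> split(List<T> input, int numPartitions, IndexFn<? super T> fn) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be > 0");
        }
        // Create one output list per partition, so the result always has numPartitions entries.
        List<List<T>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        // Route each element to the list chosen by the function.
        // In this sketch, an out-of-range index makes get() throw IndexOutOfBoundsException.
        for (T elem : input) {
            partitions.get(fn.partitionFor(elem, numPartitions)).add(elem);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(1, 2, 3, 4, 5, 6, 7);
        // Partition by remainder modulo the number of partitions.
        List<List<Integer>> parts = split(numbers, 3, (n, k) -> n % k);
        for (int i = 0; i < parts.size(); i++) {
            System.out.println(i + ": " + parts.get(i));
        }
    }
}
```

        Empty partitions are still present in the result of this sketch, which matches the description of the output as a list bundling N collections.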
      • expand

        public PCollectionList<T> expand(PCollection<T> in)
        Description copied from class: PTransform
        Override this method to specify how this PTransform should be expanded on the given InputT.

        NOTE: This method should not be called directly. Instead, the PTransform should be applied to the InputT using the apply method.

        Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).

        Specified by:
        expand in class PTransform<PCollection<T>,PCollectionList<T>>
      • populateDisplayData

        public void populateDisplayData(DisplayData.Builder builder)
        Description copied from class: PTransform
        Register display data for the given transform or component.

        populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

        By default, does not register any display data. Implementors may override this method to provide their own display data.

        Specified by:
        populateDisplayData in interface HasDisplayData
        Overrides:
        populateDisplayData in class PTransform<PCollection<T>,PCollectionList<T>>
        Parameters:
        builder - The builder to populate with display data.
        See Also:
        HasDisplayData