Annotation Type DoFn.ProcessElement


  • @Documented
    @Retention(RUNTIME)
    @Target(METHOD)
    public static @interface DoFn.ProcessElement
    Annotation for the method to use for processing elements. A subclass of DoFn must have a method with this annotation.

    If any of the arguments is a RestrictionTracker, see the specification for splittable DoFns below; otherwise, this method must satisfy the following constraints:

    • If one of its arguments is tagged with the DoFn.Element annotation, then it will be passed the current element being processed. The argument type must match the input type of this DoFn exactly, or both types must have equivalent schemas registered.
    • If one of its arguments is tagged with the DoFn.Timestamp annotation, then it will be passed the timestamp of the current element being processed; the argument must be of type Instant.
    • If one of its arguments is a subtype of BoundedWindow, then it will be passed the window of the current element. When applied by ParDo, the subtype of BoundedWindow must match the type of windows on the input PCollection. If the window is not accessed, a runner may perform additional optimizations.
    • If one of its arguments is of type PaneInfo, then it will be passed information about the current triggering pane.
    • If one of the parameters is of type PipelineOptions, then it will be passed the options for the current pipeline.
    • If one of the parameters is of type DoFn.OutputReceiver, then it will be passed an output receiver for outputting elements to the default output.
    • If one of the parameters is of type DoFn.MultiOutputReceiver, then it will be passed an output receiver for outputting to multiple tagged outputs.
    • If one of the parameters is of type DoFn.BundleFinalizer, then it will be passed a mechanism to register a callback that will be invoked after the runner successfully commits the output of this bundle. See Apache Beam Portability API: How to Finalize Bundles for further details.
    • It must return void.
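
    The constraints above amount to a runner supplying each parameter according to its annotation or type. The sketch below is a self-contained model of that idea, not Beam's implementation: ProcessElement, Element, Timestamp and OutputReceiver are local stand-ins that only mirror the names of the Beam types, java.time.Instant stands in for the Instant type Beam uses, and SplitWordsFn and invoke are hypothetical names introduced for illustration.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

final class ProcessElementSketch {
  // Local stand-ins that mirror the names of Beam's annotations (assumption: not the real classes).
  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.METHOD)
  @interface ProcessElement {}

  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
  @interface Element {}

  @Retention(RetentionPolicy.RUNTIME) @Target(ElementType.PARAMETER)
  @interface Timestamp {}

  /** Stand-in for DoFn.OutputReceiver: receives elements for the default output. */
  interface OutputReceiver<T> { void output(T value); }

  /** Hypothetical user function, written in the style of a DoFn<String, String>. */
  static final class SplitWordsFn {
    @ProcessElement
    public void processElement(
        @Element String line, @Timestamp Instant timestamp, OutputReceiver<String> out) {
      for (String word : line.trim().split("\\s+")) {
        if (!word.isEmpty()) {
          out.output(word);
        }
      }
    }
  }

  /** Finds the annotated method, fills each parameter by annotation or type, and calls it. */
  static <T> List<T> invoke(Object fn, Object element, Instant timestamp) {
    List<T> outputs = new ArrayList<>();
    OutputReceiver<T> receiver = outputs::add;
    for (Method m : fn.getClass().getDeclaredMethods()) {
      if (!m.isAnnotationPresent(ProcessElement.class)) {
        continue;
      }
      if (m.getReturnType() != void.class) {
        throw new IllegalStateException("a non-splittable process method must return void");
      }
      Parameter[] params = m.getParameters();
      Object[] args = new Object[params.length];
      for (int i = 0; i < params.length; i++) {
        if (params[i].isAnnotationPresent(Element.class)) {
          args[i] = element;                       // the current element
        } else if (params[i].isAnnotationPresent(Timestamp.class)) {
          args[i] = timestamp;                     // the current element's timestamp
        } else if (params[i].getType() == OutputReceiver.class) {
          args[i] = receiver;                      // receiver for the default output
        } else {
          throw new IllegalStateException("unsupported parameter: " + params[i]);
        }
      }
      try {
        m.setAccessible(true);
        m.invoke(fn, args);
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException(e);
      }
      return outputs;
    }
    throw new IllegalStateException("no method annotated with @ProcessElement");
  }

  public static void main(String[] args) {
    List<String> words = invoke(new SplitWordsFn(), "to be  or", Instant.EPOCH);
    System.out.println(words); // prints [to, be, or]
  }
}
```

    In a real pipeline the method body of SplitWordsFn would look much the same, with the class extending DoFn<String, String> and the annotations and OutputReceiver taken from DoFn itself.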

    Splittable DoFns

    A DoFn is splittable if its DoFn.ProcessElement method has a parameter of type RestrictionTracker. This is an advanced feature, and the overwhelming majority of users will never need to write a splittable DoFn.

    Not all runners support Splittable DoFn. See the capability matrix.

    See the proposal for an overview of the involved concepts (splittable DoFn, restriction, restriction tracker).

    If this DoFn is splittable, its DoFn.ProcessElement method must satisfy the following constraints:

    • One of its arguments must be a RestrictionTracker. The argument must be of the exact type RestrictionTracker<RestrictionT, PositionT>.
    • If one of its arguments is tagged with the DoFn.Element annotation, then it will be passed the current element being processed. The argument type must match the input type of this DoFn exactly, or both types must have equivalent schemas registered.
    • If one of its arguments is tagged with the DoFn.Restriction annotation, then it will be passed the current restriction being processed; the argument must be of type RestrictionT.
    • If one of its arguments is tagged with the DoFn.Timestamp annotation, then it will be passed the timestamp of the current element being processed; the argument must be of type Instant.
    • If one of its arguments is of the type WatermarkEstimator, then it will be passed the watermark estimator.
    • If one of its arguments is of the type ManualWatermarkEstimator, then it will be passed a watermark estimator that can be updated manually. This parameter can only be supplied if the method annotated with DoFn.NewWatermarkEstimator returns a subtype of ManualWatermarkEstimator.
    • If one of its arguments is a subtype of BoundedWindow, then it will be passed the window of the current element. When applied by ParDo, the subtype of BoundedWindow must match the type of windows on the input PCollection. If the window is not accessed, a runner may perform additional optimizations.
    • If one of its arguments is of type PaneInfo, then it will be passed information about the current triggering pane.
    • If one of the parameters is of type PipelineOptions, then it will be passed the options for the current pipeline.
    • If one of the parameters is of type DoFn.OutputReceiver, then it will be passed an output receiver for outputting elements to the default output.
    • If one of the parameters is of type DoFn.MultiOutputReceiver, then it will be passed an output receiver for outputting to multiple tagged outputs.
    • If one of the parameters is of type DoFn.BundleFinalizer, then it will be passed a mechanism to register a callback that will be invoked after the runner successfully commits the output of this bundle. See Apache Beam Portability API: How to Finalize Bundles for further details.
    • It may return a DoFn.ProcessContinuation to indicate whether there is more work to be done for the current element; otherwise, it must return void.
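
    The interplay between the restriction tracker and the returned continuation can be illustrated with a small self-contained model. This is a sketch under stated assumptions, not Beam code: OffsetRange, OffsetTracker, splitRemainder, run and the batchSize parameter are hypothetical names, the ProcessContinuation enum stands in for DoFn.ProcessContinuation, and the splitting shown is a simplification of what a runner does. The process method claims each offset before producing output for it, returns RESUME when it stops early with work remaining, and returns STOP once a claim fails.

```java
import java.util.ArrayList;
import java.util.List;

final class SplittableDoFnSketch {
  /** Stand-in for DoFn.ProcessContinuation (assumption: reduced to two states). */
  enum ProcessContinuation { STOP, RESUME }

  /** Hypothetical restriction: the half-open offset range [from, to). */
  static final class OffsetRange {
    final long from;
    final long to;

    OffsetRange(long from, long to) {
      this.from = from;
      this.to = to;
    }
  }

  /** Hypothetical tracker, loosely modeled on RestrictionTracker<RestrictionT, PositionT>. */
  static final class OffsetTracker {
    private OffsetRange range;
    private long nextUnclaimed;

    OffsetTracker(OffsetRange range) {
      this.range = range;
      this.nextUnclaimed = range.from;
    }

    OffsetRange currentRestriction() {
      return range;
    }

    /** Claims an offset; returns false once the offset lies outside the restriction. */
    boolean tryClaim(long offset) {
      if (offset >= range.to) {
        return false;
      }
      nextUnclaimed = offset + 1;
      return true;
    }

    /** Simplified split: keeps the claimed part as the primary and returns the residual. */
    OffsetRange splitRemainder() {
      OffsetRange residual = new OffsetRange(nextUnclaimed, range.to);
      range = new OffsetRange(range.from, nextUnclaimed);
      return residual;
    }
  }

  /** Modeled process method: one output per claimed offset, at most batchSize per call. */
  static ProcessContinuation processElement(
      String element, OffsetTracker tracker, List<String> out, int batchSize) {
    long start = tracker.currentRestriction().from;
    for (long i = start; tracker.tryClaim(i); i++) {
      out.add(element + "#" + i);
      if (i - start + 1 >= batchSize) {
        return ProcessContinuation.RESUME; // more work may remain for this element
      }
    }
    return ProcessContinuation.STOP; // claim failed: the restriction is exhausted
  }

  /** Modeled runner loop: on RESUME, reschedules the residual restriction. */
  static List<String> run(String element, OffsetRange restriction, int batchSize) {
    List<String> out = new ArrayList<>();
    OffsetTracker tracker = new OffsetTracker(restriction);
    while (processElement(element, tracker, out, batchSize) == ProcessContinuation.RESUME) {
      tracker = new OffsetTracker(tracker.splitRemainder());
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(run("file", new OffsetRange(0, 5), 2));
    // prints [file#0, file#1, file#2, file#3, file#4]
  }
}
```

    The point of claiming before emitting is that the tracker, not the loop bound, decides how far processing may go, so a split that shrinks the restriction is respected by the next claim.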