Package org.apache.beam.sdk.transforms
Annotation Type DoFn.ProcessElement
-
@Documented @Retention(RUNTIME) @Target(METHOD) public static @interface DoFn.ProcessElement
Annotation for the method to use for processing elements. A subclass ofDoFn
must have a method with this annotation.If any of the arguments is a
RestrictionTracker
then see the specifications below about splittableDoFn
, otherwise this method must satisfy the following constraints:- If one of its arguments is tagged with the
DoFn.Element
annotation, then it will be passed the current element being processed. The argument type must match the input type of this DoFn exactly, or both types must have equivalent schemas registered. - If one of its arguments is tagged with the
DoFn.Timestamp
annotation, then it will be passed the timestamp of the current element being processed; the argument must be of typeInstant
. - If one of its arguments is a subtype of
BoundedWindow
, then it will be passed the window of the current element. When applied byParDo
the subtype ofBoundedWindow
must match the type of windows on the inputPCollection
. If the window is not accessed a runner may perform additional optimizations. - If one of its arguments is of type
PaneInfo
, then it will be passed information about the current triggering pane. - If one of the parameters is of type
PipelineOptions
, then it will be passed the options for the current pipeline. - If one of the parameters is of type
DoFn.OutputReceiver
, then it will be passed an output receiver for outputting elements to the default output. - If one of the parameters is of type
DoFn.MultiOutputReceiver
, then it will be passed an output receiver for outputting to multiple tagged outputs. - If one of the parameters is of type
DoFn.BundleFinalizer
, then it will be passed a mechanism to register a callback that will be invoked after the runner successfully commits the output of this bundle. See Apache Beam Portability API: How to Finalize Bundles for further details. - It must return
void
.
Splittable DoFn's
A
DoFn
is splittable if itsDoFn.ProcessElement
method has a parameter whose type is ofRestrictionTracker
. This is an advanced feature and an overwhelming majority of users will never need to write a splittableDoFn
.Not all runners support Splittable DoFn. See the capability matrix.
See the proposal for an overview of the involved concepts (splittable DoFn, restriction, restriction tracker).
A splittable
DoFn
must obey the following constraints:- The type of restrictions used by all of these methods must be the same.
- It must define a
DoFn.GetInitialRestriction
method. - It may define a
DoFn.GetSize
method or ensure that theRestrictionTracker
implementsRestrictionTracker.HasProgress
. Poor auto-scaling of workers and/or splitting may result if size or progress is an inaccurate representation of work. SeeDoFn.GetSize
andRestrictionTracker.HasProgress
for further details. - It should define a
DoFn.SplitRestriction
method. This method enables runners to perform bulk splitting initially allowing for a rapid increase in parallelism. If it is not defined, there is no initial split happening by default. Note that initial split is a different concept from the split during element processing time. SeeRestrictionTracker.trySplit(double)
for details about splitting when the current element and restriction are actively being processed. - It may define a
DoFn.TruncateRestriction
method to choose how to truncate a restriction such that it represents a finite amount of work when the pipeline is draining. SeeDoFn.TruncateRestriction
andRestrictionTracker.isBounded()
for additional details. - It may define a
DoFn.NewTracker
method returning a subtype ofRestrictionTracker<R>
whereR
is the restriction type returned byDoFn.GetInitialRestriction
. This method is optional only if the restriction type returned byDoFn.GetInitialRestriction
implementsHasDefaultTracker
. - It may define a
DoFn.GetRestrictionCoder
method. - It may define a
DoFn.GetInitialWatermarkEstimatorState
method. If none is defined then the watermark estimator state is of typeVoid
. - It may define a
DoFn.GetWatermarkEstimatorStateCoder
method. - It may define a
DoFn.NewWatermarkEstimator
method returning a subtype ofWatermarkEstimator<W>
whereW
is the watermark estimator state type returned byDoFn.GetInitialWatermarkEstimatorState
. This method is optional only ifDoFn.GetInitialWatermarkEstimatorState
has not been defined orW
implementsHasDefaultWatermarkEstimator
. - The
DoFn
itself may be annotated withDoFn.BoundedPerElement
orDoFn.UnboundedPerElement
, but not both at the same time. If it's not annotated with either of these, it's assumed to beDoFn.BoundedPerElement
if itsDoFn.ProcessElement
method returnsvoid
andDoFn.UnboundedPerElement
if it returns aDoFn.ProcessContinuation
. - Timers and state must not be used.
If this DoFn is splittable, this method must satisfy the following constraints:
- One of its arguments must be a
RestrictionTracker
. The argument must be of the exact typeRestrictionTracker<RestrictionT, PositionT>
. - If one of its arguments is tagged with the
DoFn.Element
annotation, then it will be passed the current element being processed. The argument type must match the input type of this DoFn exactly, or both types must have equivalent schemas registered. - If one of its arguments is tagged with the
DoFn.Restriction
annotation, then it will be passed the current restriction being processed; the argument must be of typeRestrictionT
. - If one of its arguments is tagged with the
DoFn.Timestamp
annotation, then it will be passed the timestamp of the current element being processed; the argument must be of typeInstant
. - If one of its arguments is of the type
WatermarkEstimator
, then it will be passed the watermark estimator. - If one of its arguments is of the type
ManualWatermarkEstimator
, then it will be passed a watermark estimator that can be updated manually. This parameter can only be supplied if the method annotated withDoFn.GetInitialWatermarkEstimatorState
returns a sub-type ofManualWatermarkEstimator
. - If one of its arguments is a subtype of
BoundedWindow
, then it will be passed the window of the current element. When applied byParDo
the subtype ofBoundedWindow
must match the type of windows on the inputPCollection
. If the window is not accessed a runner may perform additional optimizations. - If one of its arguments is of type
PaneInfo
, then it will be passed information about the current triggering pane. - If one of the parameters is of type
PipelineOptions
, then it will be passed the options for the current pipeline. - If one of the parameters is of type
DoFn.OutputReceiver
, then it will be passed an output receiver for outputting elements to the default output. - If one of the parameters is of type
DoFn.MultiOutputReceiver
, then it will be passed an output receiver for outputting to multiple tagged outputs. - If one of the parameters is of type
DoFn.BundleFinalizer
, then it will be passed a mechanism to register a callback that will be invoked after the runner successfully commits the output of this bundle. See Apache Beam Portability API: How to Finalize Bundles for further details. - May return a
DoFn.ProcessContinuation
to indicate whether there is more work to be done for the current element, otherwise must returnvoid
.
- If one of its arguments is tagged with the