Package org.apache.beam.sdk.transforms
Annotation Type DoFn.GetSize
-
@Documented @Retention(RUNTIME) @Target(METHOD) public static @interface DoFn.GetSize
Annotation for the method that returns the corresponding size for an element and restriction pair.Signature:
double getSize(<arguments>);
This method must satisfy the following constraints:
- If one of its arguments is tagged with the
DoFn.Element
annotation, then it will be passed the current element being processed; the argument must be of typeInputT
. Note that automatic conversion ofRow
s andDoFn.FieldAccess
parameters are currently unsupported. - If one of its arguments is tagged with the
DoFn.Restriction
annotation, then it will be passed the current restriction being processed; the argument must be of typeRestrictionT
. - If one of its arguments is tagged with the
DoFn.Timestamp
annotation, then it will be passed the timestamp of the current element being processed; the argument must be of typeInstant
. - If one of its arguments is a
RestrictionTracker
, then it will be passed a tracker that is initialized for the currentDoFn.Restriction
. The argument must be of the exact typeRestrictionTracker<RestrictionT, PositionT>
. - If one of its arguments is a subtype of
BoundedWindow
, then it will be passed the window of the current element. When applied byParDo
the subtype ofBoundedWindow
must match the type of windows on the inputPCollection
. If the window is not accessed a runner may perform additional optimizations. - If one of its arguments is of type
PaneInfo
, then it will be passed information about the current triggering pane. - If one of the parameters is of type
PipelineOptions
, then it will be passed the options for the current pipeline.
Returns a non-negative double representing the size of the current element and restriction.
Splittable
DoFn
s should only provide this method if the defaultRestrictionTracker.HasProgress
implementation within theRestrictionTracker
is an inaccurate representation of known work.It is up to each splittable to convert between their natural representation of outstanding work and this representation. For example:
- Block based file source (e.g. Avro): The number of bytes that will be read from the file.
- Pull based queue based source (e.g. Pubsub): The local/global size available in number of
messages or number of
message bytes
that have not been processed. - Key range based source (e.g. Shuffle, Bigtable, ...): Typically
1.0
unless additional details such as the number of bytes for keys and values is known for the key range.
Must be thread safe. Will be invoked concurrently during bundle processing due to runner initiated splitting and progress estimation.
- If one of its arguments is tagged with the