Class OffsetRangeTracker
- java.lang.Object
-
- org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<OffsetRange,java.lang.Long>
-
- org.apache.beam.sdk.transforms.splittabledofn.OffsetRangeTracker
-
- All Implemented Interfaces:
RestrictionTracker.HasProgress
- Direct Known Subclasses:
GrowableOffsetRangeTracker
public class OffsetRangeTracker extends RestrictionTracker<OffsetRange,java.lang.Long> implements RestrictionTracker.HasProgress
A RestrictionTracker for claiming offsets in an OffsetRange in a monotonically increasing fashion.
The smallest offset is Long.MIN_VALUE and the largest offset is Long.MAX_VALUE - 1.
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker
RestrictionTracker.HasProgress, RestrictionTracker.IsBounded, RestrictionTracker.Progress, RestrictionTracker.TruncateResult<RestrictionT>
-
-
Field Summary
Fields (Modifier and Type, Field):
protected @Nullable java.lang.Long lastAttemptedOffset
protected @Nullable java.lang.Long lastClaimedOffset
protected OffsetRange range
-
Constructor Summary
Constructors (Constructor, Description):
OffsetRangeTracker(OffsetRange range)
-
Method Summary
All Methods, Instance Methods, Concrete Methods (Modifier and Type, Method, Description):
void checkDone(): Checks whether the restriction has been fully processed.
OffsetRange currentRestriction(): Returns a restriction accurately describing the full range of work the current DoFn.ProcessElement call will do, including already completed work.
RestrictionTracker.Progress getProgress(): A representation for the amount of known completed and known remaining work.
RestrictionTracker.IsBounded isBounded(): Returns the boundedness of the current restriction.
java.lang.String toString()
boolean tryClaim(java.lang.Long i): Attempts to claim the given offset.
SplitResult<OffsetRange> trySplit(double fractionOfRemainder): Splits current restriction based on fractionOfRemainder.
-
-
-
Field Detail
-
range
protected OffsetRange range
-
lastClaimedOffset
protected @Nullable java.lang.Long lastClaimedOffset
-
lastAttemptedOffset
protected @Nullable java.lang.Long lastAttemptedOffset
-
-
Constructor Detail
-
OffsetRangeTracker
public OffsetRangeTracker(OffsetRange range)
-
-
Method Detail
-
currentRestriction
public OffsetRange currentRestriction()
Description copied from class: RestrictionTracker
Returns a restriction accurately describing the full range of work the current DoFn.ProcessElement call will do, including already completed work.
The restriction returned by this method may be updated dynamically due to concurrent invocation of other methods of the RestrictionTracker, for example RestrictionTracker.trySplit(double).
This method is required to be implemented.
- Specified by:
currentRestriction in class RestrictionTracker<OffsetRange,java.lang.Long>
-
trySplit
public SplitResult<OffsetRange> trySplit(double fractionOfRemainder)
Description copied from class: RestrictionTracker
Splits current restriction based on fractionOfRemainder.
If splitting the current restriction is possible, the current restriction is split into a primary and residual restriction pair. This invocation updates the RestrictionTracker.currentRestriction() to be the primary restriction, effectively making the current DoFn.ProcessElement execution responsible for performing the work that the primary restriction represents. The residual restriction will be executed in a separate DoFn.ProcessElement invocation (likely in a different process). The work performed by executing the primary and residual restrictions as separate DoFn.ProcessElement invocations MUST be equivalent to the work performed as if this split never occurred.
The fractionOfRemainder should be used in a best-effort manner to choose a primary and residual restriction based upon the fraction of the remaining work that the current DoFn.ProcessElement invocation is responsible for. For example, if a DoFn.ProcessElement was reading a file with a restriction representing the offset range [100, 200) and has processed up to offset 130 with a fractionOfRemainder of 0.7, the primary and residual restrictions returned would be [100, 179), [179, 200) (note: currentOffset + fractionOfRemainder * remainingWork = 130 + 0.7 * 70 = 179).
fractionOfRemainder = 0 means a checkpoint is required.
The API is recommended to be implemented for a batch pipeline, both to improve parallel processing performance and because it is very important for pipeline scaling and end-to-end pipeline execution.
The API is required to be implemented for a streaming pipeline.
- Specified by:
trySplit in class RestrictionTracker<OffsetRange,java.lang.Long>
- Parameters:
fractionOfRemainder - A hint as to the fraction of work the primary restriction should represent based upon the current known remaining amount of work.
- Returns:
a SplitResult if a split was possible, otherwise returns null. If fractionOfRemainder == 0, a null result MUST imply that the restriction tracker is done and there is no more work left to do.
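The arithmetic in the worked example above can be sketched in plain Java. This is only an illustration of the documented formula, not Beam's implementation: the class SplitSketch and the method splitPosition are hypothetical names, rounding to the nearest offset is an assumption, and an empty OptionalLong stands in for the null result.

```java
import java.util.OptionalLong;

/** Hypothetical sketch of the split arithmetic described for trySplit; not Beam's code. */
final class SplitSketch {
  /**
   * Returns the offset at which [from, to) would be split into a primary
   * [from, split) and a residual [split, to), or empty if the residual would be empty.
   * Assumes from <= currentOffset <= to and a range small enough that
   * to - currentOffset does not overflow.
   */
  static OptionalLong splitPosition(
      long from, long to, long currentOffset, double fractionOfRemainder) {
    long remainingWork = to - currentOffset;
    // Assumption: round the fractional position to the nearest whole offset.
    long split = currentOffset + Math.round(fractionOfRemainder * remainingWork);
    // An empty residual means there is nothing left to hand off.
    return split >= to ? OptionalLong.empty() : OptionalLong.of(split);
  }

  public static void main(String[] args) {
    long split = splitPosition(100, 200, 130, 0.7).getAsLong();
    // Prints: primary = [100, 179), residual = [179, 200)
    System.out.println("primary = [100, " + split + "), residual = [" + split + ", 200)");
  }
}
```

With a fractionOfRemainder of 0 the sketch splits at the current offset, which corresponds to the checkpoint case described above.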
-
tryClaim
public boolean tryClaim(java.lang.Long i)
Attempts to claim the given offset. Must be larger than the last successfully claimed offset.
- Specified by:
tryClaim in class RestrictionTracker<OffsetRange,java.lang.Long>
- Returns:
true if the offset was successfully claimed, false if it is outside the current OffsetRange of this tracker (in that case this operation is a no-op).
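The claiming contract described above can be modeled in a few lines of plain Java. The class ClaimSketch is a hypothetical, simplified stand-in that follows the wording of this page literally; it is not the OffsetRangeTracker source, and throwing IllegalArgumentException for a non-increasing offset is an assumption.

```java
/** Hypothetical, simplified model of the tryClaim contract; not Beam's code. */
final class ClaimSketch {
  private final long from; // inclusive start of the range
  private final long to; // exclusive end of the range
  private Long lastClaimedOffset; // null until the first successful claim

  ClaimSketch(long from, long to) {
    this.from = from;
    this.to = to;
  }

  boolean tryClaim(long offset) {
    // Assumption: a non-increasing offset is a caller bug, so reject it loudly.
    if (lastClaimedOffset != null && offset <= lastClaimedOffset) {
      throw new IllegalArgumentException(
          "Offset " + offset + " must be larger than last claimed offset " + lastClaimedOffset);
    }
    if (offset < from || offset >= to) {
      return false; // outside the range: nothing is claimed, state is unchanged
    }
    lastClaimedOffset = offset;
    return true;
  }

  Long lastClaimedOffset() {
    return lastClaimedOffset;
  }

  public static void main(String[] args) {
    ClaimSketch tracker = new ClaimSketch(100, 103);
    // Typical processing loop: stop as soon as a claim fails.
    for (long i = 100; tracker.tryClaim(i); i++) {
      System.out.println("claimed " + i);
    }
  }
}
```

The loop in main shows the usual pattern: each offset is claimed before it is processed, and processing stops at the first failed claim.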
-
checkDone
public void checkDone() throws java.lang.IllegalStateException
Description copied from class: RestrictionTracker
Checks whether the restriction has been fully processed.
Called by the SDK harness after DoFn.ProcessElement returns.
Must throw an exception with an informative error message if there is still any unclaimed work remaining in the restriction.
This method is required to be implemented in order to prevent data loss during SDK processing.
- Specified by:
checkDone in class RestrictionTracker<OffsetRange,java.lang.Long>
- Throws:
java.lang.IllegalStateException
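For an offset range, the check described above could look roughly like the following. This is a sketch under stated assumptions, not the actual method body: CheckDoneSketch is a hypothetical class, the range is taken to be half-open [from, to), and lastAttemptedOffset is assumed to hold the last offset passed to tryClaim (null if none).

```java
/** Hypothetical sketch of a completeness check for an offset range; not Beam's code. */
final class CheckDoneSketch {
  /** Throws IllegalStateException if part of [from, to) was never attempted. */
  static void checkDone(long from, long to, Long lastAttemptedOffset) {
    if (from == to) {
      return; // empty range: there is nothing to claim
    }
    if (lastAttemptedOffset == null) {
      throw new IllegalStateException(
          "No offset was attempted in non-empty range [" + from + ", " + to + ")");
    }
    // The range is done once the last offset, to - 1, or anything beyond it was attempted.
    if (lastAttemptedOffset < to - 1) {
      throw new IllegalStateException(
          "Offsets [" + (lastAttemptedOffset + 1) + ", " + to + ") were never attempted");
    }
  }

  public static void main(String[] args) {
    checkDone(100, 200, 199L); // passes silently
    try {
      checkDone(100, 200, 150L);
    } catch (IllegalStateException e) {
      System.out.println("not done: " + e.getMessage());
    }
  }
}
```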
-
isBounded
public RestrictionTracker.IsBounded isBounded()
Description copied from class: RestrictionTracker
Returns the boundedness of the current restriction. If the current restriction represents a finite amount of work, it should return RestrictionTracker.IsBounded.BOUNDED. Otherwise, it should return RestrictionTracker.IsBounded.UNBOUNDED.
It is valid to return RestrictionTracker.IsBounded.BOUNDED after returning RestrictionTracker.IsBounded.UNBOUNDED once the end of a restriction is discovered. It is not valid to return RestrictionTracker.IsBounded.UNBOUNDED after returning RestrictionTracker.IsBounded.BOUNDED.
This method is required to be implemented.
- Specified by:
isBounded in class RestrictionTracker<OffsetRange,java.lang.Long>
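The ordering rule for successive return values can be stated as a small predicate. The enum below is a local stand-in defined only for this sketch, and BoundednessSketch and isValidTransition are hypothetical names, not part of the Beam API.

```java
/** Hypothetical sketch of the rule for successive isBounded() results; not Beam's code. */
final class BoundednessSketch {
  /** Local stand-in for RestrictionTracker.IsBounded. */
  enum IsBounded {
    BOUNDED,
    UNBOUNDED
  }

  /** Returns true if reporting next after previous is allowed by the rule above. */
  static boolean isValidTransition(IsBounded previous, IsBounded next) {
    // The only forbidden sequence is BOUNDED followed by UNBOUNDED.
    return !(previous == IsBounded.BOUNDED && next == IsBounded.UNBOUNDED);
  }

  public static void main(String[] args) {
    System.out.println(isValidTransition(IsBounded.UNBOUNDED, IsBounded.BOUNDED)); // true
    System.out.println(isValidTransition(IsBounded.BOUNDED, IsBounded.UNBOUNDED)); // false
  }
}
```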
-
toString
public java.lang.String toString()
- Overrides:
toString in class java.lang.Object
-
getProgress
public RestrictionTracker.Progress getProgress()
Description copied from interface: RestrictionTracker.HasProgress
A representation for the amount of known completed and known remaining work.
It is up to each restriction tracker to convert between their natural representation of completed and remaining work and the double representation. For example:
- Block based file source (e.g. Avro): The number of bytes from the beginning of the restriction to the current block and the number of bytes from the current block to the end of the restriction.
- Pull based queue based source (e.g. Pubsub): The local/global size available in number of messages or number of message bytes that have been processed and the number of messages or number of message bytes that are outstanding.
- Key range based source (e.g. BigQuery, Bigtable, ...): Scale the start key to be one and end key to be zero and interpolate the position of the next splittable key as a position. If information about the probability density function or cumulative distribution function is available, work completed and work remaining interpolation can be improved. Alternatively, if the number of encoded bytes for the keys and values is known for the key range, the number of completed and remaining bytes can be used.
The work completed and work remaining must be of the same scale, whether that be number of messages or number of bytes, and should never represent two distinct unit types.
- Specified by:
getProgress in interface RestrictionTracker.HasProgress
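For an offset range, one possible conversion to the double representation counts both quantities in offsets. The following is a sketch of that convention only, assuming a half-open range [from, to) and that every offset up to the last attempted one counts as completed; ProgressSketch and progress are hypothetical names, and this is not necessarily how OffsetRangeTracker computes its result.

```java
/** Hypothetical sketch of offset-based progress reporting; not Beam's code. */
final class ProgressSketch {
  /**
   * Returns {workCompleted, workRemaining} for the range [from, to), both counted in
   * offsets so that the two values share one unit. lastAttemptedOffset is null if
   * nothing has been attempted yet.
   */
  static double[] progress(long from, long to, Long lastAttemptedOffset) {
    double totalWork = (double) to - (double) from;
    if (lastAttemptedOffset == null) {
      return new double[] {0.0, totalWork};
    }
    // Offsets up to and including the last attempted one count as completed,
    // clamped to [0, totalWork] in case the attempt fell outside the range.
    double completed = lastAttemptedOffset.doubleValue() + 1 - from;
    completed = Math.max(0.0, Math.min(completed, totalWork));
    return new double[] {completed, totalWork - completed};
  }

  public static void main(String[] args) {
    double[] p = progress(100, 200, 129L);
    // Prints: completed = 30.0, remaining = 70.0
    System.out.println("completed = " + p[0] + ", remaining = " + p[1]);
  }
}
```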
-
-