Class RestrictionTracker<RestrictionT,PositionT>
- java.lang.Object
- org.apache.beam.sdk.transforms.splittabledofn.RestrictionTracker<RestrictionT,PositionT>
- Direct Known Subclasses: ByteKeyRangeTracker, OffsetRangeTracker, PeriodicSequence.OutputRangeTracker
public abstract class RestrictionTracker<RestrictionT,PositionT> extends java.lang.Object
Manages access to the restriction and keeps track of its claimed part for a splittable DoFn.
The restriction may be modified by different threads; however, the system will ensure sufficient locking such that no methods on the restriction tracker will be called concurrently.
RestrictionTrackers should implement RestrictionTracker.HasProgress; otherwise, poor auto-scaling of workers and/or splitting may result if the progress is an inaccurate representation of the known amount of completed and remaining work.
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- static interface RestrictionTracker.HasProgress: All RestrictionTrackers SHOULD implement this interface to improve auto-scaling and splitting performance.
- static class RestrictionTracker.IsBounded
- static class RestrictionTracker.Progress: A representation for the amount of known completed and remaining work.
- static class RestrictionTracker.TruncateResult<RestrictionT>: A representation of the truncate result.
-
Constructor Summary
Constructors (Constructor, Description):
- RestrictionTracker()
-
Method Summary
Methods (Modifier and Type, Method, Description):
- abstract void checkDone(): Checks whether the restriction has been fully processed.
- abstract RestrictionT currentRestriction(): Returns a restriction accurately describing the full range of work the current DoFn.ProcessElement call will do, including already completed work.
- abstract RestrictionTracker.IsBounded isBounded(): Return the boundedness of the current restriction.
- abstract boolean tryClaim(PositionT position): Attempts to claim the block of work in the current restriction identified by the given position.
- abstract @Nullable SplitResult<RestrictionT> trySplit(double fractionOfRemainder): Splits current restriction based on fractionOfRemainder.
-
Method Detail
-
tryClaim
public abstract boolean tryClaim(PositionT position)
Attempts to claim the block of work in the current restriction identified by the given position. Each claimed position MUST be a valid split point.
If this succeeds, the DoFn MUST execute the entire block of work. If this fails:
- DoFn.ProcessElement MUST return DoFn.ProcessContinuation.stop() without performing any additional work or emitting output (note that emitting output or performing work from DoFn.ProcessElement is also not allowed before the first call to this method).
- checkDone() MUST succeed.
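The claim contract above can be sketched as a claim-then-work loop. This is a minimal, self-contained illustration and not Beam code: SimpleOffsetTracker and processElement are hypothetical names, and the tracker covers only a half-open offset range [from, to).

```java
final class TryClaimSketch {
    /** Hypothetical stand-in for a tracker over the offset range [from, to); not a Beam class. */
    static final class SimpleOffsetTracker {
        private final long from;
        private final long to;

        SimpleOffsetTracker(long from, long to) {
            this.from = from;
            this.to = to;
        }

        /** Returns true if the position lies inside the restriction and may be processed. */
        boolean tryClaim(long position) {
            if (position < from) {
                throw new IllegalArgumentException(
                    "Position " + position + " is before the range start " + from);
            }
            return position < to;
        }
    }

    /** Mimics the body of a process-element call; returns the number of records emitted. */
    static int processElement(SimpleOffsetTracker tracker, long start) {
        int emitted = 0;
        // No work or output happens before the first claim.
        for (long offset = start; tracker.tryClaim(offset); offset++) {
            // The claim succeeded, so the whole block for this offset must be processed.
            emitted++;
        }
        // The claim failed: stop at once, with no further work or output.
        return emitted;
    }
}
```

In a real splittable DoFn, the point where this sketch returns is where DoFn.ProcessElement would return DoFn.ProcessContinuation.stop().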
-
currentRestriction
public abstract RestrictionT currentRestriction()
Returns a restriction accurately describing the full range of work the current DoFn.ProcessElement call will do, including already completed work.
The current restriction returned by this method may be updated dynamically due to concurrent invocation of other methods of the RestrictionTracker, for example, trySplit(double).
This method is required to be implemented.
-
trySplit
public abstract @Nullable SplitResult<RestrictionT> trySplit(double fractionOfRemainder)
Splits current restriction based on fractionOfRemainder.
If splitting the current restriction is possible, the current restriction is split into a primary and residual restriction pair. This invocation updates the currentRestriction() to be the primary restriction, effectively having the current DoFn.ProcessElement execution responsible for performing the work that the primary restriction represents. The residual restriction will be executed in a separate DoFn.ProcessElement invocation (likely in a different process). The work performed by executing the primary and residual restrictions as separate DoFn.ProcessElement invocations MUST be equivalent to the work performed as if this split never occurred.
The fractionOfRemainder should be used in a best-effort manner to choose a primary and residual restriction based upon the fraction of the remaining work that the current DoFn.ProcessElement invocation is responsible for. For example, if a DoFn.ProcessElement was reading a file with a restriction representing the offset range [100, 200) and has processed up to offset 130 with a fractionOfRemainder of 0.7, the primary and residual restrictions returned would be [100, 179) and [179, 200) (note: currentOffset + fractionOfRemainder * remainingWork = 130 + 0.7 * 70 = 179).
fractionOfRemainder = 0 means a checkpoint is required.
Implementing this API is recommended for a batch pipeline: it improves parallel processing performance and is very important for pipeline scaling and end-to-end pipeline execution.
The API is required to be implemented for a streaming pipeline.
- Parameters:
fractionOfRemainder - A hint as to the fraction of work the primary restriction should represent based upon the current known remaining amount of work.
- Returns:
a SplitResult if a split was possible, otherwise returns null. If fractionOfRemainder == 0, a null result MUST imply that the restriction tracker is done and there is no more work left to do.
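The split arithmetic from the offset-range example can be sketched as follows. This is an illustration under stated assumptions, not the implementation used by any Beam tracker: SplitSketch and its trySplit helper are hypothetical, restrictions are plain long pairs, and currentOffset is taken to be the first unprocessed offset, so that remainingWork = to - currentOffset as in the note above.

```java
final class SplitSketch {
    /**
     * Returns {primaryFrom, primaryTo, residualFrom, residualTo}, or null when no work
     * remains to hand off. currentOffset is assumed to be the first unprocessed offset.
     */
    static long[] trySplit(long from, long to, long currentOffset, double fractionOfRemainder) {
        if (!(fractionOfRemainder >= 0.0 && fractionOfRemainder <= 1.0)) {
            throw new IllegalArgumentException("fractionOfRemainder must be in [0, 1]");
        }
        long remainingWork = to - currentOffset;
        // Keep the requested fraction of the remaining work in the primary.
        long splitPosition = currentOffset + Math.round(fractionOfRemainder * remainingWork);
        if (splitPosition >= to) {
            return null; // the residual would be empty, so there is nothing to split off
        }
        return new long[] {from, splitPosition, splitPosition, to};
    }
}
```

With the values from the example, trySplit(100, 200, 130, 0.7) yields the pairs [100, 179) and [179, 200). A fraction of 0 keeps only the processed part in the primary, which is the checkpoint case, and it returns null only once currentOffset has reached the end of the range.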
-
checkDone
public abstract void checkDone() throws java.lang.IllegalStateException
Checks whether the restriction has been fully processed.
Called by the SDK harness after DoFn.ProcessElement returns.
Must throw an exception with an informative error message if there is still any unclaimed work remaining in the restriction.
This method is required to be implemented in order to prevent data loss during SDK processing.
- Throws:
java.lang.IllegalStateException
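A minimal sketch of the checkDone() contract, assuming a hypothetical offset-range tracker (SimpleOffsetTracker is not a Beam class). It records the last attempted claim and reports any unattempted tail of the range in the exception message.

```java
final class CheckDoneSketch {
    /** Hypothetical tracker over the offset range [from, to); not a Beam class. */
    static final class SimpleOffsetTracker {
        private final long from;
        private final long to;
        private long lastAttempted;

        SimpleOffsetTracker(long from, long to) {
            this.from = from;
            this.to = to;
            this.lastAttempted = from - 1; // nothing attempted yet
        }

        /** Records the attempt and reports whether the position is inside the range. */
        boolean tryClaim(long position) {
            lastAttempted = position;
            return position < to;
        }

        /** Throws if some offsets in the range were never attempted. */
        void checkDone() {
            boolean emptyRange = from >= to;
            if (!emptyRange && lastAttempted < to - 1) {
                throw new IllegalStateException(String.format(
                    "Last attempted offset was %d in range [%d, %d); "
                        + "claiming work in [%d, %d) was not attempted",
                    lastAttempted, from, to, lastAttempted + 1, to));
            }
        }
    }
}
```

A failed claim at or past the end of the range also counts as an attempt here, which is why checkDone() passes after tryClaim returns false.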
-
isBounded
public abstract RestrictionTracker.IsBounded isBounded()
Return the boundedness of the current restriction. If the current restriction represents a finite amount of work, it should return RestrictionTracker.IsBounded.BOUNDED. Otherwise, it should return RestrictionTracker.IsBounded.UNBOUNDED.
It is valid to return RestrictionTracker.IsBounded.BOUNDED after returning RestrictionTracker.IsBounded.UNBOUNDED once the end of a restriction is discovered. It is not valid to return RestrictionTracker.IsBounded.UNBOUNDED after returning RestrictionTracker.IsBounded.BOUNDED.
This method is required to be implemented.
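The one-way transition described above can be sketched with a flag that is only ever set, never cleared. The names below (BoundednessSketch, GrowingLogTracker, markEndDiscovered) are hypothetical, and the local IsBounded enum merely stands in for RestrictionTracker.IsBounded.

```java
final class BoundednessSketch {
    /** Local stand-in for RestrictionTracker.IsBounded. */
    enum IsBounded { BOUNDED, UNBOUNDED }

    /** Hypothetical tracker for a source whose end is unknown until it is discovered. */
    static final class GrowingLogTracker {
        private boolean endDiscovered = false;

        /** One-way switch: there is deliberately no method that resets the flag. */
        void markEndDiscovered() {
            endDiscovered = true;
        }

        IsBounded isBounded() {
            return endDiscovered ? IsBounded.BOUNDED : IsBounded.UNBOUNDED;
        }
    }
}
```

Because the flag cannot be reset, the tracker can move from UNBOUNDED to BOUNDED but never back.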