@Experimental(value=SOURCE_SINK) public abstract static class BoundedSource.BoundedReader<T> extends Source.Reader<T>
Reader that reads a bounded amount of input and supports some additional
operations, such as progress estimation and dynamic work rebalancing.
Once Source.Reader.start() or Source.Reader.advance() has returned false, neither will be called
again on this object.
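A minimal sketch of a driver loop that honors this contract. SimpleReader, ListReader and ReaderLoop are hypothetical stand-ins written for illustration, not Beam classes. ListReader throws if it is advanced again after returning false, so the loop below shows that the contract is respected.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for Source.Reader<T>, reduced to the calls used here.
// (The real start() and advance() also declare IOException; omitted for brevity.)
interface SimpleReader<T> {
  boolean start();
  boolean advance();
  T getCurrent() throws NoSuchElementException;
}

// Bounded reader over an in-memory list that enforces the contract above.
class ListReader<T> implements SimpleReader<T> {
  private final Iterator<T> it;
  private T current;
  private boolean hasCurrent = false;
  private boolean exhausted = false;

  ListReader(List<T> items) { this.it = items.iterator(); }

  @Override public boolean start() { return advance(); }

  @Override public boolean advance() {
    if (exhausted) {
      throw new IllegalStateException("start()/advance() called after returning false");
    }
    hasCurrent = it.hasNext();
    current = hasCurrent ? it.next() : null;
    exhausted = !hasCurrent;
    return hasCurrent;
  }

  @Override public T getCurrent() {
    if (!hasCurrent) throw new NoSuchElementException();
    return current;
  }
}

class ReaderLoop {
  // Reads every record. The loop stops at the first false, so neither
  // start() nor advance() is ever called again afterwards.
  static <T> List<T> readAll(SimpleReader<T> reader) {
    List<T> out = new ArrayList<>();
    for (boolean more = reader.start(); more; more = reader.advance()) {
      out.add(reader.getCurrent());
    }
    return out;
  }
}
```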
All methods will be run from the same thread except splitAtFraction(double) and
getFractionConsumed(), which can be called concurrently from a different thread. There
will never be multiple concurrent calls to splitAtFraction(double), but there can be multiple
concurrent calls to getFractionConsumed() if splitAtFraction(double) is implemented.
If the source does not implement splitAtFraction(double), you do not need to worry about
thread safety. If implemented, it must be safe to call splitAtFraction(double) and
getFractionConsumed() concurrently with other methods.
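One way to make getFractionConsumed() safe to poll from another thread is sketched below, assuming a reader over a known number of records. CountingReader is a hypothetical class, not part of Beam. Its progress lives in volatile fields that only the reading thread writes, so a monitoring thread can read them without taking a lock or blocking the reader.

```java
// Hypothetical reader over `total` records (not Beam's API). Progress lives in
// volatile fields so getFractionConsumed() can be polled from another thread
// without blocking the reading thread.
class CountingReader {
  private final long total;
  private volatile long consumed = 0;    // written only by the reading thread
  private volatile boolean done = false; // set once start()/advance() returns false

  CountingReader(long total) { this.total = total; }

  boolean start() { return advance(); }

  boolean advance() {
    if (consumed < total) {
      consumed++; // single writer, so incrementing a volatile is safe here
      return true;
    }
    done = true;
    return false;
  }

  // Safe to call concurrently from any thread: it only reads volatile fields,
  // and both fields only ever move forward, so successive reads never decrease.
  Double getFractionConsumed() {
    if (done) return 1.0;
    if (total == 0) return 0.0;
    return (double) consumed / total;
  }
}
```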
During dynamic work rebalancing, splitAtFraction(double)
may be called concurrently with Source.Reader.advance() or Source.Reader.start(). It is critical that
their interaction be implemented in a thread-safe way; otherwise, data loss is possible.
Sources which support dynamic work rebalancing should use
RangeTracker to manage the (source-specific)
range of positions that is being split. If your source supports dynamic work rebalancing,
please use that class to implement it if possible; if not possible, please contact the team
at [email protected].
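The core idea behind a range tracker is sketched below for a half-open offset range. SimpleOffsetTracker is a hypothetical, simplified class written for illustration. It is not Beam's RangeTracker interface or its OffsetRangeTracker implementation, and its method names only loosely echo them. Every method does constant-time work under the tracker's own monitor, so a splitting thread never waits on I/O.

```java
// Hypothetical, simplified offset-range tracker (NOT Beam's OffsetRangeTracker).
// Tracks the half-open range [start, stop) and the last returned record offset.
class SimpleOffsetTracker {
  private final long start;
  private long stop;              // shrinks when a split succeeds
  private long lastReturned = -1; // -1 means no record returned yet (assumes start >= 0)

  SimpleOffsetTracker(long start, long stop) {
    if (start < 0 || stop < start) throw new IllegalArgumentException("bad range");
    this.start = start;
    this.stop = stop;
  }

  // Called by the reading thread before emitting the record at `offset`.
  // Returns false if the record lies outside the (possibly narrowed) range.
  synchronized boolean tryReturnRecordAt(long offset) {
    if (offset < start || offset <= lastReturned) {
      throw new IllegalArgumentException("offsets must be increasing and within range");
    }
    if (offset >= stop) return false;
    lastReturned = offset;
    return true;
  }

  // Called from another thread. Succeeds only if no record at or beyond
  // `splitOffset` has been returned yet and the split point is strictly inside the range.
  synchronized boolean trySplitAtPosition(long splitOffset) {
    if (splitOffset <= lastReturned || splitOffset <= start || splitOffset >= stop) {
      return false;
    }
    stop = splitOffset;
    return true;
  }

  synchronized long getStop() { return stop; }
}
```

Once a split succeeds, the reader's next tryReturnRecordAt beyond the new stop returns false. The reader then ends its read by returning false from advance(), and the given-up offsets belong to the residual.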
| Constructor and Description |
|---|
| BoundedReader() |
| Modifier and Type | Method and Description |
|---|---|
| abstract BoundedSource<T> | getCurrentSource(): Returns a Source describing the same input that this Reader reads (including items already read). |
| org.joda.time.Instant | getCurrentTimestamp(): By default, returns the minimum possible timestamp. |
| Double | getFractionConsumed(): Returns a value in [0, 1] representing approximately what fraction of the source (getCurrentSource()) this reader has read so far. |
| BoundedSource<T> | splitAtFraction(double fraction): Tells the reader to narrow the range of the input it's going to read and give up the remainder, so that the new range would contain approximately the given fraction of the amount of data in the current range. |
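Methods inherited from class Source.Reader: advance, close, getCurrent, start
public Double getFractionConsumed()
Returns a value in [0, 1] representing approximately what fraction of the source
(getCurrentSource()) this reader has read so far.
It is recommended that this method satisfy the following properties:
- It should return 0 before the Source.Reader.start() call.
- It should return 1 after a Source.Reader.start() or Source.Reader.advance() call that returns false.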
By default, returns null to indicate that this cannot be estimated.
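The recommended values can be written as a small function of the reader's state, assuming the reader covers an offset range [start, stop). FractionEstimate is a hypothetical helper, not part of Beam. It returns 0 before start(), 1 after a false return, a proportional value in between, and null when the size is unknown, which matches this class's default.

```java
// Hypothetical helper (not part of Beam) showing the recommended values of
// getFractionConsumed() for a reader over a range of offsets [start, stop).
final class FractionEstimate {
  private FractionEstimate() {}

  /**
   * @param started  whether start() has been called
   * @param finished whether start()/advance() has returned false
   * @param start    first offset of the range
   * @param position offset of the current record (used only while reading)
   * @param stop     end of the range, or null if the size is unknown
   */
  static Double estimate(boolean started, boolean finished, long start, long position, Long stop) {
    if (!started) return 0.0;               // before start(): 0
    if (finished) return 1.0;               // after start()/advance() returned false: 1
    if (stop == null) return null;          // no estimate available (the default)
    if (stop <= start) return 1.0;          // empty range
    double f = (double) (position - start) / (stop - start);
    return Math.max(0.0, Math.min(1.0, f)); // clamp to [0, 1]
  }
}
```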
Thread safety: If splitAtFraction(double) is implemented, this method can be called concurrently with other
methods (including itself), so it is critical that it be implemented
in a thread-safe way.
Returns: a value in [0, 1], or null if such an estimate is not available.
public abstract BoundedSource<T> getCurrentSource()
Returns a Source describing the same input that this Reader reads
(including items already read).
A reader created from the result of getCurrentSource, if consumed, MUST
return the same data items as the current reader.
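Specified by: getCurrentSource in class Source.Reader<T>
public BoundedSource<T> splitAtFraction(double fraction)
Returns a BoundedSource representing the remainder. Assuming the following sequence of calls,
primary and residual should together cover exactly the data items of initial, and primary
should contain approximately the given fraction of them: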
BoundedSource<T> initial = reader.getCurrentSource();
BoundedSource<T> residual = reader.splitAtFraction(fraction);
BoundedSource<T> primary = reader.getCurrentSource();
This method should return null if the split cannot be performed for this fraction
while satisfying the semantics above. E.g., a reader that reads a range of offsets
in a file should return null if it is already past the position in its range
corresponding to the given fraction. In this case, the method MUST have no effect
(the reader must behave as if the method hadn't been called at all).
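The call sequence and the null case can be traced on an offset range. This sketch assumes one record per offset. OffsetRange and RangeReader are hypothetical single-threaded stand-ins for a BoundedSource and its reader, not Beam classes. The fraction is mapped to the position start + fraction * (stop - start). A split point the reader has already passed yields null and leaves the reader unchanged.

```java
// Hypothetical offset range [start, stop) standing in for a BoundedSource<T>.
final class OffsetRange {
  final long start;
  final long stop;

  OffsetRange(long start, long stop) {
    this.start = start;
    this.stop = stop;
  }

  @Override public String toString() { return "[" + start + ", " + stop + ")"; }
}

// Hypothetical single-threaded reader over an OffsetRange (not Beam's API).
class RangeReader {
  private final long start;
  private long stop;
  private long position; // offset of the next record to read

  RangeReader(OffsetRange range) {
    start = range.start;
    stop = range.stop;
    position = range.start;
  }

  // Mirrors getCurrentSource(): reflects any split that already happened.
  OffsetRange getCurrentSource() { return new OffsetRange(start, stop); }

  // Returns the offset of the next record, or -1 once the range is exhausted.
  long readNext() { return position < stop ? position++ : -1; }

  // Mirrors splitAtFraction(double): narrow to [start, split) and return
  // [split, stop), or return null and change nothing if that is impossible.
  OffsetRange splitAtFraction(double fraction) {
    long split = start + (long) (fraction * (stop - start));
    if (split <= start || split >= stop || split < position) {
      return null; // already past the split point (or nothing to give up): no effect
    }
    OffsetRange residual = new OffsetRange(split, stop);
    stop = split;
    return residual;
  }
}
```

For a reader over [0, 100) that has consumed offsets 0 through 9, splitAtFraction(0.25) gives a residual of [25, 100) and narrows the primary to [0, 25). A later splitAtFraction(0.2) maps to offset 5, which has already been read, so it returns null.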
It is also very important that this method always completes quickly; in particular, it should not perform or wait on any blocking operations such as I/O or RPCs. Violating this requirement may stall completion of the work item or even cause it to fail.
E.g. it is incorrect to make both this method and Source.Reader.start()/Source.Reader.advance()
synchronized, because those methods can perform blocking operations, and then
this method would have to wait for those calls to complete.
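A locking discipline that avoids this problem is sketched below. LockDisciplineReader is a hypothetical reader over offsets, and fetchRecord is a hypothetical blocking call simulated with Thread.sleep. In advance(), only the claim of the next offset runs under the lock, and the slow fetch runs outside it. splitAtFraction therefore waits at most for a few field updates.

```java
// Hypothetical reader (not Beam's API) over offsets [rangeStart, stop), one record per offset.
class LockDisciplineReader {
  private final Object lock = new Object();
  private final long rangeStart;
  private long stop;         // guarded by lock; shrinks on a successful split
  private long nextOffset;   // guarded by lock
  private long current = -1; // only touched by the reading thread

  LockDisciplineReader(long rangeStart, long stop) {
    this.rangeStart = rangeStart;
    this.stop = stop;
    this.nextOffset = rangeStart;
  }

  boolean start() { return advance(); }

  boolean advance() {
    long offset;
    synchronized (lock) { // short critical section: claim the next offset
      if (nextOffset >= stop) return false;
      offset = nextOffset++;
    }
    current = fetchRecord(offset); // blocking work happens OUTSIDE the lock
    return true;
  }

  long getCurrent() { return current; }

  // Only bookkeeping under the lock, so this never waits on fetchRecord().
  // Returns {start, stop} of the residual, or null if the split is impossible.
  long[] splitAtFraction(double fraction) {
    synchronized (lock) {
      long split = rangeStart + (long) (fraction * (stop - rangeStart));
      // Offsets below nextOffset are already claimed by this reader (primary).
      if (split <= rangeStart || split >= stop || split < nextOffset) return null;
      long[] residual = {split, stop};
      stop = split;
      return residual;
    }
  }

  // Hypothetical stand-in for I/O or an RPC.
  private static long fetchRecord(long offset) {
    try {
      Thread.sleep(1);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return offset;
  }
}
```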
RangeTracker makes it easy to implement
this method safely and correctly.
By default, returns null to indicate that splitting is not possible.
public org.joda.time.Instant getCurrentTimestamp()
                                        throws NoSuchElementException
By default, returns the minimum possible timestamp.
Specified by: getCurrentTimestamp in class Source.Reader<T>
Throws: NoSuchElementException - if the reader is at the beginning of the input and
Source.Reader.start() or Source.Reader.advance() wasn't called, or if the last Source.Reader.start() or
Source.Reader.advance() returned false.
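The default behavior and its exception cases are sketched below. TimestampedReader is a hypothetical reader, not Beam's class. To keep the sketch dependency-free, java.time.Instant stands in for org.joda.time.Instant, and Instant.MIN is an assumed stand-in for "the minimum possible timestamp"; Beam's actual constant may differ. The timestamp is only available while the reader is positioned on a record.

```java
import java.time.Instant;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical reader (not Beam's API). java.time.Instant stands in for
// org.joda.time.Instant, and Instant.MIN stands in for "the minimum possible timestamp".
class TimestampedReader<T> {
  static final Instant MIN_TIMESTAMP = Instant.MIN;

  private final Iterator<T> it;
  private boolean positioned = false; // true only while on a valid record
  private T current;

  TimestampedReader(List<T> items) { this.it = items.iterator(); }

  boolean start() { return advance(); }

  boolean advance() {
    positioned = it.hasNext();
    current = positioned ? it.next() : null;
    return positioned;
  }

  T getCurrent() {
    if (!positioned) throw new NoSuchElementException("not positioned on a record");
    return current;
  }

  // Default behavior: a fixed minimum timestamp, but only while the reader
  // is positioned on a record (after a start()/advance() that returned true).
  Instant getCurrentTimestamp() {
    if (!positioned) throw new NoSuchElementException("not positioned on a record");
    return MIN_TIMESTAMP;
  }
}
```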