public abstract static class OffsetBasedSource.OffsetBasedReader<T> extends BoundedSource.BoundedReader<T>
A Source.Reader that implements code common to readers of all OffsetBasedSources.

Subclasses have to implement:

- startImpl() and advanceImpl() for reading the first or subsequent records.
- Source.Reader.getCurrent(), getCurrentOffset(), and optionally isAtSplitPoint() and BoundedSource.BoundedReader.getCurrentTimestamp() to access properties of the last record successfully read by startImpl() or advanceImpl().
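
The subclass contract described above can be illustrated with a minimal sketch. `MiniOffsetReader` is a hypothetical, heavily simplified stand-in written for this example, not the real base class: it omits range tracking, splitting, timestamps, and `IOException`. `LineReader` and `LineReaderDemo` are likewise illustrative names. The sketch only shows which methods a subclass supplies and that the accessors describe the last record read by `startImpl()` or `advanceImpl()`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical, simplified stand-in for the subclass-facing part of the reader.
// It is NOT the real class: no range tracking, no splitting, no IOException.
abstract class MiniOffsetReader<T> {
    protected abstract boolean startImpl();
    protected abstract boolean advanceImpl();
    protected abstract long getCurrentOffset();
    public abstract T getCurrent();

    public final boolean start() { return startImpl(); }
    public final boolean advance() { return advanceImpl(); }
}

// Reads newline-separated lines from an in-memory string. The offset of a
// record is the index of its first character.
class LineReader extends MiniOffsetReader<String> {
    private final String data;
    private int nextStart = 0;        // index where the next record begins
    private long currentOffset = -1;  // offset of the last record read
    private String current = null;    // last record read, or null if none

    LineReader(String data) { this.data = data; }

    @Override protected boolean startImpl() { return readNext(); }
    @Override protected boolean advanceImpl() { return readNext(); }

    private boolean readNext() {
        if (nextStart >= data.length()) {
            current = null;
            return false;
        }
        int end = data.indexOf('\n', nextStart);
        if (end < 0) {
            end = data.length();
        }
        currentOffset = nextStart;
        current = data.substring(nextStart, end);
        nextStart = end + 1;
        return true;
    }

    @Override protected long getCurrentOffset() {
        if (current == null) throw new NoSuchElementException();
        return currentOffset;
    }

    @Override public String getCurrent() {
        if (current == null) throw new NoSuchElementException();
        return current;
    }
}

class LineReaderDemo {
    // Drives the reader with the usual start()/advance() loop.
    static List<String> readAll(String data) {
        LineReader reader = new LineReader(data);
        List<String> out = new ArrayList<>();
        for (boolean more = reader.start(); more; more = reader.advance()) {
            out.add(reader.getCurrentOffset() + ":" + reader.getCurrent());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll("ab\ncde\nf"));
    }
}
```

Keeping all record state in the subclass and exposing it only through the accessors is what lets the base class decide, independently of the subclass, whether a record that was read should actually be returned.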
| Constructor and Description |
|---|
| OffsetBasedReader(OffsetBasedSource<T> source) |
| Modifier and Type | Method and Description |
|---|---|
| boolean | advance(): Advances the reader to the next valid record. |
| protected abstract boolean | advanceImpl(): Advances to the next record and returns true, or returns false if there is no next record. |
| protected abstract long | getCurrentOffset(): Returns the starting offset of the current record, which has been read by the last successful Source.Reader.start() or Source.Reader.advance() call. |
| OffsetBasedSource<T> | getCurrentSource(): Returns a Source describing the same input that this Reader currently reads (including items already read). |
| Double | getFractionConsumed(): Returns a value in [0, 1] representing approximately what fraction of the current source this reader has read so far, or null if such an estimate is not available. |
| protected boolean | isAtSplitPoint(): Returns whether the current record is at a split point (i.e., whether the current record would be the first record to be read by a source with a specified start offset of getCurrentOffset()). |
| OffsetBasedSource<T> | splitAtFraction(double fraction): Tells the reader to narrow the range of the input it's going to read and give up the remainder, so that the new range would contain approximately the given fraction of the amount of data in the current range. |
| boolean | start(): Initializes the reader and advances the reader to the first record. |
| protected abstract boolean | startImpl(): Initializes the OffsetBasedSource.OffsetBasedReader and advances to the first record, returning true if there is a record available to be read. |

Methods inherited from class BoundedSource.BoundedReader: getCurrentTimestamp
Methods inherited from class Source.Reader: close, getCurrent

public OffsetBasedReader(OffsetBasedSource<T> source)
Parameters: source - the OffsetBasedSource to be read by the current reader.

protected abstract long getCurrentOffset()
throws NoSuchElementException
Returns the starting offset of the current record, which has been read by the last successful Source.Reader.start() or Source.Reader.advance() call.
If no such call has been made yet, the return value is unspecified.
See RangeTracker for description of offset semantics.
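Throws: NoSuchElementException

protected boolean isAtSplitPoint()
throws NoSuchElementException
Returns whether the current record is at a split point (i.e., whether the current record would be the first record to be read by a source with a specified start offset of getCurrentOffset()).
See detailed documentation about split points in RangeTracker.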
Throws: NoSuchElementException

public final boolean start()
throws IOException
Description copied from class: Source.Reader
Initializes the reader and advances the reader to the first record.
This method should be called exactly once. The invocation should occur prior to calling Source.Reader.advance() or Source.Reader.getCurrent(). This method may perform expensive operations that are needed to initialize the reader.
Specified by: start in class Source.Reader<T>
Returns: true if a record was read, false if there is no more input available.
Throws: IOException

public final boolean advance()
throws IOException
Description copied from class: Source.Reader
Advances the reader to the next valid record.
It is an error to call this without having called Source.Reader.start() first.
Specified by: advance in class Source.Reader<T>
Returns: true if a record was read, false if there is no more input available.
Throws: IOException

protected abstract boolean startImpl()
throws IOException
Initializes the OffsetBasedSource.OffsetBasedReader and advances to the first record,
returning true if there is a record available to be read. This method will be
invoked exactly once and may perform expensive setup operations that are needed to
initialize the reader.
This function is the OffsetBasedReader implementation of
BoundedReader#start. The key difference is that the implementor can ignore the
possibility that it should no longer produce the first record, either because it has exceeded
the original endOffset assigned to the reader, or because a concurrent call to
splitAtFraction(double) has changed the source to shrink the offset range being read.
Throws: IOException
See Also: BoundedReader#start

protected abstract boolean advanceImpl()
throws IOException
Advances to the next record and returns true, or returns false if there is no next record.
This function is the OffsetBasedReader implementation of
BoundedReader#advance. The key difference is that the implementor can ignore the
possibility that it should no longer produce the next record, either because it has exceeded
the original endOffset assigned to the reader, or because a concurrent call to
splitAtFraction(double) has changed the source to shrink the offset range being read.
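
The division of labour described for startImpl() and advanceImpl() can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `RangeCheckedReader`, `CountingReader`, `RangeCheckDemo`, and `shrinkEndTo` are hypothetical names, and the end-of-range rule used here (stop when a record at a split point starts at or beyond the end offset) is a simplification of what RangeTracker provides. The point is that the final `start()` and `advance()` apply the range check, so the subclass never inspects the end offset.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the library's code: a base reader whose final
// start()/advance() apply the end-of-range check on behalf of subclasses.
abstract class RangeCheckedReader<T> {
    private long endOffset;        // exclusive; may shrink after a split
    private boolean done = false;  // touched only by the reading thread

    RangeCheckedReader(long endOffset) { this.endOffset = endOffset; }

    protected abstract boolean startImpl();
    protected abstract boolean advanceImpl();
    protected abstract long getCurrentOffset();
    public abstract T getCurrent();
    protected boolean isAtSplitPoint() { return true; }

    // Deliberately not synchronized: the Impl methods may block on I/O.
    public final boolean start() { return !done && checkRange(startImpl()); }
    public final boolean advance() { return !done && checkRange(advanceImpl()); }

    private boolean checkRange(boolean hasRecord) {
        if (!hasRecord) {
            done = true;
            return false;
        }
        synchronized (this) {  // short critical section guarding endOffset
            if (isAtSplitPoint() && getCurrentOffset() >= endOffset) {
                done = true;
                return false;
            }
        }
        return true;
    }

    // Stand-in for the effect of a successful split: the range gets shorter.
    public synchronized void shrinkEndTo(long newEnd) {
        endOffset = Math.min(endOffset, newEnd);
    }
}

// The subclass ignores the end offset entirely and just yields 0, 1, 2, ...
class CountingReader extends RangeCheckedReader<Long> {
    private long current = -1;
    CountingReader(long endOffset) { super(endOffset); }
    @Override protected boolean startImpl() { current = 0; return true; }
    @Override protected boolean advanceImpl() { current++; return true; }
    @Override protected long getCurrentOffset() { return current; }
    @Override public Long getCurrent() { return current; }
}

class RangeCheckDemo {
    // Reads [0, end), shrinking the end to shrinkTo after shrinkAfter records.
    static List<Long> run(long end, long shrinkTo, int shrinkAfter) {
        CountingReader reader = new CountingReader(end);
        List<Long> out = new ArrayList<>();
        for (boolean more = reader.start(); more; more = reader.advance()) {
            out.add(reader.getCurrent());
            if (out.size() == shrinkAfter) {
                reader.shrinkEndTo(shrinkTo);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(10, 4, 2));
    }
}
```

Only the comparison against the end offset is guarded by a lock, so a concurrent range change never has to wait for a blocking read to finish.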
Throws: IOException
See Also: BoundedReader#advance

public OffsetBasedSource<T> getCurrentSource()
Description copied from class: BoundedSource.BoundedReader
Returns a Source describing the same input that this Reader currently reads (including items already read).
Reader subclasses can use this method for convenience to access unchanging properties of the source being read. Alternatively, they can cache these properties in the constructor.
The framework will call this method in the course of dynamic work rebalancing, e.g. after
a successful BoundedSource.BoundedReader.splitAtFraction(double) call.
Source objects must always be immutable. However, the return value of
this function may be affected by dynamic work rebalancing, happening asynchronously via
BoundedSource.BoundedReader.splitAtFraction(double), meaning it can return a different
Source object. The returned object itself will still be immutable.
Callers must take care not to rely on properties of the returned source that may be
asynchronously changed as a result of this process (e.g. do not cache an end offset when
reading a file).
For convenience, subclasses should usually return the most concrete subclass of Source possible.
In practice, the implementation of this method should nearly always be one of the following:

- Source that inherits from a base class that already implements BoundedSource.BoundedReader.getCurrentSource(): delegate to base class. In this case, it is almost always an error for the subclass to maintain its own copy of the source.

    public FooReader(FooSource<T> source) {
      super(source);
    }

    @Override
    public FooSource<T> getCurrentSource() {
      // The base class holds the source; narrow its type for convenience.
      return (FooSource<T>) super.getCurrentSource();
    }

- Source that does not support dynamic work rebalancing: return a private final variable.

    private final FooSource<T> source;

    public FooReader(FooSource<T> source) {
      this.source = source;
    }

    @Override
    public FooSource<T> getCurrentSource() {
      return source;
    }

- BoundedSource.BoundedReader that explicitly supports dynamic work rebalancing: maintain a variable pointing to an immutable source object, and protect it with synchronization.

    private FooSource<T> source;

    public FooReader(FooSource<T> source) {
      this.source = source;
    }

    @Override
    public synchronized FooSource<T> getCurrentSource() {
      return source;
    }

    @Override
    public synchronized FooSource<T> splitAtFraction(double fraction) {
      ...
      FooSource<T> primary = ...;
      FooSource<T> residual = ...;
      // Swap in the narrowed source; the old object is never mutated.
      this.source = primary;
      return residual;
    }

Specified by: getCurrentSource in class BoundedSource.BoundedReader<T>

public Double getFractionConsumed()
Description copied from class: BoundedSource.BoundedReader
Returns a value in [0, 1] representing approximately what fraction of the current source this reader has read so far, or null if such an estimate is not available.
It is recommended that this method should satisfy the following properties:
- Should return 0 before the Source.Reader.start() call.
- Should return 1 after a Source.Reader.start() or Source.Reader.advance() call that returns false.
By default, returns null to indicate that this cannot be estimated.
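
For an offset range, one way to meet the recommended properties is to derive the fraction from the offset of the last record read. The sketch below is an assumption-laden illustration: `OffsetProgress` and its methods are hypothetical names, not library API, and linear interpolation over the offset range is only one possible estimate.

```java
// Hypothetical helper, not part of the library: tracks progress through
// the offset range [start, end) and reports it as a fraction in [0, 1].
class OffsetProgress {
    private final long start;
    private final long end;
    private long lastOffset;          // offset of the last record read
    private boolean started = false;  // true once any record has been read
    private boolean done = false;     // true once the reader ran out of input

    OffsetProgress(long start, long end) {
        if (end <= start) {
            throw new IllegalArgumentException("range must be non-empty");
        }
        this.start = start;
        this.end = end;
    }

    synchronized void recordReadAt(long offset) {
        started = true;
        lastOffset = offset;
    }

    synchronized void markDone() {
        done = true;
    }

    // Synchronized so it can be called concurrently with the reading thread.
    synchronized Double fractionConsumed() {
        if (done) {
            return 1.0;  // start() or advance() returned false
        }
        if (!started) {
            return 0.0;  // nothing read yet
        }
        double fraction = (double) (lastOffset - start) / (end - start);
        return Math.max(0.0, Math.min(1.0, fraction));
    }
}

class OffsetProgressDemo {
    public static void main(String[] args) {
        OffsetProgress progress = new OffsetProgress(100, 200);
        System.out.println(progress.fractionConsumed());
        progress.recordReadAt(150);
        System.out.println(progress.fractionConsumed());
        progress.markDone();
        System.out.println(progress.fractionConsumed());
    }
}
```

All state is read and written under the same lock, and none of the methods block, so a progress query from another thread stays cheap.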
If BoundedSource.BoundedReader.splitAtFraction(double) is implemented, this method can be called concurrently to other methods (including itself), and it is therefore critical for it to be implemented in a thread-safe way.
Overrides: getFractionConsumed in class BoundedSource.BoundedReader<T>

public final OffsetBasedSource<T> splitAtFraction(double fraction)
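Description copied from class: BoundedSource.BoundedReader
Tells the reader to narrow the range of the input it's going to read and give up the remainder, so that the new range would contain approximately the given fraction of the amount of data in the current range.
Returns a BoundedSource representing the remainder.
Assuming the following sequence of calls:

    BoundedSource<T> initial = reader.getCurrentSource();
    BoundedSource<T> residual = reader.splitAtFraction(fraction);
    BoundedSource<T> primary = reader.getCurrentSource();

primary and residual, read together, should cover the same records as initial, and primary should contain approximately the given fraction of the data in initial.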
This method should return null if the split cannot be performed for this fraction
while satisfying the semantics above. E.g., a reader that reads a range of offsets
in a file should return null if it is already past the position in its range
corresponding to the given fraction. In this case, the method MUST have no effect
(the reader must behave as if the method hadn't been called at all).
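
For a reader over an offset range, the split-or-refuse rule above can be sketched as plain arithmetic. Everything here is a hypothetical illustration: `SplitSketch` is not a library class, ranges are modelled as `long[]{start, end}` rather than Source objects, and the rounding and refusal policy are assumptions that may differ from what RangeTracker does.

```java
// Hypothetical sketch of split arithmetic over the offset range [start, end).
class SplitSketch {
    private final long start;
    private long end;               // exclusive; shrinks on a successful split
    private long lastRecordOffset;  // offset of the last record returned
    private boolean started = false;

    SplitSketch(long start, long end) {
        this.start = start;
        this.end = end;
    }

    synchronized void recordReturnedAt(long offset) {
        started = true;
        lastRecordOffset = offset;
    }

    // Returns the residual range {splitOffset, oldEnd}, or null if the split
    // is refused. A refused split leaves the current range untouched.
    synchronized long[] splitAtFraction(double fraction) {
        long splitOffset = start + (long) Math.ceil(fraction * (end - start));
        if (splitOffset <= start || splitOffset >= end) {
            return null;  // one side of the split would be empty
        }
        if (started && splitOffset <= lastRecordOffset) {
            return null;  // the reader is already past the split position
        }
        long[] residual = {splitOffset, end};
        end = splitOffset;  // the primary range becomes [start, splitOffset)
        return residual;
    }

    synchronized long[] currentRange() {
        return new long[] {start, end};
    }
}

class SplitSketchDemo {
    public static void main(String[] args) {
        SplitSketch reader = new SplitSketch(0, 100);
        reader.recordReturnedAt(30);
        System.out.println(reader.splitAtFraction(0.25) == null);
        long[] residual = reader.splitAtFraction(0.5);
        System.out.println(residual[0] + ".." + residual[1]);
    }
}
```

The method only does arithmetic under a lock and never waits on I/O, and every refusal path returns before any state is modified.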
It is also very important that this method always completes quickly. In particular, it should not perform or wait on any blocking operations such as I/O or RPCs. Violating this requirement may stall completion of the work item or even cause it to fail.
It is incorrect to make both this method and Source.Reader.start()/Source.Reader.advance()
synchronized, because those methods can perform blocking operations, and then
this method would have to wait for those calls to complete.
RangeTracker makes it easy to implement
this method safely and correctly.
By default, returns null to indicate that splitting is not possible.
Overrides: splitAtFraction in class BoundedSource.BoundedReader<T>