Class BlockBasedSource.BlockBasedReader<T>
- java.lang.Object
-
- org.apache.beam.sdk.io.Source.Reader<T>
-
- org.apache.beam.sdk.io.BoundedSource.BoundedReader<T>
-
- org.apache.beam.sdk.io.OffsetBasedSource.OffsetBasedReader<T>
-
- org.apache.beam.sdk.io.FileBasedSource.FileBasedReader<T>
-
- org.apache.beam.sdk.io.BlockBasedSource.BlockBasedReader<T>
-
- All Implemented Interfaces:
java.lang.AutoCloseable
- Direct Known Subclasses:
AvroSource.AvroReader
- Enclosing class:
- BlockBasedSource<T>
@Experimental(SOURCE_SINK) protected abstract static class BlockBasedSource.BlockBasedReader<T> extends FileBasedSource.FileBasedReader<T>
AReader
that reads records from aBlockBasedSource
. If the source is a subrange of a file, the blocks that will be read by this reader are those such that the first byte of the block is within the range[start, end)
.
-
-
Field Summary
-
Fields inherited from class org.apache.beam.sdk.io.BoundedSource.BoundedReader
SPLIT_POINTS_UNKNOWN
-
-
Constructor Summary
Constructors Modifier Constructor Description protected
BlockBasedReader(BlockBasedSource<T> source)
-
Method Summary
All Methods Instance Methods Abstract Methods Concrete Methods Modifier and Type Method Description T
getCurrent()
Returns the value of the data item that was read by the lastSource.Reader.start()
orSource.Reader.advance()
call.abstract @Nullable BlockBasedSource.Block<T>
getCurrentBlock()
Returns the current block (the block that was read by the last successful call toreadNextBlock()
).abstract long
getCurrentBlockOffset()
Returns the largest offset such that starting to read from that offset includes the current block.abstract long
getCurrentBlockSize()
Returns the size of the current block in bytes as it is represented in the underlying file, if possible.protected long
getCurrentOffset()
Returns the starting offset of thecurrent record
, which has been read by the last successfulSource.Reader.start()
orSource.Reader.advance()
call.@Nullable java.lang.Double
getFractionConsumed()
Returns a value in [0, 1] representing approximately what fraction of thecurrent source
this reader has read so far, ornull
if such an estimate is not available.protected boolean
isAtSplitPoint()
Returns true if the reader is at a split point.abstract boolean
readNextBlock()
Read the next block from the input.protected boolean
readNextRecord()
Reads the next record from thecurrent block
if possible.-
Methods inherited from class org.apache.beam.sdk.io.FileBasedSource.FileBasedReader
advanceImpl, allowsDynamicSplitting, close, getCurrentSource, startImpl, startReading
-
Methods inherited from class org.apache.beam.sdk.io.OffsetBasedSource.OffsetBasedReader
advance, getSplitPointsConsumed, getSplitPointsRemaining, isDone, isStarted, splitAtFraction, start
-
Methods inherited from class org.apache.beam.sdk.io.BoundedSource.BoundedReader
getCurrentTimestamp
-
-
-
-
Constructor Detail
-
BlockBasedReader
protected BlockBasedReader(BlockBasedSource<T> source)
-
-
Method Detail
-
readNextBlock
public abstract boolean readNextBlock() throws java.io.IOException
Read the next block from the input.- Throws:
java.io.IOException
-
getCurrentBlock
public abstract @Nullable BlockBasedSource.Block<T> getCurrentBlock()
Returns the current block (the block that was read by the last successful call toreadNextBlock()
). May return null initially, or if no block has been successfully read.
-
getCurrentBlockSize
public abstract long getCurrentBlockSize()
Returns the size of the current block in bytes as it is represented in the underlying file, if possible. This method may return0
if the size of the current block is unknown.The size returned by this method must be such that for two successive blocks A and B,
offset(A) + size(A) <= offset(B)
. If this is not satisfied, the progress reported by theBlockBasedReader
will be non-monotonic and will interfere with the quality (but not correctness) of dynamic work rebalancing.This method and
BlockBasedSource.Block.getFractionOfBlockConsumed()
are used to provide an estimate of progress within a block (getCurrentBlock().getFractionOfBlockConsumed() * getCurrentBlockSize()
). It is acceptable for the result of this computation to be0
, but progress estimation will be inaccurate.
-
getCurrentBlockOffset
public abstract long getCurrentBlockOffset()
Returns the largest offset such that starting to read from that offset includes the current block.
-
getCurrent
public final T getCurrent() throws java.util.NoSuchElementException
Description copied from class:Source.Reader
Returns the value of the data item that was read by the lastSource.Reader.start()
orSource.Reader.advance()
call. The returned value must be effectively immutable and remain valid indefinitely.Multiple calls to this method without an intervening call to
Source.Reader.advance()
should return the same result.- Specified by:
getCurrent
in classSource.Reader<T>
- Throws:
java.util.NoSuchElementException
- ifSource.Reader.start()
was never called, or if the lastSource.Reader.start()
orSource.Reader.advance()
returnedfalse
.
-
isAtSplitPoint
protected boolean isAtSplitPoint()
Returns true if the reader is at a split point. ABlockBasedReader
is at a split point if the current record is the first record in a block. In other words, split points are block boundaries.- Overrides:
isAtSplitPoint
in classOffsetBasedSource.OffsetBasedReader<T>
-
readNextRecord
protected final boolean readNextRecord() throws java.io.IOException
Reads the next record from thecurrent block
if possible. Will callreadNextBlock()
to advance to the next block if not.The first record read from a block is treated as a split point.
- Specified by:
readNextRecord
in classFileBasedSource.FileBasedReader<T>
- Returns:
true
if a record was successfully read,false
if the end of the channel was reached before successfully reading a new record.- Throws:
java.io.IOException
-
getFractionConsumed
public @Nullable java.lang.Double getFractionConsumed()
Description copied from class:BoundedSource.BoundedReader
Returns a value in [0, 1] representing approximately what fraction of thecurrent source
this reader has read so far, ornull
if such an estimate is not available.It is recommended that this method should satisfy the following properties:
- Should return 0 before the
Source.Reader.start()
call. - Should return 1 after a
Source.Reader.start()
orSource.Reader.advance()
call that returns false. - The returned values should be non-decreasing (though they don't have to be unique).
By default, returns null to indicate that this cannot be estimated.
Thread safety
IfBoundedSource.BoundedReader.splitAtFraction(double)
is implemented, this method can be called concurrently to other methods (including itself), and it is therefore critical for it to be implemented in a thread-safe way.- Overrides:
getFractionConsumed
in classOffsetBasedSource.OffsetBasedReader<T>
- Should return 0 before the
-
getCurrentOffset
protected long getCurrentOffset()
Description copied from class:OffsetBasedSource.OffsetBasedReader
Returns the starting offset of thecurrent record
, which has been read by the last successfulSource.Reader.start()
orSource.Reader.advance()
call.If no such call has been made yet, the return value is unspecified.
See
RangeTracker
for description of offset semantics.- Specified by:
getCurrentOffset
in classOffsetBasedSource.OffsetBasedReader<T>
-
-