@Experimental(value=SOURCE_SINK) public static class AvroSource.AvroReader<T> extends BlockBasedSource.BlockBasedReader<T>
A BlockBasedSource.BlockBasedReader for reading blocks from Avro files.
Type Parameters: T - The type of records contained in the block.
An Avro Object Container File consists of a header that ends with the file's 16-byte sync marker, followed by a sequence of blocks. Each block begins with two encoded longs: the number of records in the block and the size in bytes of the block's serialized records. These are followed by the records themselves (optionally compressed). Each block is terminated by the 16-byte sync marker.
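To make the block layout concrete, here is a minimal, dependency-free sketch of reading the two longs that open a block. It assumes Avro's binary encoding of `long` (zig-zag followed by a base-128 variable-length integer). `AvroBlockHeaderSketch` and its methods are hypothetical illustrations, not part of the Dataflow SDK or the Avro library. In practice, the Avro library's own decoders (for example `BinaryDecoder`) handle this.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper: decodes the (record count, byte size) pair at the start of an Avro block. */
final class AvroBlockHeaderSketch {
  private AvroBlockHeaderSketch() {}

  /** Decodes one Avro "long": a zig-zag encoded, base-128 variable-length integer. */
  static long readAvroLong(ByteBuffer buf) {
    long raw = 0;
    int shift = 0;
    while (true) {
      if (shift > 63) {
        throw new IllegalArgumentException("Malformed varint: too many bytes");
      }
      byte b = buf.get();
      raw |= (long) (b & 0x7F) << shift; // low 7 bits carry data
      if ((b & 0x80) == 0) {             // high bit clear: last byte
        break;
      }
      shift += 7;
    }
    // Undo zig-zag: 0 -> 0, 1 -> -1, 2 -> 1, 3 -> -2, ...
    return (raw >>> 1) ^ -(raw & 1);
  }

  /** Encodes a long the same way (used here only to build test input). */
  static void writeAvroLong(ByteBuffer buf, long value) {
    long zz = (value << 1) ^ (value >> 63); // zig-zag
    while ((zz & ~0x7FL) != 0) {
      buf.put((byte) ((zz & 0x7F) | 0x80)); // more bytes follow
      zz >>>= 7;
    }
    buf.put((byte) zz);
  }

  /** Returns {recordCount, serializedSizeInBytes} read from the buffer's current position. */
  static long[] readBlockHeader(ByteBuffer buf) {
    long count = readAvroLong(buf);
    long size = readAvroLong(buf);
    return new long[] {count, size};
  }
}
```

After the two longs, a reader would consume `size` bytes of (possibly compressed) records and then expect the 16-byte sync marker.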
Here, a block's offset is taken to be the position of the sync marker that precedes it: a reader that begins reading at that offset can detect the sync marker and, from it, locate the beginning of the block.
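Under this convention, a reader that starts at an arbitrary byte position finds its first block by scanning forward for the sync marker. The following sketch shows that scan as a simple linear search. `SyncMarkerSketch.findSync` is hypothetical. It does not attempt to handle the (unlikely, because the marker is 16 random bytes) case of the marker's bytes appearing inside record data.

```java
/** Hypothetical helper: locates an Avro sync marker by a linear scan. */
final class SyncMarkerSketch {
  /** Avro sync markers are 16 bytes long. */
  static final int SYNC_SIZE = 16;

  private SyncMarkerSketch() {}

  /**
   * Returns the index of the first occurrence of sync at or after from, or -1 if none.
   * Under the offset convention above, that index is the offset of the block that follows.
   */
  static int findSync(byte[] data, int from, byte[] sync) {
    if (sync.length != SYNC_SIZE) {
      throw new IllegalArgumentException("sync marker must be " + SYNC_SIZE + " bytes");
    }
    for (int i = Math.max(from, 0); i + SYNC_SIZE <= data.length; i++) {
      boolean match = true;
      for (int j = 0; j < SYNC_SIZE; j++) {
        if (data[i + j] != sync[j]) {
          match = false;
          break;
        }
      }
      if (match) {
        return i;
      }
    }
    return -1; // no complete marker at or after 'from'
  }
}
```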
| Constructor and Description |
|---|
| AvroReader(AvroSource<T> source) |
| Modifier and Type | Method and Description |
|---|---|
| com.google.cloud.dataflow.sdk.io.AvroSource.AvroBlock<T> | getCurrentBlock(): Returns the current block (the block that was read by the last successful call to BlockBasedSource.BlockBasedReader.readNextBlock()). |
| long | getCurrentBlockOffset(): Returns the largest offset such that starting to read from that offset includes the current block. |
| long | getCurrentBlockSize(): Returns the size of the current block in bytes as it is represented in the underlying file, if possible. |
| AvroSource<T> | getCurrentSource(): Returns a Source describing the same input that this Reader reads (including items already read). |
| boolean | readNextBlock(): Reads the next block from the input. |
| protected void | startReading(ReadableByteChannel channel): Starts reading from the provided channel. |
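The summary describes a two-call protocol. readNextBlock() advances the reader, and getCurrentBlock() then exposes the block it just read (possibly null before the first successful read). The sketch below models that contract with a hypothetical in-memory `InMemoryBlockReader` whose blocks are plain lists. It is not the SDK's AvroReader or AvroBlock, and having readNextBlock() return false once the input is exhausted is a simplifying assumption for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified stand-in for the readNextBlock()/getCurrentBlock() contract. */
final class InMemoryBlockReader<T> {
  private final Iterator<List<T>> blocks;
  private List<T> current = null; // null until the first successful readNextBlock()

  InMemoryBlockReader(List<List<T>> blocks) {
    this.blocks = blocks.iterator();
  }

  /** Advances to the next block; returns false when there are no more blocks. */
  boolean readNextBlock() {
    if (!blocks.hasNext()) {
      return false;
    }
    current = blocks.next();
    return true;
  }

  /** The block read by the last successful readNextBlock(), or null if none has been read yet. */
  List<T> getCurrentBlock() {
    return current;
  }

  /** Drains all records, block by block, in file order. */
  static <R> List<R> readAll(InMemoryBlockReader<R> reader) {
    List<R> out = new ArrayList<>();
    while (reader.readNextBlock()) {
      out.addAll(reader.getCurrentBlock());
    }
    return out;
  }
}
```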
Inherited methods: getCurrent, getCurrentOffset, getFractionConsumed, isAtSplitPoint, readNextRecord; advanceImpl, close, startImpl; advance, splitAtFraction, start; getCurrentTimestamp
public AvroReader(AvroSource<T> source)
public AvroSource<T> getCurrentSource()
Description copied from class: Source.Reader
Returns a Source describing the same input that this Reader reads (including items already read).
A reader created from the result of getCurrentSource, if consumed, MUST
return the same data items as the current reader.
Overrides: getCurrentSource in class FileBasedSource.FileBasedReader<T>
public boolean readNextBlock() throws IOException
Description copied from class: BlockBasedSource.BlockBasedReader
Reads the next block from the input.
Specified by: readNextBlock in class BlockBasedSource.BlockBasedReader<T>
Throws: IOException
public com.google.cloud.dataflow.sdk.io.AvroSource.AvroBlock<T> getCurrentBlock()
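Description copied from class: BlockBasedSource.BlockBasedReader
Returns the current block (the block that was read by the last successful call to BlockBasedSource.BlockBasedReader.readNextBlock()). May return null initially, or if no block has been successfully read.
Specified by: getCurrentBlock in class BlockBasedSource.BlockBasedReader<T>
public long getCurrentBlockOffset()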
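Description copied from class: BlockBasedSource.BlockBasedReader
Returns the largest offset such that starting to read from that offset includes the current block.
Specified by: getCurrentBlockOffset in class BlockBasedSource.BlockBasedReader<T>
public long getCurrentBlockSize()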
Description copied from class: BlockBasedSource.BlockBasedReader
Returns the size of the current block in bytes as it is represented in the underlying file, if possible.
The size returned by this method must be such that, for two successive blocks A and B,
offset(A) + size(A) <= offset(B). If this is not satisfied, the progress reported
by the BlockBasedReader will be non-monotonic, which degrades the quality
(but not the correctness) of dynamic work rebalancing.
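The size contract can be checked mechanically. The following sketch uses a hypothetical `BlockLayoutCheck` helper that takes the offsets and reported sizes of successive blocks (in file order) and verifies offset(A) + size(A) <= offset(B) for every adjacent pair. Note that reporting a size of 0 always satisfies the contract, which is consistent with the remark below that 0 is acceptable.

```java
/** Hypothetical checker for the getCurrentBlockSize() contract across successive blocks. */
final class BlockLayoutCheck {
  private BlockLayoutCheck() {}

  /** offsets[i] and sizes[i] describe the i-th block; blocks must be listed in file order. */
  static boolean satisfiesSizeContract(long[] offsets, long[] sizes) {
    if (offsets.length != sizes.length) {
      throw new IllegalArgumentException("offsets and sizes must have the same length");
    }
    for (int i = 0; i + 1 < offsets.length; i++) {
      // offset(A) + size(A) must not run past offset(B)
      if (offsets[i] + sizes[i] > offsets[i + 1]) {
        return false;
      }
    }
    return true;
  }
}
```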
This method and BlockBasedSource.Block.getFractionOfBlockConsumed() are used to provide an estimate
of progress within a block (getCurrentBlock().getFractionOfBlockConsumed() *
getCurrentBlockSize()). It is acceptable for the result of this computation to be 0, but
progress estimation will be inaccurate.
Specified by: getCurrentBlockSize in class BlockBasedSource.BlockBasedReader<T>
protected void startReading(ReadableByteChannel channel) throws IOException
Starts reading from the provided channel.
Specified by: startReading in class FileBasedSource.FileBasedReader<T>
Parameters: channel - a byte channel representing the file backing the reader.
Throws: IOException