public abstract static class FileBasedSource.FileBasedReader<T> extends ByteOffsetBasedSource.ByteOffsetBasedReader<T>
A reader that implements code common to readers of
FileBasedSources.
This reader uses a ReadableByteChannel created for the file represented by the
corresponding source to efficiently move to the correct starting position defined in the
source. Subclasses of this reader should implement startReading(java.nio.channels.ReadableByteChannel) to get access to this
channel. If the source corresponding to the reader is for a subrange of a file, the
ReadableByteChannel provided is guaranteed to be an instance of the type
SeekableByteChannel, which the subclass may use to traverse back in the channel to
determine the correct starting position.
Simple record-based formats (such as reading lines, reading CSV etc.), where each record can be identified by a unique offset, should interpret a range [A, B) as "read from the first record starting at or after offset A, up to but not including the first record starting at or after offset B".
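The range rule for simple record-based formats can be illustrated with newline-delimited data held in memory. The following is a minimal sketch, not SDK code: the class LineRangeSketch and its readRange method are hypothetical names introduced only for this illustration, and a record is assumed to start at offset 0 or immediately after a newline byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the SDK): applies the [A, B) rule to newline-delimited bytes. */
final class LineRangeSketch {

  /** Returns the lines whose first byte lies at an offset in [start, end). */
  static List<String> readRange(byte[] data, long start, long end) {
    List<String> lines = new ArrayList<>();
    int pos = (int) Math.min(Math.max(start, 0), data.length);
    // If 'start' falls inside a line, that line belongs to the previous range:
    // skip forward to the first record starting at or after 'start'.
    if (pos > 0 && data[pos - 1] != '\n') {
      while (pos < data.length && data[pos] != '\n') {
        pos++;
      }
      if (pos < data.length) {
        pos++; // step past the newline that ends the skipped line
      }
    }
    // Emit every record that starts before 'end', even if it extends past 'end'.
    while (pos < data.length && pos < end) {
      int lineEnd = pos;
      while (lineEnd < data.length && data[lineEnd] != '\n') {
        lineEnd++;
      }
      lines.add(new String(data, pos, lineEnd - pos, StandardCharsets.UTF_8));
      pos = lineEnd + 1;
    }
    return lines;
  }

  public static void main(String[] args) {
    byte[] data = "alpha\nbeta\ngamma\n".getBytes(StandardCharsets.UTF_8);
    System.out.println(readRange(data, 0, 8));
    System.out.println(readRange(data, 8, data.length));
  }
}
```

Because a record is assigned to the range that contains its starting offset, splitting [0, 17) at any offset produces two ranges whose results concatenate to the result for the whole range.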
More complex formats, such as some block-based formats, may have records which are not directly addressable: i.e. for some records, there is no way to describe the location of a record using a single offset number. For example, imagine a file format consisting of a sequence of blocks, where each block is compressed using some block compression algorithm. Then blocks have offsets, but individual records don't. More complex cases are also possible.
Many such formats still admit reading a range of offsets in a way consistent with the
semantics of ByteOffsetBasedReader, i.e. reading [A, B) and [B, C) is equivalent to
reading [A, C). E.g., for the compressed block-based format discussed above, reading [A, B)
would mean "read all the records in all blocks whose starting offset is in [A, B)".
To support such complex formats in FileBasedReader, we introduce the notion of
split points. We say that a record is a split point if there exists an offset A such
that the record is the first one to be read for a range [A, Long.MAX_VALUE). E.g. for
the block-based format above, the only split points would be the first records in each block.
With the above definition of split points an extended definition of the offset of a record
can be specified. For a record which is at a split point, its offset is defined to be the
largest A such that reading a source with the range [A, Long.MAX_VALUE) includes this record;
offsets of other records are only required to be non-strictly increasing. Offsets of records of
a FileBasedReader should be set based on this definition.
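To make the split-point and offset definitions concrete, here is a small in-memory sketch of the block-based format discussed above. Everything in it (BlockSplitPointSketch, Block, Entry, the sample offsets) is hypothetical and exists only for illustration. It assumes that a block is read if and only if its starting offset lies in [A, B), so the first record of each block is a split point whose offset is the block's starting offset, while the remaining records of the block reuse that offset (offsets are non-strictly increasing).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model (not SDK code) of a block-based format with split points. */
final class BlockSplitPointSketch {

  /** A block: its starting byte offset and the records it decodes to. */
  static final class Block {
    final long startOffset;
    final List<String> records;

    Block(long startOffset, String... records) {
      this.startOffset = startOffset;
      this.records = Arrays.asList(records);
    }
  }

  /** A record together with its assigned offset and split-point flag. */
  static final class Entry {
    final String record;
    final long offset;
    final boolean splitPoint;

    Entry(String record, long offset, boolean splitPoint) {
      this.record = record;
      this.offset = offset;
      this.splitPoint = splitPoint;
    }

    @Override
    public String toString() {
      return record + "@" + offset + (splitPoint ? "*" : "");
    }
  }

  /** Reads all records in all blocks whose starting offset is in [start, end). */
  static List<Entry> readRange(List<Block> blocks, long start, long end) {
    List<Entry> out = new ArrayList<>();
    for (Block block : blocks) {
      if (block.startOffset < start || block.startOffset >= end) {
        continue;
      }
      for (int i = 0; i < block.records.size(); i++) {
        // Only the first record of a block can be first in some range [A, MAX_VALUE).
        out.add(new Entry(block.records.get(i), block.startOffset, i == 0));
      }
    }
    return out;
  }

  static List<Block> sampleBlocks() {
    return Arrays.asList(new Block(0, "a", "b"), new Block(100, "c"), new Block(250, "d", "e"));
  }

  public static void main(String[] args) {
    System.out.println(readRange(sampleBlocks(), 0, 100));
    System.out.println(readRange(sampleBlocks(), 100, Long.MAX_VALUE));
  }
}
```

In this model, reading [0, 100) and then [100, Long.MAX_VALUE) yields the same records as reading [0, Long.MAX_VALUE), which is the consistency property required by ByteOffsetBasedReader.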
Sequential reading is implemented using readNextRecord().
Then FileBasedReader implements "reading a range [A, B)" in the following way.
1. start() opens the file.
2. start() seeks the SeekableByteChannel to A (reading offset ranges for non-seekable files is not supported) and calls startReading().
3. start() calls advance() once.
4. If that advance() call returned true, sequential reading starts and advance() will be called repeatedly.
advance() calls readNextRecord() on the subclass, and stops (returns false) if
the new record is at a split point AND the offset of the new record is at or after B.
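The control flow described above can be sketched with a simplified stand-in for the base class. This is an illustration under stated assumptions, not the SDK implementation: RangeReaderSketch and InMemoryReaderSketch are hypothetical names, the file and channel are replaced by in-memory arrays, and startReading() takes no argument because there is no channel to pass.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified stand-in for the start()/advance() flow. NOT the SDK class. */
abstract class RangeReaderSketch {
  private final long endOffset;

  RangeReaderSketch(long endOffset) {
    this.endOffset = endOffset;
  }

  protected abstract void startReading();
  protected abstract boolean readNextRecord();
  protected abstract boolean isAtSplitPoint();
  protected abstract long getCurrentOffset();

  final boolean start() {
    // The real class opens the file and seeks the channel before this call.
    startReading();
    return advance();
  }

  final boolean advance() {
    if (!readNextRecord()) {
      return false; // end of input
    }
    // Stop only when the new record is a split point at or after the end offset.
    return !(isAtSplitPoint() && getCurrentOffset() >= endOffset);
  }
}

/** Hypothetical subclass over pre-computed record offsets and split-point flags. */
final class InMemoryReaderSketch extends RangeReaderSketch {
  private final long[] offsets;
  private final boolean[] splitPoints;
  private final long startOffset;
  private int index = -1;

  InMemoryReaderSketch(long[] offsets, boolean[] splitPoints, long startOffset, long endOffset) {
    super(endOffset);
    this.offsets = offsets;
    this.splitPoints = splitPoints;
    this.startOffset = startOffset;
  }

  @Override
  protected void startReading() {
    // Position just before the first split point at or after the start offset.
    int i = 0;
    while (i < offsets.length && !(splitPoints[i] && offsets[i] >= startOffset)) {
      i++;
    }
    index = i - 1;
  }

  @Override
  protected boolean readNextRecord() {
    if (index + 1 >= offsets.length) {
      return false;
    }
    index++;
    return true;
  }

  @Override
  protected boolean isAtSplitPoint() {
    return index >= 0 && splitPoints[index];
  }

  @Override
  protected long getCurrentOffset() {
    return offsets[index];
  }

  /** Returns the indices of the records read for the range [start, end). */
  static List<Integer> readRange(long[] offsets, boolean[] splitPoints, long start, long end) {
    InMemoryReaderSketch reader = new InMemoryReaderSketch(offsets, splitPoints, start, end);
    List<Integer> read = new ArrayList<>();
    for (boolean more = reader.start(); more; more = reader.advance()) {
      read.add(reader.index);
    }
    return read;
  }
}
```

Note that a record which is not at a split point is returned even if its offset is at or after B, since it belongs to the split point that precedes it.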
Since this class implements Source.Reader, it guarantees thread safety. Abstract methods
defined here will not be accessed by more than one thread concurrently.

| Constructor and Description |
|---|
| FileBasedSource.FileBasedReader(FileBasedSource<T> source) Subclasses should not perform IO operations at the constructor. |

| Modifier and Type | Method and Description |
|---|---|
| boolean | advance() Advances the iterator to the next valid record. |
| void | close() Closes any ReadableByteChannel created for the current reader. |
| protected FileBasedSource<T> | getSource() |
| protected abstract boolean | isAtSplitPoint() Specifies if the current record of the reader is at a split point. |
| protected abstract boolean | readNextRecord() Reads the next record from the channel provided by startReading(java.nio.channels.ReadableByteChannel). |
| boolean | start() Initializes the reader and advances the reader to the first record. |
| protected abstract void | startReading(java.nio.channels.ReadableByteChannel channel) Performs any initialization of the subclass of FileBasedReader that involves IO operations. |
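
Methods inherited from class ByteOffsetBasedSource.ByteOffsetBasedReader: getCurrentOffset
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface Source.Reader: getCurrent

public FileBasedSource.FileBasedReader(FileBasedSource<T> source)
Subclasses should not perform IO operations at the constructor. IO operations should be deferred until the
startReading(java.nio.channels.ReadableByteChannel) method is invoked.

protected final FileBasedSource<T> getSource()
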
public final boolean start()
throws java.io.IOException
Initializes the reader and advances the reader to the first record. This method should be called exactly once. The invocation should occur prior to calling
Source.Reader.advance() or Source.Reader.getCurrent(). This method may perform expensive operations that
are needed to initialize the reader.
Returns: true if a record was read, false if we're at the end of input.
Throws: java.io.IOException

public final boolean advance()
throws java.io.IOException
Advances the iterator to the next valid record, invalidating the result of any previous Source.Reader.getCurrent() call.
Returns: true if a record was read, false if we're at the end of input.
Throws: java.io.IOException

public void close()
throws java.io.IOException
Closes any ReadableByteChannel created for the current reader. This implementation is
idempotent. Any close() method introduced by a subclass must be idempotent and must
call the close() method in the FileBasedReader.
Throws: java.io.IOException

protected abstract boolean isAtSplitPoint()
Specifies if the current record of the reader is at a split point.
This returns true if readNextRecord() was invoked at least once and the
last record returned by readNextRecord() is at a split point, false otherwise.
Please refer to FileBasedReader for the definition of
split points.
protected abstract void startReading(java.nio.channels.ReadableByteChannel channel)
throws java.io.IOException
Performs any initialization of the subclass of FileBasedReader that involves IO
operations. Will only be invoked once, and before that invocation the base class will seek the
channel to the source's starting offset.
The provided ReadableByteChannel is for the file represented by the source of this
reader. A subclass may use the channel to build a higher-level IO abstraction, e.g., a
BufferedReader or an XML parser.
A subclass may additionally use this method to adjust the starting position prior to reading
records. For example, the channel of a reader that reads text lines may point to the middle
of a line after the position adjustment done by FileBasedReader. In this case the
subclass could advance the channel to the beginning of the next line. If the
corresponding source is for a subrange of a file, channel is guaranteed to be an
instance of the type SeekableByteChannel, in which case the subclass may traverse back
in the channel to determine whether the channel is already at the correct starting position (e.g.,
to check if the previous character was a newline).
After this method is invoked, the base class will neither read data from the channel nor adjust its position. The base class remains responsible for properly closing the channel.
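The position adjustment described for a line-oriented reader could look roughly like the sketch below. It uses only standard java.nio calls; the class LineStartSketch and its methods alignToLineStart and alignInTempFile are hypothetical names for this illustration, and the byte-at-a-time reads are chosen for clarity rather than efficiency. The idea is to step back one byte and then consume bytes up to and including the next newline: if the previous byte was already a newline, the channel ends up where it started; otherwise it ends up at the beginning of the next line.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper (not SDK code) for aligning a channel to a line start. */
final class LineStartSketch {

  /**
   * Moves the channel to the start of the first line that begins at or after its
   * current position and returns the resulting position.
   */
  static long alignToLineStart(SeekableByteChannel channel) throws IOException {
    long pos = channel.position();
    if (pos == 0 || pos >= channel.size()) {
      return pos; // start of file is a line start; at or past the end there is nothing to read
    }
    ByteBuffer one = ByteBuffer.allocate(1);
    channel.position(pos - 1); // traverse back to inspect the previous byte
    while (true) {
      one.clear();
      int n = channel.read(one);
      if (n < 0) {
        break; // reached end of file without finding another line start
      }
      if (n == 1 && one.get(0) == '\n') {
        break; // the channel is now positioned right after a newline
      }
    }
    return channel.position();
  }

  /** Demo helper: writes 'contents' to a temporary file and aligns from offset 'start'. */
  static long alignInTempFile(String contents, long start) {
    try {
      Path file = Files.createTempFile("line-start-sketch", ".txt");
      try {
        Files.write(file, contents.getBytes(StandardCharsets.UTF_8));
        try (SeekableByteChannel channel = Files.newByteChannel(file)) {
          channel.position(start);
          return alignToLineStart(channel);
        }
      } finally {
        Files.deleteIfExists(file);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(alignInTempFile("alpha\nbeta\ngamma\n", 8));
  }
}
```

In a real subclass, logic of this kind would run inside startReading() after checking that the provided channel is an instance of SeekableByteChannel, and before wrapping the channel in any buffered abstraction.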
Parameters: channel - a byte channel representing the file backing the reader.
Throws: java.io.IOException

protected abstract boolean readNextRecord()
throws java.io.IOException
Reads the next record from the channel provided by startReading(java.nio.channels.ReadableByteChannel). Methods
Source.Reader.getCurrent(), ByteOffsetBasedSource.ByteOffsetBasedReader.getCurrentOffset(), and isAtSplitPoint() should return
the corresponding information about the record read by the last invocation of this method.
Returns: true if a record was successfully read, false if the end of the
channel was reached before successfully reading a new record.
Throws: java.io.IOException