@Experimental(value=SOURCE_SINK) public static interface UnboundedSource.UnboundedReader<OutputT> extends Source.Reader<OutputT>
Reader that reads an unbounded amount of input.
A given UnboundedReader object will only be accessed by a single thread at once.
| Modifier and Type | Method and Description |
|---|---|
| boolean | advance() - Advances the reader to the next valid record. |
| UnboundedSource.CheckpointMark | getCheckpointMark() - Returns a UnboundedSource.CheckpointMark representing the progress of this UnboundedReader. |
| byte[] | getCurrentRecordId() - Returns a unique identifier for the current record. |
| UnboundedSource<OutputT,?> | getCurrentSource() - Returns the UnboundedSource that created this reader. |
| org.joda.time.Instant | getWatermark() - Returns a lower bound on timestamps of future elements read by this reader. |
| boolean | start() - Initializes the reader and advances the reader to the first record. |
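Methods inherited from interface Source.Reader: close, getCurrent, getCurrentTimestamp

boolean start()
throws IOException
Initializes the reader and advances the reader to the first record.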
This method should be called exactly once. The invocation should occur prior to calling
advance() or Source.Reader.getCurrent(). This method may perform expensive operations that
are needed to initialize the reader.
Returns true if a record was read, false if there is no more input
currently available. Future calls to advance() may return true once more data
is available. Regardless of the return value of start, start will not be
called again on the same UnboundedReader object; it will only be called again when a
new reader object is constructed for the same source, e.g. on recovery.
Specified by: start in interface Source.Reader<OutputT>
Returns: true if a record was read, false if there is no more input available.
Throws: IOException

boolean advance()
throws IOException
Advances the reader to the next valid record.
Returns true if a record was read, false if there is no more input
available. Future calls to advance() may return true once more data is
available.
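The calling pattern implied by start() and advance() can be sketched as a polling loop. This is a minimal illustration, not the runner's actual code: SketchReader, QueueReader, and ReaderLoopSketch are hypothetical stand-ins written for this example so that it compiles without the SDK on the classpath.

```java
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Queue;

// Hypothetical stand-in for the start/advance/getCurrent part of the reader contract.
interface SketchReader<T> {
  boolean start() throws IOException;
  boolean advance() throws IOException;
  T getCurrent() throws NoSuchElementException;
}

// Hypothetical in-memory reader: records become readable only after offer() is called.
final class QueueReader implements SketchReader<String> {
  private final Queue<String> pending = new ArrayDeque<>();
  private String current; // null means "no current record"
  private boolean started;

  void offer(String record) {
    pending.add(record);
  }

  @Override
  public boolean start() {
    if (started) {
      throw new IllegalStateException("start() must be called exactly once");
    }
    started = true;
    return advance();
  }

  @Override
  public boolean advance() {
    if (!started) {
      throw new IllegalStateException("start() must be called before advance()");
    }
    current = pending.poll(); // null when nothing is currently available
    return current != null;
  }

  @Override
  public String getCurrent() {
    if (current == null) {
      throw new NoSuchElementException("the last start() or advance() returned false");
    }
    return current;
  }
}

final class ReaderLoopSketch {
  // Reads everything currently available. The first call uses start();
  // every later call resumes with advance(), because start() is never repeated.
  static List<String> drain(SketchReader<String> reader, boolean firstCall) throws IOException {
    List<String> out = new ArrayList<>();
    boolean available = firstCall ? reader.start() : reader.advance();
    while (available) {
      out.add(reader.getCurrent());
      available = reader.advance();
    }
    return out;
  }
}
```

A false return here only means "nothing right now": calling drain again later with firstCall set to false picks up any records that arrived in the meantime.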
Specified by: advance in interface Source.Reader<OutputT>
Returns: true if a record was read, false if there is no more input available.
Throws: IOException

byte[] getCurrentRecordId()
throws NoSuchElementException
Returns a unique identifier for the current record.
For example, this could be a hash of the record contents, or a logical ID present in the record. If this is generated as a hash of the record contents, it should be at least 16 bytes (128 bits) to avoid collisions.
This method has the same restrictions on when it can be called as Source.Reader.getCurrent() and
Source.Reader.getCurrentTimestamp().
Note: this is not yet supported by the DataflowPipelineRunner, and it will be ignored.
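One way to derive a content-based ID of the recommended size is to hash the record bytes and keep the first 16 bytes. The sketch below assumes the record is available as a byte array and uses SHA-256 from the JDK; RecordIdSketch and recordId are hypothetical names, and the choice of hash is an assumption rather than something this interface prescribes.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

final class RecordIdSketch {
  // Returns a 16-byte (128-bit) ID derived from the record contents.
  // Identical contents always produce the same ID.
  static byte[] recordId(byte[] recordBytes) {
    try {
      byte[] digest = MessageDigest.getInstance("SHA-256").digest(recordBytes);
      return Arrays.copyOf(digest, 16); // truncate the 32-byte digest to 128 bits
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is one of the algorithms every Java platform must provide.
      throw new IllegalStateException(e);
    }
  }
}
```

If the record already carries a logical ID, returning that ID's bytes avoids hashing altogether.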
Throws: NoSuchElementException - if the reader is at the beginning of the input and start() or advance() wasn't called, or if the last start() or advance() returned false.

org.joda.time.Instant getWatermark()
Returns a lower bound on timestamps of future elements read by this reader.
This can be approximate. If records are read that violate this guarantee, they will be
considered late, which will affect how they will be processed. See
Window for more information on
late data and how to handle it.
This bound should be as tight as possible. Downstream windows will not be able to close until this watermark passes the end of the window.
For example, a source may know that the records it reads will be in timestamp order. In this case, the watermark can be the timestamp of the last record read minus one. For a source that does not have natural timestamps, timestamps can be set to the time of reading, in which case the watermark is the current clock time.
See Window and
Trigger for more
information on timestamps and watermarks.
May be called after advance() or start() has returned false, but not before
start() has been called.
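The two watermark strategies described above can be sketched as follows. To stay self-contained, the example represents timestamps as epoch milliseconds in a long rather than org.joda.time.Instant, and it returns Long.MIN_VALUE before any record has been read; both choices, along with the class names OrderedWatermark and ReadTimeWatermark, are assumptions made for illustration.

```java
import java.util.function.LongSupplier;

// For a source whose records arrive in timestamp order:
// the watermark is the timestamp of the last record read, minus one.
final class OrderedWatermark {
  private long lastTimestampMillis = Long.MIN_VALUE; // sentinel: nothing read yet

  void onRecordRead(long timestampMillis) {
    lastTimestampMillis = timestampMillis;
  }

  long watermarkMillis() {
    if (lastTimestampMillis == Long.MIN_VALUE) {
      return Long.MIN_VALUE; // no record yet, so no useful bound (also avoids underflow)
    }
    return lastTimestampMillis - 1;
  }
}

// For a source without natural timestamps: records are stamped with the time
// of reading, so the watermark is simply the current clock time.
final class ReadTimeWatermark {
  private final LongSupplier clockMillis;

  ReadTimeWatermark(LongSupplier clockMillis) {
    this.clockMillis = clockMillis;
  }

  long watermarkMillis() {
    return clockMillis.getAsLong();
  }
}
```

In production the clock would typically be System::currentTimeMillis; passing it in as a LongSupplier keeps the sketch testable with a fixed value.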

UnboundedSource.CheckpointMark getCheckpointMark()
Returns a UnboundedSource.CheckpointMark representing the progress of this UnboundedReader.
The elements read up until this is called will be processed together as a bundle. Once
the result of this processing has been durably committed,
UnboundedSource.CheckpointMark.finalizeCheckpoint() will be called on the UnboundedSource.CheckpointMark
object.
The returned object should not be modified.
May be called after advance() or start() has returned false, but not before
start() has been called.
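The checkpoint lifecycle described above (snapshot the progress, process the bundle, then finalize once the results are committed) might look like this for an offset-based source. Everything here is a hypothetical stand-in: SketchCheckpointMark mirrors only the finalizeCheckpoint() callback, and AckingSource is an invented source that tracks acknowledged offsets.

```java
// Hypothetical stand-in for the finalizeCheckpoint() part of the checkpoint contract.
interface SketchCheckpointMark {
  void finalizeCheckpoint();
}

// Hypothetical source that hands out increasing offsets and records acknowledgements.
final class AckingSource {
  private long nextOffset = 0;    // offset of the next record to hand out
  private long ackedThrough = -1; // highest offset known to be durably committed

  long read() {
    return nextOffset++;
  }

  // Snapshots progress: everything read so far belongs to this checkpoint's bundle.
  OffsetCheckpoint checkpoint() {
    return new OffsetCheckpoint(nextOffset - 1, this);
  }

  void ack(long offset) {
    ackedThrough = Math.max(ackedThrough, offset);
  }

  long ackedThrough() {
    return ackedThrough;
  }
}

// Immutable snapshot, matching the rule that the returned object should not be modified.
final class OffsetCheckpoint implements SketchCheckpointMark {
  private final long lastOffset;
  private final AckingSource source;

  OffsetCheckpoint(long lastOffset, AckingSource source) {
    this.lastOffset = lastOffset;
    this.source = source;
  }

  @Override
  public void finalizeCheckpoint() {
    // Invoked only after the bundle's results are durably committed,
    // so it is now safe to acknowledge everything up to lastOffset.
    source.ack(lastOffset);
  }
}
```

Because the mark captures the offset at creation time, records read after checkpoint() is called are not acknowledged until a later checkpoint is finalized.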

UnboundedSource<OutputT,?> getCurrentSource()
Returns the UnboundedSource that created this reader. This will not change over the life of the reader.
Specified by: getCurrentSource in interface Source.Reader<OutputT>