T - Type of elements read by the source.
public abstract class Source<T>
extends java.lang.Object
implements java.io.Serializable
Source for reading the input.
To use this class for supporting your custom input type, derive your
class from it and override the abstract methods. Also override either
createWindowedReader(com.google.cloud.dataflow.sdk.options.PipelineOptions, com.google.cloud.dataflow.sdk.coders.Coder<com.google.cloud.dataflow.sdk.util.WindowedValue<T>>, com.google.cloud.dataflow.sdk.util.ExecutionContext) if your source supports timestamps and windows,
or createBasicReader(com.google.cloud.dataflow.sdk.options.PipelineOptions, com.google.cloud.dataflow.sdk.coders.Coder<T>, com.google.cloud.dataflow.sdk.util.ExecutionContext) otherwise. For an example, see DatastoreIO.
A Source passed to a Read transform must be
Serializable. This allows the Source instance
created in this "main program" to be sent (in serialized form) to
remote worker machines and reconstituted for each batch of elements
of the input PCollection being processed or for each source splitting
operation. A Source can have instance variable state, and
non-transient instance variable state will be serialized in the main program
and then deserialized on remote worker machines.
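The serialization round trip described above can be sketched with plain java.io. This is a minimal illustration, not SDK code: FileLikeSource, its fields, and the roundTrip helper are hypothetical names, and the class is a stand-in that does not extend Source. It shows that non-transient fields survive the trip to a worker while transient fields do not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {

  /** Hypothetical stand-in for a Source subclass; it does not extend the SDK class. */
  static class FileLikeSource implements Serializable {
    private static final long serialVersionUID = 1L;

    // Non-transient state: serialized in the main program and restored on the worker.
    final String path;
    final long startOffset;

    // Transient state: skipped by Java serialization, so it is null after deserialization.
    transient StringBuilder scratch = new StringBuilder("main-program-only");

    FileLikeSource(String path, long startOffset) {
      this.path = path;
      this.startOffset = startOffset;
    }
  }

  /** Serializes and then deserializes a value, mimicking the trip to a remote worker. */
  static <S extends Serializable> S roundTrip(S value)
      throws IOException, ClassNotFoundException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(value);
    }
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      @SuppressWarnings("unchecked")
      S copy = (S) in.readObject();
      return copy;
    }
  }

  public static void main(String[] args) throws Exception {
    FileLikeSource original = new FileLikeSource("/hypothetical/input.txt", 128L);
    FileLikeSource onWorker = roundTrip(original);
    System.out.println("path = " + onWorker.path);
    System.out.println("startOffset = " + onWorker.startOffset);
    System.out.println("scratch is null = " + (onWorker.scratch == null));
  }
}
```

State that a worker needs (paths, offsets, configuration) belongs in non-transient fields. Anything held in a transient field has to be rebuilt after deserialization.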
This API is experimental and subject to change.
| Modifier and Type | Class and Description |
|---|---|
| static interface | Source.Reader<T>: The interface which readers of custom input sources must implement. |
| Constructor and Description |
|---|
| Source() |
| Modifier and Type | Method and Description |
|---|---|
| protected Source.Reader<T> | createBasicReader(PipelineOptions options, Coder<T> coder, com.google.cloud.dataflow.sdk.util.ExecutionContext executionContext): Creates a basic (non-windowed) reader for this source. |
| Source.Reader<com.google.cloud.dataflow.sdk.util.WindowedValue<T>> | createWindowedReader(PipelineOptions options, Coder<com.google.cloud.dataflow.sdk.util.WindowedValue<T>> coder, com.google.cloud.dataflow.sdk.util.ExecutionContext executionContext): Creates a windowed reader for this source. |
| abstract Coder<T> | getDefaultOutputCoder(): Returns the default Coder to use for the data read from this source. |
| abstract long | getEstimatedSizeBytes(PipelineOptions options): An estimate of the total size (in bytes) of the data that would be read from this source. |
| abstract boolean | producesSortedKeys(PipelineOptions options): Whether this source is known to produce key/value pairs with the (encoded) keys in lexicographically sorted order. |
| abstract java.util.List<? extends Source<T>> | splitIntoShards(long desiredShardSizeBytes, PipelineOptions options): Splits the source into shards. |
| abstract void | validate(): Checks that this source is valid before it can be used in a pipeline. |
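One way the splitIntoShards and getEstimatedSizeBytes methods summarized above might fit together is sketched below for a source that reads a known byte range. This is an illustration under stated assumptions, not the SDK's implementation: RangeSource is a hypothetical stand-in that does not extend Source, the PipelineOptions parameter is omitted to keep the block self-contained, and treating desiredShardSizeBytes as an upper bound on each shard is an assumption rather than a documented guarantee.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardSplitSketch {

  /** Hypothetical byte-range source; a stand-in that does not extend the SDK's Source. */
  static final class RangeSource {
    final long start; // inclusive byte offset
    final long end;   // exclusive byte offset

    RangeSource(long start, long end) {
      if (start < 0 || end < start) {
        throw new IllegalArgumentException("invalid range [" + start + ", " + end + ")");
      }
      this.start = start;
      this.end = end;
    }

    /** Mirrors getEstimatedSizeBytes; exact here because the range is known up front. */
    long getEstimatedSizeBytes() {
      return end - start;
    }

    /** Mirrors splitIntoShards: consecutive sub-ranges of at most desiredShardSizeBytes. */
    List<RangeSource> splitIntoShards(long desiredShardSizeBytes) {
      if (desiredShardSizeBytes <= 0) {
        throw new IllegalArgumentException("desiredShardSizeBytes must be positive");
      }
      List<RangeSource> shards = new ArrayList<>();
      long shardStart = start;
      while (shardStart < end) {
        long remaining = end - shardStart;
        // Comparing against the remaining bytes avoids overflow in shardStart + size.
        long shardEnd =
            remaining <= desiredShardSizeBytes ? end : shardStart + desiredShardSizeBytes;
        shards.add(new RangeSource(shardStart, shardEnd));
        shardStart = shardEnd;
      }
      return shards;
    }
  }

  public static void main(String[] args) {
    RangeSource source = new RangeSource(0, 1000);
    for (RangeSource shard : source.splitIntoShards(300)) {
      System.out.println(
          "[" + shard.start + ", " + shard.end + ") size=" + shard.getEstimatedSizeBytes());
    }
  }
}
```

Each shard is itself a source of the same type, which matches the List<? extends Source<T>> return type: a shard can be sized, validated, and read like the original.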
public abstract java.util.List<? extends Source<T>> splitIntoShards(long desiredShardSizeBytes, PipelineOptions options) throws java.lang.Exception
Splits the source into shards. PipelineOptions can be used to get information such as credentials for accessing external storage.
Throws: java.lang.Exception

public abstract long getEstimatedSizeBytes(PipelineOptions options) throws java.lang.Exception
An estimate of the total size (in bytes) of the data that would be read from this source.
Throws: java.lang.Exception

public abstract boolean producesSortedKeys(PipelineOptions options) throws java.lang.Exception
Whether this source is known to produce key/value pairs with the (encoded) keys in lexicographically sorted order.
Throws: java.lang.Exception

public Source.Reader<com.google.cloud.dataflow.sdk.util.WindowedValue<T>> createWindowedReader(PipelineOptions options, Coder<com.google.cloud.dataflow.sdk.util.WindowedValue<T>> coder, @Nullable com.google.cloud.dataflow.sdk.util.ExecutionContext executionContext) throws java.io.IOException
Creates a windowed reader for this source. See also createBasicReader(com.google.cloud.dataflow.sdk.options.PipelineOptions, com.google.cloud.dataflow.sdk.coders.Coder<T>, com.google.cloud.dataflow.sdk.util.ExecutionContext). Override this method if your reader supports timestamps and windows; otherwise, override createBasicReader(com.google.cloud.dataflow.sdk.options.PipelineOptions, com.google.cloud.dataflow.sdk.coders.Coder<T>, com.google.cloud.dataflow.sdk.util.ExecutionContext) instead.
Throws: java.io.IOException

protected Source.Reader<T> createBasicReader(PipelineOptions options, Coder<T> coder, @Nullable com.google.cloud.dataflow.sdk.util.ExecutionContext executionContext) throws java.io.IOException
Creates a basic (non-windowed) reader for this source.
Throws: java.io.IOException

public abstract void validate()
Checks that this source is valid before it can be used in a pipeline. It is recommended to use Preconditions for implementing this method.
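
A validate() implementation built on precondition checks might look like the sketch below. It assumes that Preconditions refers to a Guava-style utility. To stay self-contained, the block defines a local checkArgument helper instead of depending on that library. PathRangeSource and its fields are hypothetical, and the class is a stand-in that does not extend Source.

```java
public class ValidateSketch {

  /** Local stand-in for a Guava-style Preconditions.checkArgument (assumption, see lead-in). */
  static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }

  /** Hypothetical source configuration; a stand-in that does not extend the SDK's Source. */
  static final class PathRangeSource {
    final String path;
    final long start;
    final long end;

    PathRangeSource(String path, long start, long end) {
      this.path = path;
      this.start = start;
      this.end = end;
    }

    /** Mirrors validate(): fails fast on a misconfigured source before any reading begins. */
    void validate() {
      checkArgument(path != null && !path.isEmpty(), "path must be set");
      checkArgument(start >= 0, "start must be non-negative");
      checkArgument(end >= start, "end must not precede start");
    }
  }

  public static void main(String[] args) {
    new PathRangeSource("/hypothetical/input.txt", 0, 100).validate();
    System.out.println("valid source accepted");
    try {
      new PathRangeSource("", 0, 100).validate();
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Putting every check in validate() means a configuration mistake surfaces in the main program rather than later on a remote worker.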