T - Type of elements read by the source.
@Experimental(value=SOURCE_SINK)
public abstract class Source<T> extends java.lang.Object implements java.io.Serializable
Source for reading the input.
To use this class for supporting your custom input type, derive your
class from it and override the abstract methods. For an example, see DatastoreIO.
A Source passed to a Read transform must be
Serializable. This allows the Source instance
created in the "main program" to be sent (in serialized form) to
remote worker machines and reconstituted for each batch of elements
of the input PCollection being processed or for each source splitting
operation. A Source can have instance variable state, and
non-transient instance variable state will be serialized in the main program
and then deserialized on remote worker machines.
Source classes MUST be effectively immutable. The only acceptable use of
mutable fields is to cache the results of expensive operations, and such fields MUST be
marked transient.
Source objects should implement Object.toString(), as it will be
used in important error and debugging messages.
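The serialization, immutability, and toString() requirements above can be illustrated with a minimal sketch. The class below is a hypothetical stand-in: `LineRangeSource`, its fields, and the `roundTrip` helper are illustrative names that are not part of the SDK, and the class deliberately does not extend Source so that it compiles without the SDK on the classpath. It only shows the general pattern of final configuration fields plus a transient cache.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a Source subclass (assumption: not an SDK class).
final class LineRangeSource implements Serializable {
  private static final long serialVersionUID = 1L;

  // Immutable configuration: serialized in the main program, restored on workers.
  private final String path;
  private final long startOffset;
  private final long endOffset;

  // Cache for an expensive result. It is transient, so it is not serialized
  // and is recomputed lazily after deserialization.
  private transient Long cachedSizeBytes;

  LineRangeSource(String path, long startOffset, long endOffset) {
    this.path = path;
    this.startOffset = startOffset;
    this.endOffset = endOffset;
  }

  long sizeBytes() {
    if (cachedSizeBytes == null) {
      // Placeholder for an expensive operation such as a remote metadata lookup.
      cachedSizeBytes = endOffset - startOffset;
    }
    return cachedSizeBytes;
  }

  boolean isSizeCached() {
    return cachedSizeBytes != null;
  }

  // A descriptive toString() makes error and debugging messages readable.
  @Override
  public String toString() {
    return "LineRangeSource{path=" + path
        + ", range=[" + startOffset + ", " + endOffset + ")}";
  }

  // Simulates shipping the object to a worker: serialize, then deserialize.
  static LineRangeSource roundTrip(LineRangeSource source) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(source);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (LineRangeSource) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("Serialization round trip failed", e);
    }
  }
}
```

After a round trip, the final fields carry over while the transient cache starts out empty and is rebuilt on first use, which is why caching is the only safe use of mutable state here.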
| Modifier and Type | Class and Description |
|---|---|
| static class | Source.AbstractReader<T> - A base class implementing optional methods of Source.Reader in a default way: all values have the timestamp of BoundedWindow.TIMESTAMP_MIN_VALUE. |
| static interface | Source.Reader<T> - The interface that readers of custom input sources must implement. |
| Constructor and Description |
|---|
| Source() |
| Modifier and Type | Method and Description |
|---|---|
| Source.Reader<T> | createReader(PipelineOptions options, com.google.cloud.dataflow.sdk.util.ExecutionContext executionContext) - Creates a reader for this source. |
| abstract Coder<T> | getDefaultOutputCoder() - Returns the default Coder to use for the data read from this source. |
| abstract java.util.List<? extends Source<T>> | splitIntoBundles(long desiredBundleSizeBytes, PipelineOptions options) - Splits the source into bundles. |
| abstract void | validate() - Checks that this source is valid, before it can be used in a pipeline. |
public abstract java.util.List<? extends Source<T>> splitIntoBundles(long desiredBundleSizeBytes, PipelineOptions options) throws java.lang.Exception
PipelineOptions can be used to get information such as
credentials for accessing an external storage.
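As a rough illustration of what splitting by a desired bundle size can look like, here is a minimal sketch over a plain byte range. `ByteRange` and its one-argument `splitIntoBundles` are hypothetical and simplified: the SDK method also receives PipelineOptions and returns sub-sources, neither of which is modeled here. The sketch covers only the arithmetic of cutting a range into pieces of at most the desired size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical byte range (assumption: not an SDK class).
final class ByteRange {
  final long start; // inclusive
  final long end;   // exclusive

  ByteRange(long start, long end) {
    if (start < 0 || end < start) {
      throw new IllegalArgumentException("Invalid range [" + start + ", " + end + ")");
    }
    this.start = start;
    this.end = end;
  }

  // Cuts this range into consecutive sub-ranges of at most desiredBundleSizeBytes each.
  List<ByteRange> splitIntoBundles(long desiredBundleSizeBytes) {
    if (desiredBundleSizeBytes <= 0) {
      throw new IllegalArgumentException("desiredBundleSizeBytes must be positive");
    }
    List<ByteRange> bundles = new ArrayList<>();
    long cursor = start;
    while (cursor < end) {
      // Compute the length from the remaining bytes to avoid overflow on large sizes.
      long length = Math.min(end - cursor, desiredBundleSizeBytes);
      bundles.add(new ByteRange(cursor, cursor + length));
      cursor += length;
    }
    return bundles;
  }
}
```

In this sketch the last bundle may be smaller than the desired size, and an empty range yields no bundles. A real source may treat the desired size as a hint and choose different boundaries, for example to align with record boundaries.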
Throws:
java.lang.Exception
public Source.Reader<T> createReader(PipelineOptions options, @Nullable com.google.cloud.dataflow.sdk.util.ExecutionContext executionContext) throws java.io.IOException
Throws:
java.io.IOException
public abstract void validate()
It is recommended to use Preconditions for implementing
this method.
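A validate() implementation in the precondition style might look like the following sketch. To stay self-contained it uses java.util.Objects.requireNonNull and a small local `checkArgument` helper as stand-ins for Guava's Preconditions. `FilePatternConfig` and its fields are hypothetical names chosen for illustration, not part of the SDK.

```java
import java.util.Objects;

// Hypothetical configuration holder (assumption: not an SDK class).
final class FilePatternConfig {
  private final String filePattern;
  private final long minBundleSizeBytes;

  FilePatternConfig(String filePattern, long minBundleSizeBytes) {
    this.filePattern = filePattern;
    this.minBundleSizeBytes = minBundleSizeBytes;
  }

  // Fails fast with a descriptive message if the configuration is unusable.
  void validate() {
    Objects.requireNonNull(filePattern, "filePattern must not be null");
    checkArgument(!filePattern.isEmpty(), "filePattern must not be empty");
    checkArgument(minBundleSizeBytes > 0,
        "minBundleSizeBytes must be positive, got " + minBundleSizeBytes);
  }

  // Local stand-in for a precondition helper: throws IllegalArgumentException on failure.
  private static void checkArgument(boolean condition, String message) {
    if (!condition) {
      throw new IllegalArgumentException(message);
    }
  }
}
```

Checking everything up front means a misconfigured source is reported in the main program with a clear message, before any work is sent to remote workers.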