public static class PubsubIO.Read extends Object
PTransform
that continuously reads from a Cloud Pub/Sub stream and
returns a PCollection
of Strings
containing the items from
the stream.
When running with a PipelineRunner
that only supports bounded
PCollections
(such as DirectPipelineRunner
),
only a bounded portion of the input Pub/Sub stream can be processed. As such, either
PubsubIO.Read.Bound.maxNumRecords(int)
or PubsubIO.Read.Bound.maxReadTime(Duration)
must be set.
Modifier and Type | Class and Description |
---|---|
static class |
PubsubIO.Read.Bound<T>
A
PTransform that reads from a Cloud Pub/Sub source and returns
a unbounded PCollection containing the items from the stream. |
Modifier and Type | Method and Description |
---|---|
static PubsubIO.Read.Bound<String> |
idLabel(String idLabel)
Creates and returns a transform for reading from Cloud Pub/Sub where unique record
identifiers are expected to be provided as Pub/Sub message attributes.
|
static PubsubIO.Read.Bound<String> |
maxNumRecords(int maxNumRecords)
Creates and returns a transform for reading from Cloud Pub/Sub with a maximum number of
records that will be read.
|
static PubsubIO.Read.Bound<String> |
maxReadTime(Duration maxReadTime)
Creates and returns a transform for reading from Cloud Pub/Sub with a maximum number of
duration during which records will be read.
|
static PubsubIO.Read.Bound<String> |
named(String name)
Creates and returns a transform for reading from Cloud Pub/Sub with the specified transform
name.
|
static PubsubIO.Read.Bound<String> |
subscription(String subscription)
Creates and returns a transform for reading from a specific Cloud Pub/Sub subscription.
|
static PubsubIO.Read.Bound<String> |
timestampLabel(String timestampLabel)
Creates and returns a transform reading from Cloud Pub/Sub where record timestamps are
expected to be provided as Pub/Sub message attributes.
|
static PubsubIO.Read.Bound<String> |
topic(String topic)
Creates and returns a transform for reading from a Cloud Pub/Sub topic.
|
static <T> PubsubIO.Read.Bound<T> |
withCoder(Coder<T> coder)
Creates and returns a transform for reading from Cloud Pub/Sub that uses the given
Coder to decode Pub/Sub messages into a value of type T . |
public static PubsubIO.Read.Bound<String> named(String name)
public static PubsubIO.Read.Bound<String> topic(String topic)
subscription(String)
.
See PubsubIO.PubsubTopic.fromPath(String)
for more details on the format
of the topic
string.
Dataflow will start reading data published on this topic from the time the pipeline is started. Any data published on the topic before the pipeline is started will not be read by Dataflow.
public static PubsubIO.Read.Bound<String> subscription(String subscription)
topic(String)
.
See PubsubIO.PubsubSubscription.fromPath(String)
for more details on the format
of the subscription
string.
public static PubsubIO.Read.Bound<String> timestampLabel(String timestampLabel)
timestampLabel
parameter specifies the name of the attribute that contains the timestamp.
The timestamp value is expected to be represented in the attribute as either:
Instant.getMillis()
returns the correct
value for this attribute.
2015-10-29T23:41:41.123Z
. The
sub-second component of the timestamp is optional, and digits beyond the first three
(i.e., time units smaller than milliseconds) will be ignored.
If timestampLabel
is not provided, the system will generate record timestamps
the first time it sees each record. All windowing will be done relative to these timestamps.
By default, windows are emitted based on an estimate of when this source is likely
done producing data for a given timestamp (referred to as the Watermark; see
AfterWatermark
for more details). Any late data will be handled by the trigger
specified with the windowing strategy – by default it will be output immediately.
Note that the system can guarantee that no late data will ever be seen when it assigns
timestamps by arrival time (i.e. timestampLabel
is not provided).
public static PubsubIO.Read.Bound<String> idLabel(String idLabel)
idLabel
parameter specifies the attribute name. The value of the attribute can be any string
that uniquely identifies this record.
If idLabel
is not provided, Dataflow cannot guarantee that no duplicate data will
be delivered on the Pub/Sub stream. In this case, deduplication of the stream will be
strictly best effort.
public static <T> PubsubIO.Read.Bound<T> withCoder(Coder<T> coder)
Coder
to decode Pub/Sub messages into a value of type T
.
By default, uses StringUtf8Coder
, which just
returns the text lines as Java strings.
T
- the type of the decoded elements, and the elements
of the resulting PCollection.public static PubsubIO.Read.Bound<String> maxNumRecords(int maxNumRecords)
PCollection
.
Either this option or maxReadTime(Duration)
must be set in order to create a
bounded source.
public static PubsubIO.Read.Bound<String> maxReadTime(Duration maxReadTime)
PCollection
.
Either this option or maxNumRecords(int)
must be set in order to create a bounded
source.