public abstract static class PubsubIO.Read<T>
extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>
Constructor and Description |
---|
Read() |
Modifier and Type | Method and Description |
---|---|
org.apache.beam.sdk.values.PCollection<T> |
expand(org.apache.beam.sdk.values.PBegin input) |
PubsubIO.Read<T> |
fromSubscription(java.lang.String subscription)
Reads from the given subscription.
|
PubsubIO.Read<T> |
fromSubscription(org.apache.beam.sdk.options.ValueProvider<java.lang.String> subscription)
Like
subscription() but with a ValueProvider . |
PubsubIO.Read<T> |
fromTopic(java.lang.String topic)
Creates and returns a transform for reading from a Cloud Pub/Sub topic.
|
PubsubIO.Read<T> |
fromTopic(org.apache.beam.sdk.options.ValueProvider<java.lang.String> topic)
Like
topic() but with a ValueProvider . |
void |
populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder) |
PubsubIO.Read<T> |
withClientFactory(PubsubClient.PubsubClientFactory factory)
The default client to write to Pub/Sub is the
PubsubJsonClient , created by the PubsubJsonClient.PubsubJsonClientFactory . |
PubsubIO.Read<T> |
withCoderAndParseFn(org.apache.beam.sdk.coders.Coder<T> coder,
org.apache.beam.sdk.transforms.SimpleFunction<PubsubMessage,T> parseFn)
Causes the source to return a PubsubMessage that includes Pubsub attributes, and uses the
given parsing function to transform the PubsubMessage into an output type.
|
PubsubIO.Read<T> |
withIdAttribute(java.lang.String idAttribute)
When reading from Cloud Pub/Sub where unique record identifiers are provided as Pub/Sub
message attributes, specifies the name of the attribute containing the unique identifier.
|
PubsubIO.Read<T> |
withTimestampAttribute(java.lang.String timestampAttribute)
When reading from Cloud Pub/Sub where record timestamps are provided as Pub/Sub message
attributes, specifies the name of the attribute that contains the timestamp.
|
public PubsubIO.Read<T> fromSubscription(java.lang.String subscription)
See PubsubIO.PubsubSubscription.fromPath(String)
for more details on the format of
the subscription
string.
Multiple readers reading from the same subscription will each receive some arbitrary portion of the data. Most likely, separate readers should use their own subscriptions.
public PubsubIO.Read<T> fromSubscription(org.apache.beam.sdk.options.ValueProvider<java.lang.String> subscription)
subscription()
but with a ValueProvider
.public PubsubIO.Read<T> fromTopic(java.lang.String topic)
fromSubscription(String)
.
See PubsubIO.PubsubTopic.fromPath(String)
for more details on the format of the
topic
string.
The Beam runner will start reading data published on this topic from the time the pipeline is started. Any data published on the topic before the pipeline is started will not be read by the runner.
public PubsubIO.Read<T> fromTopic(org.apache.beam.sdk.options.ValueProvider<java.lang.String> topic)
topic()
but with a ValueProvider
.public PubsubIO.Read<T> withClientFactory(PubsubClient.PubsubClientFactory factory)
PubsubJsonClient
, created by the PubsubJsonClient.PubsubJsonClientFactory
. This function allows to change the Pub/Sub client
by providing another PubsubClient.PubsubClientFactory
like the PubsubGrpcClientFactory
.public PubsubIO.Read<T> withTimestampAttribute(java.lang.String timestampAttribute)
The timestamp value is expected to be represented in the attribute as either:
Instant.getMillis()
returns the
correct value for this attribute.
2015-10-29T23:41:41.123Z
. The
sub-second component of the timestamp is optional, and digits beyond the first three
(i.e., time units smaller than milliseconds) will be ignored.
If timestampAttribute
is not provided, the timestamp will be taken from the Pubsub
message's publish timestamp. All windowing will be done relative to these timestamps.
By default, windows are emitted based on an estimate of when this source is likely done
producing data for a given timestamp (referred to as the Watermark; see AfterWatermark
for more details). Any late data will be handled by the trigger specified
with the windowing strategy – by default it will be output immediately.
Note that the system can guarantee that no late data will ever be seen when it assigns
timestamps by arrival time (i.e. timestampAttribute
is not provided).
public PubsubIO.Read<T> withIdAttribute(java.lang.String idAttribute)
Pub/Sub cannot guarantee that no duplicate data will be delivered on the Pub/Sub stream.
If idAttribute
is not provided, Beam cannot guarantee that no duplicate data will be
delivered, and deduplication of the stream will be strictly best effort.
public PubsubIO.Read<T> withCoderAndParseFn(org.apache.beam.sdk.coders.Coder<T> coder, org.apache.beam.sdk.transforms.SimpleFunction<PubsubMessage,T> parseFn)
PCollection.setCoder(Coder)
.public org.apache.beam.sdk.values.PCollection<T> expand(org.apache.beam.sdk.values.PBegin input)
expand
in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>
public void populateDisplayData(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
populateDisplayData
in interface org.apache.beam.sdk.transforms.display.HasDisplayData
populateDisplayData
in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,org.apache.beam.sdk.values.PCollection<T>>