Class PubsubUnboundedSink
- java.lang.Object
  - org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<PubsubMessage>,org.apache.beam.sdk.values.PDone>
    - org.apache.beam.sdk.io.gcp.pubsub.PubsubUnboundedSink
- All Implemented Interfaces: java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
public class PubsubUnboundedSink extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<PubsubMessage>,org.apache.beam.sdk.values.PDone>
A PTransform which streams messages to Pubsub.
- The underlying implementation is just a GroupByKey followed by a ParDo which publishes as a side effect. (In the future we want to design and switch to a custom UnboundedSink implementation so as to gain access to system watermark and end-of-pipeline cleanup.)
- We try to send messages in batches while also limiting send latency.
- No stats are logged. Rather, some counters are used to keep track of elements and batches.
- Though some background threads are used by the underlying netty system, all actual Pubsub calls are blocking. We rely on the underlying runner to allow multiple DoFn instances to execute concurrently and hide latency.
- A failed bundle will cause messages to be resent. Thus we rely on the Pubsub consumer to dedup messages.
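The batching behaviour mentioned above can be pictured with a small, self-contained sketch. It is not the Beam implementation: the class `BatchingSketch` and the method `toBatches` are hypothetical names, and the two limits only loosely mirror the `publishBatchSize` and `publishBatchBytes` constructor parameters. The sketch covers the size and byte limits only; the latency limit is not modelled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: groups payloads into batches bounded by count and total bytes. */
final class BatchingSketch {

  /**
   * Splits messages into consecutive batches. The current batch is closed before adding a
   * message that would push it past maxCount messages or maxBytes total payload bytes.
   * A single message larger than maxBytes still gets a batch of its own.
   */
  static List<List<byte[]>> toBatches(List<byte[]> messages, int maxCount, int maxBytes) {
    List<List<byte[]>> batches = new ArrayList<>();
    List<byte[]> current = new ArrayList<>();
    int currentBytes = 0;
    for (byte[] message : messages) {
      boolean full = current.size() >= maxCount || currentBytes + message.length > maxBytes;
      if (!current.isEmpty() && full) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      current.add(message);
      currentBytes += message.length;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }

  public static void main(String[] args) {
    List<byte[]> messages = new ArrayList<>();
    for (int i = 0; i < 5; i++) {
      messages.add(("msg" + i).getBytes(StandardCharsets.UTF_8)); // 4 bytes each
    }
    // Hypothetical limits: at most 2 messages and 100 bytes per batch.
    System.out.println("batches: " + toBatches(messages, 2, 100).size());
  }
}
```

A real sink would additionally flush a partially filled batch once it has waited too long, which is how batching and bounded send latency can coexist.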
- See Also:
- Serialized Form
-
-
Constructor Summary
Constructors:
- PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic, java.lang.String timestampAttribute, java.lang.String idAttribute, int numShards)
- PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic, java.lang.String timestampAttribute, java.lang.String idAttribute, int numShards, int publishBatchSize, int publishBatchBytes)
- PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic, java.lang.String timestampAttribute, java.lang.String idAttribute, int numShards, int publishBatchSize, int publishBatchBytes, java.lang.String pubsubRootUrl)
- PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic, java.lang.String timestampAttribute, java.lang.String idAttribute, int numShards, java.lang.String pubsubRootUrl)
-
Method Summary
Modifier and Type, Method, Description:
- org.apache.beam.sdk.values.PDone expand(org.apache.beam.sdk.values.PCollection<PubsubMessage> input)
- @Nullable java.lang.String getIdAttribute(): Get the id attribute.
- @Nullable java.lang.String getTimestampAttribute(): Get the timestamp attribute.
- PubsubClient.TopicPath getTopic(): Get the topic being written to.
- org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> getTopicProvider(): Get the ValueProvider for the topic being written to.
-
-
-
Constructor Detail
-
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic, java.lang.String timestampAttribute, java.lang.String idAttribute, int numShards)
-
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic, java.lang.String timestampAttribute, java.lang.String idAttribute, int numShards, java.lang.String pubsubRootUrl)
-
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic, java.lang.String timestampAttribute, java.lang.String idAttribute, int numShards, int publishBatchSize, int publishBatchBytes)
-
PubsubUnboundedSink
public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory, org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic, java.lang.String timestampAttribute, java.lang.String idAttribute, int numShards, int publishBatchSize, int publishBatchBytes, java.lang.String pubsubRootUrl)
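The class overview describes the sink as a GroupByKey followed by a publishing ParDo, and every constructor takes a `numShards` argument. One plausible reading, sketched below in plain Java without any Beam dependency, is that each message is assigned a shard key in the range `[0, numShards)` and the messages for one shard are then published together. The names `ShardingSketch` and `groupByShard` are hypothetical, and the random key assignment is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Illustrative only: mimics "assign a shard key, then group by key" in memory. */
final class ShardingSketch {

  /** Assigns each message a random shard in [0, numShards) and groups messages by shard. */
  static Map<Integer, List<String>> groupByShard(
      List<String> messages, int numShards, Random random) {
    Map<Integer, List<String>> shards = new TreeMap<>();
    for (String message : messages) {
      int shard = random.nextInt(numShards);
      shards.computeIfAbsent(shard, key -> new ArrayList<>()).add(message);
    }
    return shards;
  }

  public static void main(String[] args) {
    List<String> messages = Arrays.asList("a", "b", "c", "d", "e", "f");
    Map<Integer, List<String>> shards = groupByShard(messages, 3, new Random());
    // Each shard's list stands in for the group that one publishing step would receive.
    shards.forEach((shard, group) -> System.out.println("shard " + shard + " -> " + group));
  }
}
```

Under this reading, a larger `numShards` spreads the blocking publish calls over more groups that a runner can process concurrently, at the cost of smaller groups per shard.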
-
-
Method Detail
-
getTopic
public PubsubClient.TopicPath getTopic()
Get the topic being written to.
-
getTopicProvider
public org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> getTopicProvider()
Get the ValueProvider for the topic being written to.
-
getTimestampAttribute
public @Nullable java.lang.String getTimestampAttribute()
Get the timestamp attribute.
-
getIdAttribute
public @Nullable java.lang.String getIdAttribute()
Get the id attribute.
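The class overview notes that a failed bundle causes messages to be resent and that the Pubsub consumer is expected to deduplicate. Assuming the id attribute carries a unique record id for that purpose (this page itself only says "Get the id attribute"), a consumer-side filter could look like the minimal sketch below. `DedupSketch` and `firstDelivery` are hypothetical names, not part of Beam or the Pubsub client.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: drops messages whose record id has already been seen. */
final class DedupSketch {
  private final Set<String> seenIds = new HashSet<>();

  /** Returns true the first time a record id is seen and false for any redelivery. */
  boolean firstDelivery(String recordId) {
    return seenIds.add(recordId);
  }

  public static void main(String[] args) {
    DedupSketch dedup = new DedupSketch();
    List<String> accepted = new ArrayList<>();
    // "id-1" arrives twice, as it might after a failed bundle is retried.
    for (String id : new String[] {"id-1", "id-2", "id-1"}) {
      if (dedup.firstDelivery(id)) {
        accepted.add(id);
      }
    }
    System.out.println(accepted);
  }
}
```

The set here grows without bound; a long-running consumer would need to expire old ids, for example after a fixed retention window.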
-
expand
public org.apache.beam.sdk.values.PDone expand(org.apache.beam.sdk.values.PCollection<PubsubMessage> input)
- Specified by: expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<PubsubMessage>,org.apache.beam.sdk.values.PDone>
-
-