Class PubsubUnboundedSink

  • All Implemented Interfaces:
    java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData

    public class PubsubUnboundedSink
    extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<PubsubMessage>,org.apache.beam.sdk.values.PDone>
    A PTransform which streams messages to Pubsub.
    • The underlying implementation is a GroupByKey followed by a ParDo that publishes as a side effect. (In the future we want to design and switch to a custom UnboundedSink implementation to gain access to the system watermark and end-of-pipeline cleanup.)
    • We try to send messages in batches while also limiting send latency.
    • No stats are logged. Instead, counters keep track of elements and batches.
    • Though the underlying netty system uses some background threads, all actual Pubsub calls are blocking. We rely on the underlying runner to execute multiple DoFn instances concurrently and so hide latency.
    • A failed bundle will cause messages to be resent, so we rely on the Pubsub consumer to deduplicate messages.
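    Because a failed bundle can resend messages, a consumer has to tolerate duplicates. The following is a minimal sketch of consumer-side deduplication, assuming each message carries a unique record id (for example, in the attribute named by idAttribute). DedupSketch, Message, and uniquePayloads are hypothetical names for illustration and are not part of the Beam API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical consumer-side helper; not part of the Beam API. */
final class DedupSketch {

  /** Assumed message shape: a unique record id plus a payload. */
  static final class Message {
    final String id;
    final String payload;

    Message(String id, String payload) {
      this.id = id;
      this.payload = payload;
    }
  }

  /** Returns the payload of the first message seen for each id, in arrival order. */
  static List<String> uniquePayloads(List<Message> received) {
    Set<String> seenIds = new HashSet<>();
    List<String> unique = new ArrayList<>();
    for (Message m : received) {
      // Set.add returns false when the id was already present, i.e. a redelivery.
      if (seenIds.add(m.id)) {
        unique.add(m.payload);
      }
    }
    return unique;
  }
}
```

    The set of seen ids grows without bound in this sketch. A long-running consumer would likely need to expire old ids, for example after a fixed time window.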
    See Also:
    Serialized Form
    • Constructor Detail

      • PubsubUnboundedSink

        public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory,
                                   org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic,
                                   java.lang.String timestampAttribute,
                                   java.lang.String idAttribute,
                                   int numShards,
                                   java.lang.String pubsubRootUrl)
      • PubsubUnboundedSink

        public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory,
                                   org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic,
                                   java.lang.String timestampAttribute,
                                   java.lang.String idAttribute,
                                   int numShards,
                                   int publishBatchSize,
                                   int publishBatchBytes)
      • PubsubUnboundedSink

        public PubsubUnboundedSink(PubsubClient.PubsubClientFactory pubsubFactory,
                                   org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> topic,
                                   java.lang.String timestampAttribute,
                                   java.lang.String idAttribute,
                                   int numShards,
                                   int publishBatchSize,
                                   int publishBatchBytes,
                                   java.lang.String pubsubRootUrl)
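    The publishBatchSize and publishBatchBytes parameters suggest that batches are capped both by message count and by total payload bytes. The following is a minimal sketch of that kind of batching, not the actual implementation. BatchSketch and batch are hypothetical names, and the rule that an oversized payload is sent alone is an assumption. The sketch does not model the send-latency limit.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical batching helper; not part of the Beam API. */
final class BatchSketch {

  /**
   * Splits payloads into batches holding at most maxMessages entries and,
   * where possible, at most maxBytes total payload bytes.
   */
  static List<List<byte[]>> batch(List<byte[]> payloads, int maxMessages, int maxBytes) {
    List<List<byte[]>> batches = new ArrayList<>();
    List<byte[]> current = new ArrayList<>();
    int currentBytes = 0;
    for (byte[] payload : payloads) {
      boolean wouldOverflow =
          current.size() >= maxMessages || currentBytes + payload.length > maxBytes;
      // Close the current batch before adding a payload that would exceed a limit.
      if (wouldOverflow && !current.isEmpty()) {
        batches.add(current);
        current = new ArrayList<>();
        currentBytes = 0;
      }
      // Assumption: a payload larger than maxBytes still goes out, alone in its batch.
      current.add(payload);
      currentBytes += payload.length;
    }
    if (!current.isEmpty()) {
      batches.add(current);
    }
    return batches;
  }
}
```

    For example, five 4-byte payloads with a limit of 3 messages and 10 bytes are split into batches of 2, 2, and 1, because a third payload would push a batch past the byte limit.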
    • Method Detail

      • getTopicProvider

        public org.apache.beam.sdk.options.ValueProvider<PubsubClient.TopicPath> getTopicProvider()
        Get the ValueProvider for the topic being written to.
      • getTimestampAttribute

        public @Nullable java.lang.String getTimestampAttribute()
        Get the timestamp attribute.
      • getIdAttribute

        public @Nullable java.lang.String getIdAttribute()
        Get the id attribute.
      • expand

        public org.apache.beam.sdk.values.PDone expand(org.apache.beam.sdk.values.PCollection<PubsubMessage> input)
        Specified by:
        expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<PubsubMessage>,org.apache.beam.sdk.values.PDone>