Class PubsubIO.Read<T>

  • All Implemented Interfaces:
    java.io.Serializable, org.apache.beam.sdk.transforms.display.HasDisplayData
    Enclosing class:
    PubsubIO

    public abstract static class PubsubIO.Read<T>
    extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,​org.apache.beam.sdk.values.PCollection<T>>
    Implementation of read methods.
    See Also:
    Serialized Form
    • Field Summary

      • Fields inherited from class org.apache.beam.sdk.transforms.PTransform

        name, resourceHints
    • Constructor Summary

      Constructors 
      Constructor Description
      Read()  
    • Method Summary

      All Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      org.apache.beam.sdk.values.PCollection<T> expand​(org.apache.beam.sdk.values.PBegin input)  
      PubsubIO.Read<T> fromSubscription​(java.lang.String subscription)
      Reads from the given subscription.
      PubsubIO.Read<T> fromSubscription​(org.apache.beam.sdk.options.ValueProvider<java.lang.String> subscription)
      Like fromSubscription(String) but with a ValueProvider.
      PubsubIO.Read<T> fromTopic​(java.lang.String topic)
      Creates and returns a transform for reading from a Cloud Pub/Sub topic.
      PubsubIO.Read<T> fromTopic​(org.apache.beam.sdk.options.ValueProvider<java.lang.String> topic)
      Like fromTopic(String) but with a ValueProvider.
      void populateDisplayData​(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)  
      PubsubIO.Read<T> withClientFactory​(PubsubClient.PubsubClientFactory factory)
      The default client used to communicate with Pub/Sub is the PubsubJsonClient, created by the PubsubJsonClient.PubsubJsonClientFactory; this method replaces it with a client created by the given factory.
      PubsubIO.Read<T> withCoderAndParseFn​(org.apache.beam.sdk.coders.Coder<T> coder, org.apache.beam.sdk.transforms.SimpleFunction<PubsubMessage,​T> parseFn)
      Causes the source to return a PubsubMessage that includes Pub/Sub attributes, and uses the given parsing function to transform the PubsubMessage into an output type.
      PubsubIO.Read<T> withDeadLetterTopic​(java.lang.String deadLetterTopic)
      Creates and returns a transform for writing read failures out to a dead-letter topic.
      PubsubIO.Read<T> withDeadLetterTopic​(org.apache.beam.sdk.options.ValueProvider<java.lang.String> deadLetterTopic)
      Like withDeadLetterTopic(String) but with a ValueProvider.
      PubsubIO.Read<T> withIdAttribute​(java.lang.String idAttribute)
      When reading from Cloud Pub/Sub where unique record identifiers are provided as Pub/Sub message attributes, specifies the name of the attribute containing the unique identifier.
      PubsubIO.Read<T> withTimestampAttribute​(java.lang.String timestampAttribute)
      When reading from Cloud Pub/Sub where record timestamps are provided as Pub/Sub message attributes, specifies the name of the attribute that contains the timestamp.
      • Methods inherited from class org.apache.beam.sdk.transforms.PTransform

        compose, compose, getAdditionalInputs, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, setResourceHints, toString, validate, validate
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
    • Constructor Detail

      • Read

        public Read()
    • Method Detail

      • fromSubscription

        public PubsubIO.Read<T> fromSubscription​(java.lang.String subscription)
        Reads from the given subscription.

        See PubsubIO.PubsubSubscription.fromPath(String) for more details on the format of the subscription string.

        Multiple readers reading from the same subscription will each receive some arbitrary portion of the data. Most likely, separate readers should use their own subscriptions.
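
        As an illustration of the subscription string format, the sketch below builds and checks a path of the form projects/<project>/subscriptions/<subscription>. The class SubscriptionPaths and both of its methods are hypothetical helpers, not part of Beam, and the pattern is a simplified shape check; the validation done by PubsubIO.PubsubSubscription.fromPath(String) may be stricter or accept other forms.

```java
import java.util.regex.Pattern;

/** Hypothetical helper, not part of Beam: illustrates the subscription path shape. */
final class SubscriptionPaths {
    // Simplified shape check (assumption); Beam's own validation may be stricter.
    private static final Pattern SUBSCRIPTION_PATH =
        Pattern.compile("projects/([^/]+)/subscriptions/([^/]+)");

    private SubscriptionPaths() {}

    /** Builds a full subscription path from a project ID and a subscription name. */
    static String subscriptionPath(String project, String subscription) {
        return "projects/" + project + "/subscriptions/" + subscription;
    }

    /** Returns true if the string has the projects/.../subscriptions/... shape. */
    static boolean looksLikeSubscriptionPath(String path) {
        return SUBSCRIPTION_PATH.matcher(path).matches();
    }
}
```

        A string built this way is what would typically be passed to fromSubscription(String); giving each independent reader its own subscription avoids the data being split among readers.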

      • fromSubscription

        public PubsubIO.Read<T> fromSubscription​(org.apache.beam.sdk.options.ValueProvider<java.lang.String> subscription)
        Like fromSubscription(String) but with a ValueProvider.
      • fromTopic

        public PubsubIO.Read<T> fromTopic​(java.lang.String topic)
        Creates and returns a transform for reading from a Cloud Pub/Sub topic. Mutually exclusive with fromSubscription(String).

        See PubsubIO.PubsubTopic.fromPath(String) for more details on the format of the topic string.

        The Beam runner will start reading data published on this topic from the time the pipeline is started. Any data published on the topic before the pipeline is started will not be read by the runner.

      • fromTopic

        public PubsubIO.Read<T> fromTopic​(org.apache.beam.sdk.options.ValueProvider<java.lang.String> topic)
        Like fromTopic(String) but with a ValueProvider.
      • withDeadLetterTopic

        public PubsubIO.Read<T> withDeadLetterTopic​(java.lang.String deadLetterTopic)
        Creates and returns a transform for writing read failures out to a dead-letter topic.

        The message written to the dead-letter topic will contain three attributes:

        • exceptionClassName: The type of exception that was thrown.
        • exceptionMessage: The message in the exception.
        • pubsubMessageId: The message id of the original Pub/Sub message if it was read in, otherwise the empty string ("").

        The PubsubClient.PubsubClientFactory used in the PubsubIO.Write transform for errors will be the same as used in the final PubsubIO.Read transform.

        If a parsing error (or a similar failure) is possible, a dead-letter topic should be set with this method, both to avoid wasting resources and to provide more error details with the message written to Pub/Sub. Otherwise, the Pub/Sub topic should have its own dead-letter configuration set up to avoid an infinite retry loop.

        Only failures that result from the PubsubIO.Read configuration (e.g. parsing errors) will be sent to the dead-letter topic. To capture errors that occur after a successful read, the pipeline must set up its own PubsubIO.Write transform. To capture delivery errors, Pub/Sub itself must be configured to write to the dead-letter topic after a certain number of failed attempts.

        See PubsubIO.PubsubTopic.fromPath(String) for more details on the format of the deadLetterTopic string.
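
        To make the three dead-letter attributes listed above concrete, here is a minimal sketch of how such an attribute map could be assembled. DeadLetterAttributes and forFailure are hypothetical names, not Beam APIs. Using the fully qualified class name for exceptionClassName and an empty string for a missing exception message are assumptions; Beam's actual values may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Beam: assembles the three dead-letter attributes. */
final class DeadLetterAttributes {
    private DeadLetterAttributes() {}

    /**
     * Builds the attribute map for a failed read.
     *
     * @param error the exception raised while reading or parsing
     * @param pubsubMessageId id of the original message, or null if it was never read in
     */
    static Map<String, String> forFailure(Throwable error, String pubsubMessageId) {
        Map<String, String> attributes = new LinkedHashMap<>();
        // Assumption: the fully qualified class name is used as the exception type.
        attributes.put("exceptionClassName", error.getClass().getName());
        // Assumption: a missing exception message is recorded as an empty string.
        String message = error.getMessage();
        attributes.put("exceptionMessage", message == null ? "" : message);
        // Per the description above, the id is "" when the message was not read in.
        attributes.put("pubsubMessageId", pubsubMessageId == null ? "" : pubsubMessageId);
        return attributes;
    }
}
```

        A consumer of the dead-letter topic could use these attributes to group failures by exception type or to look up the original message by id.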

      • withDeadLetterTopic

        public PubsubIO.Read<T> withDeadLetterTopic​(org.apache.beam.sdk.options.ValueProvider<java.lang.String> deadLetterTopic)
        Like withDeadLetterTopic(String) but with a ValueProvider.
      • withTimestampAttribute

        public PubsubIO.Read<T> withTimestampAttribute​(java.lang.String timestampAttribute)
        When reading from Cloud Pub/Sub where record timestamps are provided as Pub/Sub message attributes, specifies the name of the attribute that contains the timestamp.

        The timestamp value is expected to be represented in the attribute as either:

        • a numerical value representing the number of milliseconds since the Unix epoch. For example, if using the Joda time classes, Instant.getMillis() returns the correct value for this attribute.
        • a String in RFC 3339 format. For example, 2015-10-29T23:41:41.123Z. The sub-second component of the timestamp is optional, and digits beyond the first three (i.e., time units smaller than milliseconds) will be ignored.
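
        The two accepted representations can be sketched with the standard library alone. TimestampAttributes and toEpochMillis are hypothetical names, and this is not Beam's actual parsing code: it tries the numeric form first, then falls back to java.time.OffsetDateTime, which handles common RFC 3339 strings and discards precision below one millisecond when converted with toEpochMilli().

```java
import java.time.OffsetDateTime;

/** Hypothetical helper, not part of Beam: parses a timestamp attribute value. */
final class TimestampAttributes {
    private TimestampAttributes() {}

    /**
     * Converts an attribute value to milliseconds since the Unix epoch.
     * Accepts either a decimal number of milliseconds or an RFC 3339 string.
     * Throws java.time.format.DateTimeParseException if neither form matches.
     */
    static long toEpochMillis(String value) {
        try {
            // First form: milliseconds since the Unix epoch.
            return Long.parseLong(value);
        } catch (NumberFormatException notNumeric) {
            // Second form: RFC 3339, e.g. 2015-10-29T23:41:41.123Z.
            // toEpochMilli() drops digits beyond the first three fractional digits.
            return OffsetDateTime.parse(value).toInstant().toEpochMilli();
        }
    }
}
```

        With this sketch, "1446162101123" and "2015-10-29T23:41:41.123Z" map to the same instant, and "2015-10-29T23:41:41.123456Z" is truncated to that same millisecond.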

        If timestampAttribute is not provided, the timestamp will be taken from the Pub/Sub message's publish timestamp. All windowing will be done relative to these timestamps.

        By default, windows are emitted based on an estimate of when this source is likely done producing data for a given timestamp (referred to as the Watermark; see AfterWatermark for more details). Any late data will be handled by the trigger specified with the windowing strategy; by default, it will be output immediately.

        Note that the system can guarantee that no late data will ever be seen when it assigns timestamps by arrival time (i.e. timestampAttribute is not provided).

        See Also:
        RFC 3339
      • withIdAttribute

        public PubsubIO.Read<T> withIdAttribute​(java.lang.String idAttribute)
        When reading from Cloud Pub/Sub where unique record identifiers are provided as Pub/Sub message attributes, specifies the name of the attribute containing the unique identifier. The value of the attribute can be any string that uniquely identifies this record.

        Pub/Sub cannot guarantee that no duplicate data will be delivered on the Pub/Sub stream. If idAttribute is not provided, Beam cannot guarantee that no duplicate data will be delivered, and deduplication of the stream will be strictly best effort.
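
        The idea behind deduplicating on an id attribute can be sketched as follows. This is a conceptual illustration only, not how a Beam runner implements it: IdAttributeDedup and dropDuplicates are hypothetical names, each message is reduced to its attribute map, and the set of seen ids is kept in memory without any expiry.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper, not part of Beam: drops messages whose id attribute repeats. */
final class IdAttributeDedup {
    private IdAttributeDedup() {}

    /**
     * Returns the messages in their original order, keeping only the first
     * message seen for each value of the given id attribute.
     */
    static List<Map<String, String>> dropDuplicates(
            List<Map<String, String>> messages, String idAttribute) {
        Set<String> seenIds = new HashSet<>();
        List<Map<String, String>> unique = new ArrayList<>();
        for (Map<String, String> attributes : messages) {
            String id = attributes.get(idAttribute);
            // Assumption: messages without the attribute cannot be matched, so keep them.
            // Set.add returns false when the id was already seen.
            if (id == null || seenIds.add(id)) {
                unique.add(attributes);
            }
        }
        return unique;
    }
}
```

        The sketch shows why the attribute value must uniquely identify a record: two distinct records that share an id would be treated as duplicates and one of them dropped.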

      • withCoderAndParseFn

        public PubsubIO.Read<T> withCoderAndParseFn​(org.apache.beam.sdk.coders.Coder<T> coder,
                                                    org.apache.beam.sdk.transforms.SimpleFunction<PubsubMessage,​T> parseFn)
        Causes the source to return a PubsubMessage that includes Pub/Sub attributes, and uses the given parsing function to transform the PubsubMessage into an output type. A Coder for the output type T must be registered or set on the output via PCollection.setCoder(Coder).
      • expand

        public org.apache.beam.sdk.values.PCollection<T> expand​(org.apache.beam.sdk.values.PBegin input)
        Specified by:
        expand in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,​org.apache.beam.sdk.values.PCollection<T>>
      • populateDisplayData

        public void populateDisplayData​(org.apache.beam.sdk.transforms.display.DisplayData.Builder builder)
        Specified by:
        populateDisplayData in interface org.apache.beam.sdk.transforms.display.HasDisplayData
        Overrides:
        populateDisplayData in class org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PBegin,​org.apache.beam.sdk.values.PCollection<T>>