Class DoFn<InputT extends @Nullable java.lang.Object,​OutputT extends @Nullable java.lang.Object>

  • Type Parameters:
    InputT - the type of the (main) input elements
    OutputT - the type of the (main) output elements
    All Implemented Interfaces:
    java.io.Serializable, HasDisplayData
    Direct Known Subclasses:
    JsonToRow.JsonToRowWithErrFn.ParseWithError, Reshuffle.AssignShardFn, ValueWithRecordId.StripIdsDoFn, View.ToListViewDoFn, Watch.WatchGrowthFn

    public abstract class DoFn<InputT extends @Nullable java.lang.Object,​OutputT extends @Nullable java.lang.Object>
    extends java.lang.Object
    implements java.io.Serializable, HasDisplayData
    The argument to ParDo providing the code to use to process elements of the input PCollection.

    See ParDo for more explanation, examples of use, and discussion of constraints on DoFns, including their serializability, lack of access to global shared mutable state, requirements for failure tolerance, and benefits of optimization.

    DoFns can be tested by using TestPipeline. You can verify their functional correctness in a local test using the DirectRunner as well as running integration tests with your production runner of choice. Typically, you can generate the input data using Create.of(java.lang.Iterable<T>) or other transforms. However, if you need to test the behavior of DoFn.StartBundle and DoFn.FinishBundle with particular bundle boundaries, you can use TestStream.

    Implementations must define a method annotated with DoFn.ProcessElement that satisfies the requirements described there. See the DoFn.ProcessElement for details.

    Example usage:

    
     PCollection<String> lines = ... ;
     PCollection<String> words =
         lines.apply(ParDo.of(new DoFn<String, String>()) {
             @ProcessElement
              public void processElement(@Element String element, BoundedWindow window) {
                ...
              }}));
     
    See Also:
    Serialized Form
    • Constructor Detail

      • DoFn

        public DoFn()
    • Method Detail

      • getAllowedTimestampSkew

        @Deprecated
        public org.joda.time.Duration getAllowedTimestampSkew()
        Deprecated.
        This method permits a DoFn to emit elements behind the watermark. These elements are considered late, and if behind the allowed lateness of a downstream PCollection may be silently dropped. See https://github.com/apache/beam/issues/18065 for details on a replacement.
        Returns the allowed timestamp skew duration, which is the maximum duration that timestamps can be shifted backward in DoFn.WindowedContext.outputWithTimestamp(OutputT, org.joda.time.Instant).

        The default value is Duration.ZERO, in which case timestamps can only be shifted forward to future. For infinite skew, return Duration.millis(Long.MAX_VALUE).

      • getOutputTypeDescriptor

        public TypeDescriptor<OutputT> getOutputTypeDescriptor()
        Returns a TypeDescriptor capturing what is known statically about the output type of this DoFn instance's most-derived class.

        In the normal case of a concrete DoFn subclass with no generic type parameters of its own (including anonymous inner classes), this will be a complete non-generic type, which is good for choosing a default output Coder<O> for the output PCollection<O>.

      • prepareForProcessing

        @Deprecated
        public final void prepareForProcessing()
        Deprecated.
        use DoFn.Setup or DoFn.StartBundle instead. This method will be removed in a future release.
        Finalize the DoFn construction to prepare for processing. This method should be called by runners before any processing methods.
      • populateDisplayData

        public void populateDisplayData​(DisplayData.Builder builder)
        Register display data for the given transform or component.

        populateDisplayData(DisplayData.Builder) is invoked by Pipeline runners to collect display data via DisplayData.from(HasDisplayData). Implementations may call super.populateDisplayData(builder) in order to register display data in the current namespace, but should otherwise use subcomponent.populateDisplayData(builder) to use the namespace of the subcomponent.

        By default, does not register any display data. Implementors may override this method to provide their own display data.

        Specified by:
        populateDisplayData in interface HasDisplayData
        Parameters:
        builder - The builder to populate with display data.
        See Also:
        HasDisplayData