PTransform (Google Cloud Dataflow SDK API)

java.lang.Object
- com.google.cloud.dataflow.sdk.transforms.PTransform<Input,Output>

Type Parameters:
Input - the type of the input to this PTransform
Output - the type of the output of this PTransform

All Implemented Interfaces:

java.io.Serializable

Direct Known Subclasses:

AvroIO.Read.Bound, AvroIO.Write.Bound, BigQueryIO.Read.Bound, BigQueryIO.Write.Bound, CoGroupByKey, Combine.Globally, Combine.GroupedValues, Combine.PerKey, Count.Globally, Count.PerElement, Create, DatastoreIO.Read.Bound, DatastoreIO.Write.Bound, First, Flatten.FlattenIterables, Flatten.FlattenPCollectionList, GroupByKey, GroupByKey.GroupAlsoByWindow, GroupByKey.GroupByKeyOnly, GroupByKey.ReifyTimestampsAndWindows, GroupByKey.SortValuesByTimestamp, Keys, KvSwap, ParDo.Bound, ParDo.BoundMulti, Partition, PubsubIO.Read.Bound, PubsubIO.Write.Bound, RateLimiting.RateLimitingTransform, RemoveDuplicates, TextIO.Read.Bound, TextIO.Write.Bound, Values, View.AsIterable, View.AsSingleton, View.CreatePCollectionView, Window.Bound, Window.Remerge, WithKeys
```
public abstract class PTransform<Input extends PInput,Output extends POutput>
extends java.lang.Object
implements java.io.Serializable
```
A PTransform<Input, Output> is an operation that takes an Input (some subtype of PInput) and produces an Output (some subtype of POutput).
Common PTransforms include root PTransforms like TextIO.Read and Create; processing and conversion operations like ParDo, GroupByKey, CoGroupByKey, Combine, and Count; and outputting PTransforms like TextIO.Write. Users can also define their own application-specific composite PTransforms.
Each PTransform<Input, Output> has a single Input type and a single Output type. Many PTransforms conceptually transform one input value to one output value, and in this case Input and Output are typically instances of PCollection. A root PTransform conceptually has no input; in this case, conventionally a PBegin object produced by calling Pipeline.begin() is used as the input. An outputting PTransform conceptually has no output; in this case, conventionally PDone is used as its output type. Some PTransforms conceptually have multiple inputs and/or outputs; in these cases special "bundling" classes such as PCollectionList and PCollectionTuple are used to combine multiple values into a single bundle for passing into or returning from the PTransform.
A PTransform<Input, Output> is invoked by calling apply() on its Input, returning its Output. Calls can be chained to concisely create linear pipeline segments. For example:
```
 PCollection<T1> pc1 = ...;
 PCollection<T2> pc2 =
     pc1.apply(ParDo.of(new MyDoFn<T1,KV<K,V>>()))
        .apply(GroupByKey.<K, V>create())
        .apply(Combine.perKey(new MyKeyedCombineFn<K,V>()))
        .apply(ParDo.of(new MyDoFn2<KV<K,V>,T2>()));
```
PTransform operations have unique names, which the system uses when explaining what is happening during optimization and execution. Each PTransform gets a system-provided default name, but it is good practice to specify an explicit name where possible, using the named() method offered by some PTransforms such as ParDo. For example:
```
 ...
 .apply(ParDo.named("Step1").of(new MyDoFn3()))
 ...
```
Each PCollection output produced by a PTransform, either directly or within a "bundling" class, automatically gets its own name derived from the name of its producing PTransform.
Each PCollection output produced by a PTransform also records a Coder that specifies how the elements of that PCollection are to be encoded as a byte string, if necessary. The PTransform may provide a default Coder for any of its outputs, for instance by deriving it from the PTransform input's Coder. If the PTransform does not specify the Coder for an output PCollection, the system will attempt to infer a Coder for it, based on what's known at run-time about the Java type of the output's elements. The enclosing Pipeline's CoderRegistry (accessible via Pipeline.getCoderRegistry()) defines the mapping from Java types to the default Coder to use, for a standard set of Java types; users can extend this mapping for additional types, via CoderRegistry.registerCoder(java.lang.Class<?>, java.lang.Class<?>). If this inference process fails, either because the Java type was not known at run-time (e.g., due to Java's "erasure" of generic types) or there was no default Coder registered, then the Coder should be specified manually by calling TypedPValue.setCoder(com.google.cloud.dataflow.sdk.coders.Coder<T>) on the output PCollection. The Coder of every output PCollection must be determined one way or another before that output is used as an input to another PTransform, or before the enclosing Pipeline is run.
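The erasure problem mentioned above can be seen in plain Java, without any SDK classes. The following is a minimal sketch: Fn, IdentityLike, and outputClassOf are hypothetical names invented for this illustration, and the reflection shown is not claimed to be how the SDK performs Coder inference. It only shows why a concrete element type is sometimes recoverable at run time and sometimes not.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

/** Illustrates why element types can be lost at run time (plain Java, no SDK classes). */
class ErasureDemo {

    /** Hypothetical generic function type standing in for something like a DoFn. */
    abstract static class Fn<I, O> {
        abstract O call(I input);
    }

    /** Generic subclass: its output type is still an unresolved type variable. */
    static class IdentityLike<I, O> extends Fn<I, O> {
        @Override
        O call(I input) {
            return null;
        }
    }

    /** An anonymous subclass records its concrete type arguments in the class file. */
    static Fn<String, Integer> lengthFn() {
        return new Fn<String, Integer>() {
            @Override
            Integer call(String input) {
                return input.length();
            }
        };
    }

    /** Returns the concrete output class declared on fn's superclass, or null if unknown. */
    static Class<?> outputClassOf(Fn<?, ?> fn) {
        Type superType = fn.getClass().getGenericSuperclass();
        if (superType instanceof ParameterizedType) {
            Type arg = ((ParameterizedType) superType).getActualTypeArguments()[1];
            if (arg instanceof Class) {
                return (Class<?>) arg;
            }
        }
        return null; // the type argument is a type variable, so nothing can be inferred
    }

    /** Two differently parameterized lists share one run-time class. */
    static boolean listsShareRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(listsShareRuntimeClass());                                   // true
        System.out.println(outputClassOf(lengthFn()));                                  // class java.lang.Integer
        System.out.println(outputClassOf(new IdentityLike<String, Integer>()));         // null
    }
}
```

The last case corresponds to the situation described above where inference fails and the Coder has to be set manually on the output PCollection.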
A small number of PTransforms are implemented natively by the Google Cloud Dataflow SDK; such PTransforms simply return an output value as their apply implementation. The majority of PTransforms are implemented as composites of other PTransforms. Such a PTransform subclass typically just implements apply(Input), computing its Output value from its Input value. User programs are encouraged to use this mechanism to modularize their own code. Such composite abstractions get their own name, and navigating through the composition hierarchy of PTransforms is supported by the monitoring interface. Examples of composite PTransforms can be found in the SDK's transforms package and in the SDK examples. From the caller's point of view, there is no distinction between a PTransform implemented natively and one implemented in terms of other PTransforms; both kinds of PTransform are invoked in the same way, using apply().
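The composite pattern described above can be sketched with a toy model. Everything in the block below is a hypothetical stand-in written for illustration: Transform is not the SDK's PTransform, plain java.util.List values take the place of PCollections, and SplitWords, LowerCase, and NormalizeWords are invented names. The sketch also calls transform.apply(input) directly, whereas in the SDK a transform is invoked by calling apply() on its input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy model of composite transforms; these are NOT the SDK classes. */
class CompositeSketch {

    /** Hypothetical stand-in for PTransform: one input type, one output type. */
    abstract static class Transform<I, O> {
        /** Mirrors the documented default: throws unless a subclass overrides it. */
        O apply(I input) {
            throw new UnsupportedOperationException(
                getClass().getSimpleName() + " does not implement apply");
        }
    }

    /** Leaf transform: splits each line into whitespace-separated words. */
    static class SplitWords extends Transform<List<String>, List<String>> {
        @Override
        List<String> apply(List<String> lines) {
            List<String> words = new ArrayList<>();
            for (String line : lines) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        words.add(word);
                    }
                }
            }
            return words;
        }
    }

    /** Leaf transform: lower-cases every element. */
    static class LowerCase extends Transform<List<String>, List<String>> {
        @Override
        List<String> apply(List<String> words) {
            List<String> lowered = new ArrayList<>();
            for (String word : words) {
                lowered.add(word.toLowerCase(Locale.ROOT));
            }
            return lowered;
        }
    }

    /** Composite: apply() only wires other transforms together and returns the last output. */
    static class NormalizeWords extends Transform<List<String>, List<String>> {
        @Override
        List<String> apply(List<String> lines) {
            List<String> words = new SplitWords().apply(lines);
            return new LowerCase().apply(words);
        }
    }

    public static void main(String[] args) {
        List<String> out =
            new NormalizeWords().apply(Arrays.asList("Hello World", "hello  Dataflow"));
        System.out.println(out); // [hello, world, hello, dataflow]
    }
}
```

A caller of NormalizeWords cannot tell that it is built from two smaller steps, which is the point of the composite mechanism: the wiring lives in one named unit that can be reused and tested on its own.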
Note on Serialization
PTransform doesn't actually support serialization, despite implementing Serializable.
PTransform is marked Serializable solely because it is common for an anonymous DoFn instance to be created within an apply() method of a composite PTransform.
Each of those *Fns is Serializable, but unfortunately its instance state will contain a reference to the enclosing PTransform instance, and so attempt to serialize the PTransform instance, even though the *Fn instance never references anything about the enclosing PTransform.
To allow such anonymous *Fns to be written conveniently, PTransform is marked as Serializable, and includes dummy writeObject() and readObject() operations that do not save or restore any state.
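The mechanism can be reproduced in plain Java. The sketch below is an illustration under stated assumptions, not the SDK's source: Fn and Transform are hypothetical stand-ins, and whether the compiler actually stores the reference to the enclosing instance can depend on the compiler version. The enclosing class holds a field that default serialization could not write, yet the anonymous function object still round-trips because the enclosing class declares writeObject() and readObject() hooks that save and restore nothing.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Locale;

/** Plain-Java illustration of the dummy serialization hooks; no SDK classes involved. */
class SerializationSketch {

    /** Hypothetical stand-in for a serializable function object such as a DoFn. */
    interface Fn extends Serializable {
        String call(String input);
    }

    /** Hypothetical stand-in for a transform; not the SDK's PTransform. */
    static class Transform implements Serializable {
        private static final long serialVersionUID = 1L;

        // java.lang.Object is not Serializable, so default serialization of this field would fail.
        private final Object unserializableState = new Object();

        Fn makeFn() {
            // Anonymous class created in an instance method: it may capture Transform.this
            // even though call() never uses it.
            return new Fn() {
                @Override
                public String call(String input) {
                    return input.toUpperCase(Locale.ROOT);
                }
            };
        }

        // Dummy hooks: deliberately save and restore no state.
        private void writeObject(ObjectOutputStream out) throws IOException {}

        private void readObject(ObjectInputStream in)
                throws IOException, ClassNotFoundException {}
    }

    /** Serializes fn to a byte array and reads it back. */
    static Fn roundTrip(Fn fn) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Fn) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Fn copy = roundTrip(new Transform().makeFn());
        System.out.println(copy.call("ok")); // OK
    }
}
```

The trade-off is that a deserialized enclosing instance carries none of its original state, which is why the transform itself should not be treated as serializable.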
See Also:
Applying Transformations, Serialized Form

Field Summary

Fields
Modifier and Type	Field and Description
`protected java.lang.String`	`name` The base name of this `PTransform`, e.g., from `ParDo.named(String)`, or from defaults, or `null` if not yet assigned.

Constructor Summary

Constructors
Modifier	Constructor and Description
`protected`	`PTransform()`
`protected`	`PTransform(java.lang.String name)`

Method Summary

Methods
Modifier and Type	Method and Description
`Output`	`apply(Input input)` Applies this `PTransform` on the given `Input`, and returns its `Output`.
`void`	`finishSpecifying()` After building, finalizes this `PTransform` to make it ready for running.
`protected CoderRegistry`	`getCoderRegistry()` Deprecated. Use pipeline.getCoderRegistry()
`protected java.lang.String`	`getDefaultName()` Returns the name to use by default for this `PTransform` (not including the names of any enclosing `PTransform`s).
`protected Coder<?>`	`getDefaultOutputCoder()` Returns the default `Coder` to use for the output of this single-output `PTransform`, or `null` if none can be inferred.
`<T> Coder<T>`	`getDefaultOutputCoder(TypedPValue<T> output)` Returns the default `Coder` to use for the given output of this single-output `PTransform`, or `null` if none can be inferred.
`Input`	`getInput()` Deprecated. Use pipeline.getInput(transform)
`protected java.lang.String`	`getKindString()` Returns a string describing what kind of `PTransform` this is.
`java.lang.String`	`getName()` Returns the transform name.
`Output`	`getOutput()` Deprecated. Use pipeline.getOutput(transform)
`Pipeline`	`getPipeline()` Deprecated.
`void`	`setName(java.lang.String name)` Sets the base name of this `PTransform`.
`void`	`setPipeline(Pipeline pipeline)` Deprecated.
`java.lang.String`	`toString()`
`PTransform<Input,Output>`	`withName(java.lang.String name)` Sets the base name of this `PTransform` and returns itself.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Field Detail
  - name
```
protected transient java.lang.String name
```
    The base name of this PTransform, e.g., from ParDo.named(String), or from defaults, or null if not yet assigned.
- Constructor Detail
  - PTransform
```
protected PTransform()
```
  - PTransform
```
protected PTransform(java.lang.String name)
```
- Method Detail
  - apply
```
public Output apply(Input input)
```
    Applies this PTransform on the given Input, and returns its Output.
    Composite transforms, which are defined in terms of other transforms, should return the output of one of the composed transforms. Non-composite transforms, which do not apply any transforms internally, should return a new unbound output and register evaluators (via backend-specific registration methods).
    The default implementation throws an exception. A derived class must either implement apply, or else each runner must supply a custom implementation via PipelineRunner.apply(com.google.cloud.dataflow.sdk.transforms.PTransform<Input, Output>, Input).
  - setName
```
public void setName(java.lang.String name)
```
    Sets the base name of this PTransform.
  - withName
```
public PTransform<Input,Output> withName(java.lang.String name)
```
    Sets the base name of this PTransform and returns itself.
    This is a shortcut for calling setName(java.lang.String), which allows method chaining.
  - getName
```
public java.lang.String getName()
```
    Returns the transform name.
    This name is provided by the transform creator and is not required to be unique.
  - getPipeline
```
@Deprecated
public Pipeline getPipeline()
```
    Deprecated.
    
    Returns the owning Pipeline of this PTransform.
    
    Throws:
    
    java.lang.IllegalStateException - if the owning Pipeline hasn't been set yet
  - getInput
```
@Deprecated
public Input getInput()
```
    Deprecated. Use pipeline.getInput(transform)
    
    Returns the input of this transform.
    
    Throws:
    
    java.lang.IllegalStateException - if this PTransform hasn't been applied yet
  - getOutput
```
@Deprecated
public Output getOutput()
```
    Deprecated. Use pipeline.getOutput(transform)
    
    Returns the output of this transform.
    
    Throws:
    
    java.lang.IllegalStateException - if this PTransform hasn't been applied yet
  - getCoderRegistry
```
@Deprecated
protected CoderRegistry getCoderRegistry()
```
    Deprecated. Use pipeline.getCoderRegistry()
    
    Returns the CoderRegistry, useful for inferring Coders.
    
    Throws:
    
    java.lang.IllegalStateException - if the owning Pipeline hasn't been set yet
  - setPipeline
```
@Deprecated
public void setPipeline(Pipeline pipeline)
```
    Deprecated.
    
    Associates this PTransform with the given Pipeline.
    For internal use only.
    
    Throws:
    
    java.lang.IllegalArgumentException - if this transform has already been associated with a pipeline
  - toString
```
public java.lang.String toString()
```
    Overrides:
    
    toString in class java.lang.Object
  - getDefaultName
```
protected java.lang.String getDefaultName()
```
    Returns the name to use by default for this PTransform (not including the names of any enclosing PTransforms).
    By default, returns getKindString().
    The caller is responsible for ensuring that names of applied PTransforms are unique, e.g., by adding a uniquifying suffix when needed.
  - getKindString
```
protected java.lang.String getKindString()
```
    Returns a string describing what kind of PTransform this is.
    By default, returns the base name of this PTransform's class.
  - finishSpecifying
```
public void finishSpecifying()
```
    After building, finalizes this PTransform to make it ready for running. Called automatically when its output(s) are finished.
    Not normally called by user code.
  - getDefaultOutputCoder
```
protected Coder<?> getDefaultOutputCoder()
```
    Returns the default Coder to use for the output of this single-output PTransform, or null if none can be inferred.
    By default, returns null.
  - getDefaultOutputCoder
```
public <T> Coder<T> getDefaultOutputCoder(TypedPValue<T> output)
```
    Returns the default Coder to use for the given output of this single-output PTransform, or null if none can be inferred.
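The naming methods documented in this section (setName, withName, getName, getDefaultName, getKindString) fit together roughly as in the following toy model. It is a sketch built only from the descriptions above, not the SDK's implementation: Transform, ExtractWords, and NameRegistry are hypothetical names, the fallback from getName() to the default name is an assumption, and the numeric suffix scheme is just one possible way for a caller to make names unique.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of transform naming; these are NOT the SDK classes. */
class NamingSketch {

    /** Hypothetical stand-in for PTransform's naming members. */
    static class Transform {
        protected String name; // null until assigned

        void setName(String name) {
            this.name = name;
        }

        /** Fluent variant of setName: returns this so calls can be chained. */
        Transform withName(String name) {
            setName(name);
            return this;
        }

        /** Explicit name if set, otherwise the default (fallback assumed for this sketch). */
        String getName() {
            return name != null ? name : getDefaultName();
        }

        /** By default the same as the kind string. */
        protected String getDefaultName() {
            return getKindString();
        }

        /** By default the base name of the class. */
        protected String getKindString() {
            return getClass().getSimpleName();
        }
    }

    /** Example subclass with no explicit name. */
    static class ExtractWords extends Transform {}

    /** Caller-side helper that appends a numeric suffix on collisions (format is assumed). */
    static class NameRegistry {
        private final Set<String> used = new HashSet<>();

        String uniquify(String base) {
            String candidate = base;
            int suffix = 2;
            // Set.add returns false when the name is already taken.
            while (!used.add(candidate)) {
                candidate = base + suffix++;
            }
            return candidate;
        }
    }

    public static void main(String[] args) {
        NameRegistry registry = new NameRegistry();
        System.out.println(registry.uniquify(new ExtractWords().getName())); // ExtractWords
        System.out.println(registry.uniquify(new ExtractWords().getName())); // ExtractWords2
        System.out.println(new ExtractWords().withName("Step1").getName());  // Step1
    }
}
```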
