Pipeline (Google Cloud Dataflow SDK 1.2.1 API)

java.lang.Object
- com.google.cloud.dataflow.sdk.Pipeline

Direct Known Subclasses: DataflowPipeline, DirectPipeline, TestPipeline

public class Pipeline
extends Object

A Pipeline manages a DAG of PTransforms, and the PCollections that the PTransforms consume and produce.

After a Pipeline has been constructed, it can be executed, using a default or an explicit PipelineRunner.

Multiple Pipelines can be constructed and executed independently and concurrently.

Each Pipeline is self-contained and isolated from any other Pipeline. The PValues that are inputs and outputs of each of a Pipeline's PTransforms are also owned by that Pipeline. A PValue owned by one Pipeline can be read only by PTransforms also owned by that Pipeline.
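
The ownership rule above can be pictured with a small, self-contained toy model. This is only a sketch of the idea and not SDK code: `ToyOwnedPipeline`, `ToyOwnedValue`, `createRoot`, and `derive` are hypothetical names invented for illustration, and rejecting a foreign value with an `IllegalArgumentException` is an assumption, not documented SDK behavior.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for a pipeline that owns its values; NOT the SDK's Pipeline class. */
class ToyOwnedPipeline {
    private final String name;
    private final List<ToyOwnedValue> values = new ArrayList<>();

    ToyOwnedPipeline(String name) {
        this.name = name;
    }

    /** Creates a root value owned by this pipeline. */
    ToyOwnedValue createRoot(String label) {
        ToyOwnedValue v = new ToyOwnedValue(this, label);
        values.add(v);
        return v;
    }

    /** Derives a new value from an input, rejecting inputs owned by another pipeline. */
    ToyOwnedValue derive(ToyOwnedValue input, String label) {
        if (input.owner != this) {
            // Assumed behavior for this sketch only.
            throw new IllegalArgumentException(
                "value '" + input.label + "' belongs to pipeline '" + input.owner.name
                + "', not '" + name + "'");
        }
        ToyOwnedValue v = new ToyOwnedValue(this, label);
        values.add(v);
        return v;
    }

    /** Number of values this pipeline owns. */
    int size() {
        return values.size();
    }
}

/** Toy stand-in for a value that remembers which pipeline owns it. */
class ToyOwnedValue {
    final ToyOwnedPipeline owner;
    final String label;

    ToyOwnedValue(ToyOwnedPipeline owner, String label) {
        this.owner = owner;
        this.label = label;
    }
}
```

In this model, a value created by one `ToyOwnedPipeline` can be passed to `derive` on the same instance, while passing it to a different instance fails immediately.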

Here's a typical example of use:

 
 // Start by defining the options for the pipeline.
 PipelineOptions options = PipelineOptionsFactory.create();
 // Then create the pipeline.
 Pipeline p = Pipeline.create(options);

 // A root PTransform, like TextIO.Read or Create, gets added
 // to the Pipeline by being applied:
 PCollection<String> lines =
     p.apply(TextIO.Read.from("gs://bucket/dir/file*.txt"));

 // A Pipeline can have multiple root transforms:
 PCollection<String> moreLines =
     p.apply(TextIO.Read.from("gs://bucket/other/dir/file*.txt"));
 PCollection<String> yetMoreLines =
     p.apply(Create.of("yet", "more", "lines").withCoder(StringUtf8Coder.of()));

 // Further PTransforms can be applied, in an arbitrary (acyclic) graph.
 // Subsequent PTransforms (and intermediate PCollections etc.) are
 // implicitly part of the same Pipeline.
 PCollection<String> allLines =
     PCollectionList.of(lines).and(moreLines).and(yetMoreLines)
     .apply(Flatten.<String>pCollections());
 PCollection<KV<String, Long>> wordCounts =
     allLines
     .apply(ParDo.of(new ExtractWords()))
     .apply(Count.<String>perElement());
 PCollection<String> formattedWordCounts =
     wordCounts.apply(ParDo.of(new FormatCounts()));
 formattedWordCounts.apply(TextIO.Write.to("gs://bucket/dir/counts.txt"));

 // PTransforms aren't executed when they're applied; they're
 // just added to the Pipeline.  Once the whole Pipeline of PTransforms
 // is constructed, the Pipeline's PTransforms can be run using a
 // PipelineRunner.  The default PipelineRunner executes the Pipeline
 // directly, sequentially, in this one process, which is useful for
 // unit tests and simple experiments:
 p.run();
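
The build-then-run behavior described in the closing comment of the example can be sketched without the SDK. The following is a minimal toy model, not the SDK's implementation: `DeferredPipeline`, its `apply` and `run` signatures, and the `executedSteps` counter are hypothetical and exist only to show that applying a step records it, while running executes the recorded steps in order.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Toy model of deferred execution; NOT the SDK's Pipeline class. */
class DeferredPipeline {
    private final List<UnaryOperator<List<String>>> steps = new ArrayList<>();
    private int executedSteps = 0;

    /** Records a step without running it, and returns this pipeline for chaining. */
    DeferredPipeline apply(UnaryOperator<List<String>> step) {
        steps.add(step);
        return this;
    }

    /** Runs every recorded step in the order it was applied. */
    List<String> run(List<String> input) {
        List<String> current = new ArrayList<>(input);
        for (UnaryOperator<List<String>> step : steps) {
            current = step.apply(current);
            executedSteps++;
        }
        return current;
    }

    /** Number of steps that have actually been executed so far. */
    int executedSteps() {
        return executedSteps;
    }
}
```

Until `run` is called, `executedSteps()` stays at zero no matter how many steps have been applied, which mirrors the separation between constructing a Pipeline and executing it.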

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`Pipeline.PipelineExecutionException` Thrown during pipeline execution, whenever user code within a pipeline throws an exception.
`static interface`	`Pipeline.PipelineVisitor` A `Pipeline.PipelineVisitor` can be passed into `traverseTopologically(com.google.cloud.dataflow.sdk.Pipeline.PipelineVisitor)` to be called for each of the transforms and values in the Pipeline.

Constructor Summary

Constructors
Modifier	Constructor and Description
`protected`	`Pipeline(PipelineRunner<?> runner)` Deprecated. Replaced by `Pipeline(PipelineRunner, PipelineOptions)`
`protected`	`Pipeline(PipelineRunner<?> runner, PipelineOptions options)`

Method Summary

Modifier and Type	Method and Description
`void`	`addValueInternal(PValue value)` Adds the given PValue to this Pipeline.
`<OutputT extends POutput> OutputT`	`apply(PTransform<? super PBegin,OutputT> root)` Like `apply(String, PTransform)` but defaulting to the name of the `PTransform`.
`<OutputT extends POutput> OutputT`	`apply(String name, PTransform<? super PBegin,OutputT> root)` Starts using this pipeline with a root `PTransform` such as `TextIO.Read` or `Create`.
`static <InputT extends PInput,OutputT extends POutput> OutputT`	`applyTransform(InputT input, PTransform<? super InputT,OutputT> transform)` Like `applyTransform(String, PInput, PTransform)` but defaulting to the name provided by the `PTransform`.
`static <InputT extends PInput,OutputT extends POutput> OutputT`	`applyTransform(String name, InputT input, PTransform<? super InputT,OutputT> transform)` Applies the given `PTransform` to this input `InputT` and returns its `OutputT`.
`PBegin`	`begin()` Returns a `PBegin` owned by this Pipeline.
`static Pipeline`	`create(PipelineOptions options)` Constructs a pipeline from the provided options.
`CoderRegistry`	`getCoderRegistry()` Returns the `CoderRegistry` that this Pipeline uses.
`String`	`getFullNameForTesting(PTransform<?,?> transform)` Deprecated.
`PipelineOptions`	`getOptions()` Returns the configured pipeline options.
`PipelineRunner<?>`	`getRunner()` Returns the configured pipeline runner.
`PipelineResult`	`run()` Runs the Pipeline.
`void`	`setCoderRegistry(CoderRegistry coderRegistry)` Sets the `CoderRegistry` that this Pipeline uses.
`String`	`toString()`
`void`	`traverseTopologically(Pipeline.PipelineVisitor visitor)` Invokes the PipelineVisitor's `Pipeline.PipelineVisitor.visitTransform(com.google.cloud.dataflow.sdk.runners.TransformTreeNode)` and `Pipeline.PipelineVisitor.visitValue(com.google.cloud.dataflow.sdk.values.PValue, com.google.cloud.dataflow.sdk.runners.TransformTreeNode)` operations on each of this Pipeline's PTransforms and PValues, in forward topological order.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

- Constructor Detail
  - Pipeline
```
@Deprecated
protected Pipeline(PipelineRunner<?> runner)
```
    Deprecated. Replaced by Pipeline(PipelineRunner, PipelineOptions)
  - Pipeline
```
protected Pipeline(PipelineRunner<?> runner,
                   PipelineOptions options)
```
- Method Detail
  - create
```
public static Pipeline create(PipelineOptions options)
```
    Constructs a pipeline from the provided options.
    
    Returns:
    
    The newly created pipeline.
  - begin
```
public PBegin begin()
```
    Returns a PBegin owned by this Pipeline. This is useful as the input of a root PTransform such as TextIO.Read or Create.
  - apply
```
public <OutputT extends POutput> OutputT apply(PTransform<? super PBegin,OutputT> root)
```
    Like apply(String, PTransform) but defaulting to the name of the PTransform.
  - apply
```
public <OutputT extends POutput> OutputT apply(String name,
                                               PTransform<? super PBegin,OutputT> root)
```
    Starts using this pipeline with a root PTransform such as TextIO.Read or Create. This specific call to apply is identified by the provided name. This name is used in various places, including the monitoring UI, logging, and to stably identify this application node in the job graph.
    Alias for begin().apply(name, root).
  - run
```
public PipelineResult run()
```
    Runs the Pipeline.
  - getCoderRegistry
```
public CoderRegistry getCoderRegistry()
```
    Returns the CoderRegistry that this Pipeline uses.
  - setCoderRegistry
```
public void setCoderRegistry(CoderRegistry coderRegistry)
```
    Sets the CoderRegistry that this Pipeline uses.
  - traverseTopologically
```
public void traverseTopologically(Pipeline.PipelineVisitor visitor)
```
    Invokes the PipelineVisitor's Pipeline.PipelineVisitor.visitTransform(com.google.cloud.dataflow.sdk.runners.TransformTreeNode) and Pipeline.PipelineVisitor.visitValue(com.google.cloud.dataflow.sdk.values.PValue, com.google.cloud.dataflow.sdk.runners.TransformTreeNode) operations on each of this Pipeline's PTransforms and PValues, in forward topological order.
    Traversal of the pipeline causes PTransform and PValue instances to be marked as finished, at which point they may no longer be modified.
    Typically invoked by PipelineRunner subclasses.
  - applyTransform
```
public static <InputT extends PInput,OutputT extends POutput> OutputT applyTransform(InputT input,
                                                                                     PTransform<? super InputT,OutputT> transform)
```
    Like applyTransform(String, PInput, PTransform) but defaulting to the name provided by the PTransform.
  - applyTransform
```
public static <InputT extends PInput,OutputT extends POutput> OutputT applyTransform(String name,
                                                                                     InputT input,
                                                                                     PTransform<? super InputT,OutputT> transform)
```
    Applies the given PTransform to this input InputT and returns its OutputT. This uses name to identify this specific application of the transform. This name is used in various places, including the monitoring UI, logging, and to stably identify this application node in the job graph.
    Called by PInput subclasses in their apply methods.
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class Object
  - getRunner
```
public PipelineRunner<?> getRunner()
```
    Returns the configured pipeline runner.
  - getOptions
```
public PipelineOptions getOptions()
```
    Returns the configured pipeline options.
  - getFullNameForTesting
```
@Deprecated
public String getFullNameForTesting(PTransform<?,?> transform)
```
    Deprecated.
    
    Returns the fully qualified name of a transform for testing.
    
    Throws:
    
    IllegalStateException - if the transform has not been applied to the pipeline or was applied multiple times.
  - addValueInternal
```
public void addValueInternal(PValue value)
```
    Adds the given PValue to this Pipeline.
    For internal use only.
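
The "forward topological order" used by traverseTopologically means that every node is visited only after all the nodes it depends on. As a rough illustration, the sketch below visits a small DAG in that order using Kahn's algorithm. It is a standalone toy and makes no claim about how the SDK implements its traversal: `ToyGraph`, `NodeVisitor`, `addNode`, and `addEdge` are hypothetical names, and nodes here are plain strings rather than PTransforms or PValues.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy DAG of named nodes; NOT the SDK's pipeline graph. */
class ToyGraph {
    /** Hypothetical visitor callback, loosely analogous to a PipelineVisitor. */
    interface NodeVisitor {
        void visit(String node);
    }

    // Insertion-ordered maps keep the traversal deterministic.
    private final Map<String, List<String>> successors = new LinkedHashMap<>();
    private final Map<String, Integer> inDegree = new LinkedHashMap<>();

    void addNode(String node) {
        successors.putIfAbsent(node, new ArrayList<>());
        inDegree.putIfAbsent(node, 0);
    }

    /** Adds an edge meaning "from must be visited before to". */
    void addEdge(String from, String to) {
        addNode(from);
        addNode(to);
        successors.get(from).add(to);
        inDegree.merge(to, 1, Integer::sum);
    }

    /** Visits each node after all of its predecessors (Kahn's algorithm). */
    void traverseTopologically(NodeVisitor visitor) {
        Map<String, Integer> remaining = new LinkedHashMap<>(inDegree);
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : remaining.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }
        int visited = 0;
        while (!ready.isEmpty()) {
            String node = ready.remove();
            visitor.visit(node);
            visited++;
            for (String next : successors.get(node)) {
                // A successor becomes ready once its last predecessor is visited.
                if (remaining.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (visited != successors.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
    }
}
```

With two root nodes feeding a shared downstream node, both roots are visited before the node that consumes them, regardless of the order in which the edges were added.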
