public class PCollectionTuple extends Object implements PInput, POutput
A PCollectionTuple is an immutable tuple of heterogeneously-typed PCollections, "keyed" by TupleTags. A PCollectionTuple can be used as the input or output of a PTransform taking or producing multiple PCollection inputs or outputs that can be of different types, for instance a ParDo with side outputs.
A PCollectionTuple can be created and accessed as follows:
PCollection<String> pc1 = ...;
PCollection<Integer> pc2 = ...;
PCollection<Iterable<String>> pc3 = ...;
// Create TupleTags for each of the PCollections to put in the
// PCollectionTuple (the type of the TupleTag enables tracking the
// static type of each of the PCollections in the PCollectionTuple):
TupleTag<String> tag1 = new TupleTag<>();
TupleTag<Integer> tag2 = new TupleTag<>();
TupleTag<Iterable<String>> tag3 = new TupleTag<>();
// Create a PCollectionTuple with three PCollections:
PCollectionTuple pcs =
    PCollectionTuple.of(tag1, pc1)
        .and(tag2, pc2)
        .and(tag3, pc3);
// Create an empty PCollectionTuple:
Pipeline p = ...;
PCollectionTuple pcs2 = PCollectionTuple.empty(p);
// Get PCollections out of a PCollectionTuple, using the same tags
// that were used to put them in:
PCollection<Integer> pcX = pcs.get(tag2);
PCollection<String> pcY = pcs.get(tag1);
PCollection<Iterable<String>> pcZ = pcs.get(tag3);
// Get a map of all PCollections in a PCollectionTuple:
Map<TupleTag<?>, PCollection<?>> allPcs = pcs.getAll();
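The role of the tag's type parameter in the example above can be illustrated without the SDK. The following is a minimal stand-alone sketch, not the SDK's implementation: Tag, TypedTuple, and TypedTupleDemo are hypothetical names standing in for TupleTag and PCollectionTuple, and plain Java values take the place of PCollections. It shows how get can return a statically typed value even though all entries are stored in a single untyped map.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for TupleTag<T>. T is never stored at run time; it
// only tells the compiler what type of value lives under this tag.
final class Tag<T> {
    private final String label;

    Tag(String label) {
        this.label = label;
    }

    @Override
    public String toString() {
        return "Tag(" + label + ")";
    }
}

// Hypothetical stand-in for PCollectionTuple, holding plain values instead
// of PCollections.
final class TypedTuple {
    private final Map<Tag<?>, Object> values;

    private TypedTuple(Map<Tag<?>, Object> values) {
        this.values = Collections.unmodifiableMap(values);
    }

    // Starts a tuple with a single tagged value.
    static <T> TypedTuple of(Tag<T> tag, T value) {
        Map<Tag<?>, Object> map = new LinkedHashMap<Tag<?>, Object>();
        map.put(tag, value);
        return new TypedTuple(map);
    }

    // Returns a new tuple with one more entry; this tuple is not modified.
    <T> TypedTuple and(Tag<T> tag, T value) {
        Map<Tag<?>, Object> copy = new LinkedHashMap<Tag<?>, Object>(values);
        copy.put(tag, value);
        return new TypedTuple(copy);
    }

    <T> boolean has(Tag<T> tag) {
        return values.containsKey(tag);
    }

    // The cast is safe because of() and and() only store a T under a Tag<T>.
    @SuppressWarnings("unchecked")
    <T> T get(Tag<T> tag) {
        if (!has(tag)) {
            throw new IllegalArgumentException("No value for " + tag);
        }
        return (T) values.get(tag);
    }
}

final class TypedTupleDemo {
    public static void main(String[] args) {
        Tag<String> nameTag = new Tag<>("name");
        Tag<Integer> countTag = new Tag<>("count");

        TypedTuple tuple = TypedTuple.of(nameTag, "words").and(countTag, 3);

        // No casts at the call site: the tag's type parameter fixes the result type.
        String name = tuple.get(nameTag);
        int count = tuple.get(countTag);
        System.out.println(name + " " + count);
    }
}
```

In this sketch, asking for a value with a tag of the wrong type is a compile-time error rather than a run-time ClassCastException, which is the property the typed TupleTags provide in the example above.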
Modifier and Type | Method and Description |
---|---|
<T> PCollectionTuple | and(TupleTag<T> tag, PCollection<T> pc) Returns a new PCollectionTuple that has each PCollection and TupleTag of this PCollectionTuple plus the given PCollection associated with the given TupleTag. |
<OutputT extends POutput> OutputT | apply(PTransform<PCollectionTuple,OutputT> t) Like apply(String, PTransform) but defaulting to the name of the PTransform. |
<OutputT extends POutput> OutputT | apply(String name, PTransform<PCollectionTuple,OutputT> t) Applies the given PTransform to this input PCollectionTuple, using name to identify this specific application of the transform. |
static PCollectionTuple | empty(Pipeline pipeline) Returns an empty PCollectionTuple that is part of the given Pipeline. |
Collection<? extends PValue> | expand() Expands this PInput into a list of its component output PValues. |
void | finishSpecifying() After building, finalizes this PInput to make it ready for being used as an input to a PTransform. |
void | finishSpecifyingOutput() As part of applying the producing PTransform, finalizes this output to make it ready for being used as an input and for running. |
<T> PCollection<T> | get(TupleTag<T> tag) Returns the PCollection associated with the given TupleTag in this PCollectionTuple. |
Map<TupleTag<?>,PCollection<?>> | getAll() Returns an immutable Map from TupleTag to corresponding PCollection, for all the members of this PCollectionTuple. |
Pipeline | getPipeline() |
<T> boolean | has(TupleTag<T> tag) Returns whether this PCollectionTuple contains a PCollection with the given tag. |
static <T> PCollectionTuple | of(TupleTag<T> tag, PCollection<T> pc) Returns a PCollectionTuple containing the given PCollection keyed by the given TupleTag. |
static PCollectionTuple | ofPrimitiveOutputsInternal(Pipeline pipeline, TupleTagList outputTags, com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?> windowingStrategy, PCollection.IsBounded isBounded) Returns a PCollectionTuple with each of the given tags mapping to a new output PCollection. |
void | recordAsOutput(AppliedPTransform<?,?,?> transform) Records that this POutput is an output of the given PTransform. |
public static PCollectionTuple empty(Pipeline pipeline)
Returns an empty PCollectionTuple that is part of the given Pipeline.
A PCollectionTuple containing additional elements can be created by calling and(TupleTag, PCollection) on the result.
public static <T> PCollectionTuple of(TupleTag<T> tag, PCollection<T> pc)
Returns a PCollectionTuple containing the given PCollection keyed by the given TupleTag.
A PCollectionTuple containing additional elements can be created by calling and(TupleTag, PCollection) on the result.
public <T> PCollectionTuple and(TupleTag<T> tag, PCollection<T> pc)
Returns a new PCollectionTuple that has each PCollection and TupleTag of this PCollectionTuple plus the given PCollection associated with the given TupleTag.
The given TupleTag should not already be mapped to a PCollection in this PCollectionTuple.
Each PCollection in the resulting PCollectionTuple must be part of the same Pipeline.
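The constraints on and() described above can be sketched with a small stand-alone model. This is not the SDK's code: MiniPipeline, MiniCollection, and MiniTuple are hypothetical stand-ins, tags are plain strings here, and throwing IllegalArgumentException when a constraint is violated is an assumption of the sketch. It shows that and() leaves the receiver untouched, and that a reused tag or a collection from another pipeline is rejected.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Pipeline; only object identity matters here.
final class MiniPipeline {
}

// Hypothetical stand-in for PCollection: it only remembers its pipeline.
final class MiniCollection {
    final MiniPipeline pipeline;

    MiniCollection(MiniPipeline pipeline) {
        this.pipeline = pipeline;
    }
}

// Hypothetical stand-in for PCollectionTuple, keyed by String instead of TupleTag.
final class MiniTuple {
    private final MiniPipeline pipeline;
    private final Map<String, MiniCollection> members;

    private MiniTuple(MiniPipeline pipeline, Map<String, MiniCollection> members) {
        this.pipeline = pipeline;
        this.members = Collections.unmodifiableMap(members);
    }

    static MiniTuple empty(MiniPipeline pipeline) {
        return new MiniTuple(pipeline, new LinkedHashMap<String, MiniCollection>());
    }

    // Builds a new tuple from a copy of the members; this tuple never changes.
    MiniTuple and(String tag, MiniCollection pc) {
        if (members.containsKey(tag)) {
            // Assumption of this sketch: a reused tag is rejected eagerly.
            throw new IllegalArgumentException("Tag already mapped: " + tag);
        }
        if (pc.pipeline != pipeline) {
            throw new IllegalArgumentException("Collection belongs to a different pipeline");
        }
        Map<String, MiniCollection> copy = new LinkedHashMap<String, MiniCollection>(members);
        copy.put(tag, pc);
        return new MiniTuple(pipeline, copy);
    }

    boolean has(String tag) {
        return members.containsKey(tag);
    }

    int size() {
        return members.size();
    }
}

final class MiniTupleDemo {
    public static void main(String[] args) {
        MiniPipeline p = new MiniPipeline();
        MiniTuple base = MiniTuple.empty(p);
        MiniTuple extended = base.and("words", new MiniCollection(p));

        // The original tuple is unchanged; only the returned tuple has the entry.
        System.out.println(base.size() + " " + extended.size());

        try {
            extended.and("words", new MiniCollection(p));
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```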
public <T> boolean has(TupleTag<T> tag)
Returns whether this PCollectionTuple contains a PCollection with the given tag.
public <T> PCollection<T> get(TupleTag<T> tag)
Returns the PCollection associated with the given TupleTag in this PCollectionTuple. Throws IllegalArgumentException if there is no such PCollection, i.e., !has(tag).
public Map<TupleTag<?>,PCollection<?>> getAll()
Returns an immutable Map from TupleTag to corresponding PCollection, for all the members of this PCollectionTuple.
public <OutputT extends POutput> OutputT apply(PTransform<PCollectionTuple,OutputT> t)
Like apply(String, PTransform) but defaulting to the name of the PTransform.
Returns: the output of the applied PTransform
public <OutputT extends POutput> OutputT apply(String name, PTransform<PCollectionTuple,OutputT> t)
Applies the given PTransform to this input PCollectionTuple, using name to identify this specific application of the transform.
This name is used in various places, including the monitoring UI, logging, and to stably identify this application node in the job graph.
Returns: the output of the applied PTransform
public static PCollectionTuple ofPrimitiveOutputsInternal(Pipeline pipeline, TupleTagList outputTags, com.google.cloud.dataflow.sdk.util.WindowingStrategy<?,?> windowingStrategy, PCollection.IsBounded isBounded)
Returns a PCollectionTuple with each of the given tags mapping to a new output PCollection.
For use by primitive transformations only.
public Pipeline getPipeline()
Description copied from interface: PInput
Specified by: getPipeline in interface PInput
Specified by: getPipeline in interface POutput
public Collection<? extends PValue> expand()
Description copied from interface: PInput
Expands this PInput into a list of its component output PValues.
A PValue expands to itself.
A tuple or list of PValues (such as PCollectionTuple or PCollectionList) expands to its component PValues.
Not intended to be invoked directly by user code.
public void recordAsOutput(AppliedPTransform<?,?,?> transform)
Description copied from interface: POutput
Records that this POutput is an output of the given PTransform.
For a compound POutput, it is advised to call this method on each component POutput.
This is not intended to be invoked by user code, but is automatically invoked as part of applying the producing PTransform.
Specified by: recordAsOutput in interface POutput
public void finishSpecifying()
Description copied from interface: PInput
After building, finalizes this PInput to make it ready for being used as an input to a PTransform.
Automatically invoked whenever apply() is invoked on this PInput, so users do not normally call this explicitly.
Specified by: finishSpecifying in interface PInput
public void finishSpecifyingOutput()
Description copied from interface: POutput
As part of applying the producing PTransform, finalizes this output to make it ready for being used as an input and for running.
This includes ensuring that all PCollections have Coders specified or defaulted.
Automatically invoked whenever this POutput is used as a PInput to another PTransform, or if never used as a PInput, when Pipeline.run() is called, so users do not normally call this explicitly.
Specified by: finishSpecifyingOutput in interface POutput