Class PCollection<T>
- java.lang.Object
-
- org.apache.beam.sdk.values.PValueBase
-
- org.apache.beam.sdk.values.PCollection<T>
-
- Type Parameters:
T
- the type of the elements of thisPCollection
public class PCollection<T> extends PValueBase implements PValue
APCollection<T>
is an immutable collection of values of typeT
. APCollection
can contain either a bounded or unbounded number of elements. Bounded and unboundedPCollections
are produced as the output ofPTransforms
(including root PTransforms likeRead
andCreate
), and can be passed as the inputs of other PTransforms.Some root transforms produce bounded
PCollections
and others produce unbounded ones. For example,GenerateSequence.from(long)
withGenerateSequence.to(long)
produces a fixed set of integers, so it produces a boundedPCollection
.GenerateSequence.from(long)
without aGenerateSequence.to(long)
produces all integers as an infinite stream, so it produces an unboundedPCollection
.Each element in a
PCollection
has an associated timestamp. Readers assign timestamps to elements when they createPCollections
, and otherPTransforms
propagate these timestamps from their input to their output. See the documentation onBoundedSource.BoundedReader
andUnboundedSource.UnboundedReader
for more information on how these readers produce timestamps and watermarks.Additionally, a
PCollection
has an associatedWindowFn
and each element is assigned to a set of windows. By default, the windowing function isGlobalWindows
and all elements are assigned into a single default window. This default can be overridden with theWindow
PTransform
.See the individual
PTransform
subclasses for specific information on how they propagate timestamps and windowing.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static class
PCollection.IsBounded
The enumeration of cases for whether aPCollection
is bounded.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description <OutputT extends POutput>
OutputTapply(java.lang.String name, PTransform<? super PCollection<T>,OutputT> t)
Applies the givenPTransform
to this inputPCollection
, usingname
to identify this specific application of the transform.<OutputT extends POutput>
OutputTapply(PTransform<? super PCollection<T>,OutputT> t)
of thePTransform
.static <T> PCollection<T>
createPrimitiveOutputInternal(Pipeline pipeline, WindowingStrategy<?,?> windowingStrategy, PCollection.IsBounded isBounded, @Nullable Coder<T> coder)
For internal use only; no backwards-compatibility guarantees.static <T> PCollection<T>
createPrimitiveOutputInternal(Pipeline pipeline, WindowingStrategy<?,?> windowingStrategy, PCollection.IsBounded isBounded, @Nullable Coder<T> coder, TupleTag<?> tag)
For internal use only; no backwards-compatibility guarantees.java.util.Map<TupleTag<?>,PValue>
expand()
void
finishSpecifying(PInput input, PTransform<?,?> transform)
After building, finalizes thisPValue
to make it ready for running.void
finishSpecifyingOutput(java.lang.String transformName, PInput input, PTransform<?,?> transform)
As part of applying the producingPTransform
, finalizes this output to make it ready for being used as an input and for running.Coder<T>
getCoder()
Returns theCoder
used by thisPCollection
to encode and decode the values stored in it.SerializableFunction<Row,T>
getFromRowFunction()
Returns the attached schema's fromRowFunction.java.lang.String
getName()
Returns the name of thisPCollection
.Schema
getSchema()
Returns the attached schema.SerializableFunction<T,Row>
getToRowFunction()
Returns the attached schema's toRowFunction.@Nullable TypeDescriptor<T>
getTypeDescriptor()
Returns aTypeDescriptor<T>
with some reflective information aboutT
, if possible.WindowingStrategy<?,?>
getWindowingStrategy()
Returns theWindowingStrategy
of thisPCollection
.boolean
hasSchema()
Returns whether thisPCollection
has an attached schema.PCollection.IsBounded
isBounded()
PCollection<T>
setCoder(Coder<T> coder)
Sets theCoder
used by thisPCollection
to encode and decode the values stored in it.PCollection<T>
setIsBoundedInternal(PCollection.IsBounded isBounded)
For internal use only; no backwards-compatibility guarantees.PCollection<T>
setName(java.lang.String name)
Sets the name of thisPCollection
.PCollection<T>
setRowSchema(Schema schema)
Sets a schema on this PCollection.PCollection<T>
setSchema(Schema schema, TypeDescriptor<T> typeDescriptor, SerializableFunction<T,Row> toRowFunction, SerializableFunction<Row,T> fromRowFunction)
Sets aSchema
on thisPCollection
.PCollection<T>
setTypeDescriptor(TypeDescriptor<T> typeDescriptor)
Sets theTypeDescriptor<T>
for thisPCollection<T>
.PCollection<T>
setWindowingStrategyInternal(WindowingStrategy<?,?> windowingStrategy)
For internal use only; no backwards-compatibility guarantees.-
Methods inherited from class org.apache.beam.sdk.values.PValueBase
getKindString, getPipeline, toString
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
-
Methods inherited from interface org.apache.beam.sdk.values.PInput
getPipeline
-
Methods inherited from interface org.apache.beam.sdk.values.POutput
getPipeline
-
-
-
-
Method Detail
-
finishSpecifyingOutput
public void finishSpecifyingOutput(java.lang.String transformName, PInput input, PTransform<?,?> transform)
Description copied from interface:POutput
As part of applying the producingPTransform
, finalizes this output to make it ready for being used as an input and for running.This includes ensuring that all
PCollections
haveCoders
specified or defaulted.Automatically invoked whenever this
POutput
is output, afterPOutput.finishSpecifyingOutput(String, PInput, PTransform)
has been called on each componentPValue
returned byPOutput.expand()
.- Specified by:
finishSpecifyingOutput
in interfacePOutput
- Overrides:
finishSpecifyingOutput
in classPValueBase
-
finishSpecifying
public void finishSpecifying(PInput input, PTransform<?,?> transform)
After building, finalizes thisPValue
to make it ready for running. Automatically invoked whenever thePValue
is "used" (e.g., when apply() is called on it) and when the Pipeline is run (useful if this is aPValue
with no consumers).- Specified by:
finishSpecifying
in interfacePValue
- Overrides:
finishSpecifying
in classPValueBase
- Parameters:
input
- thePInput
thePTransform
was applied to to produce this outputtransform
- thePTransform
that produced thisPValue
-
getTypeDescriptor
public @Nullable TypeDescriptor<T> getTypeDescriptor()
Returns aTypeDescriptor<T>
with some reflective information aboutT
, if possible. May returnnull
if no information is available. Subclasses may override this to enable betterCoder
inference.
-
getName
public java.lang.String getName()
Returns the name of thisPCollection
.By default, the name of a
PCollection
is based on the name of thePTransform
that produces it. It can be specified explicitly by callingsetName(java.lang.String)
.- Specified by:
getName
in interfacePValue
- Overrides:
getName
in classPValueBase
- Throws:
java.lang.IllegalStateException
- if the name hasn't been set yet
-
expand
public final java.util.Map<TupleTag<?>,PValue> expand()
Description copied from interface:PValue
Expands thisPOutput
into a list of its component outputPValues
.- A
PValue
expands to itself. - A tuple or list of
PValues
(such asPCollectionTuple
orPCollectionList
) expands to its componentPValue PValues
.
Not intended to be invoked directly by user code..
- A
-
setName
public PCollection<T> setName(java.lang.String name)
Sets the name of thisPCollection
. Returnsthis
.- Overrides:
setName
in classPValueBase
- Throws:
java.lang.IllegalStateException
- if thisPCollection
has already been finalized and may no longer be set. Onceapply(org.apache.beam.sdk.transforms.PTransform<? super org.apache.beam.sdk.values.PCollection<T>, OutputT>)
has been called, this will be the case.
-
getCoder
public Coder<T> getCoder()
Returns theCoder
used by thisPCollection
to encode and decode the values stored in it.- Throws:
java.lang.IllegalStateException
- if theCoder
hasn't been set, and couldn't be inferred.
-
setCoder
public PCollection<T> setCoder(Coder<T> coder)
- Throws:
java.lang.IllegalStateException
- if thisPCollection
has already been finalized and may no longer be set. Onceapply(org.apache.beam.sdk.transforms.PTransform<? super org.apache.beam.sdk.values.PCollection<T>, OutputT>)
has been called, this will be the case.
-
setRowSchema
@Experimental(SCHEMAS) public PCollection<T> setRowSchema(Schema schema)
Sets a schema on this PCollection.Can only be called on a
PCollection
.
-
setSchema
@Experimental(SCHEMAS) public PCollection<T> setSchema(Schema schema, TypeDescriptor<T> typeDescriptor, SerializableFunction<T,Row> toRowFunction, SerializableFunction<Row,T> fromRowFunction)
Sets aSchema
on thisPCollection
.
-
hasSchema
@Experimental(SCHEMAS) public boolean hasSchema()
Returns whether thisPCollection
has an attached schema.
-
getSchema
@Experimental(SCHEMAS) public Schema getSchema()
Returns the attached schema.
-
getToRowFunction
@Experimental(SCHEMAS) public SerializableFunction<T,Row> getToRowFunction()
Returns the attached schema's toRowFunction.
-
getFromRowFunction
@Experimental(SCHEMAS) public SerializableFunction<Row,T> getFromRowFunction()
Returns the attached schema's fromRowFunction.
-
apply
public <OutputT extends POutput> OutputT apply(PTransform<? super PCollection<T>,OutputT> t)
of thePTransform
.- Returns:
- the output of the applied
PTransform
-
apply
public <OutputT extends POutput> OutputT apply(java.lang.String name, PTransform<? super PCollection<T>,OutputT> t)
Applies the givenPTransform
to this inputPCollection
, usingname
to identify this specific application of the transform. This name is used in various places, including the monitoring UI, logging, and to stably identify this application node in the job graph.- Returns:
- the output of the applied
PTransform
-
getWindowingStrategy
public WindowingStrategy<?,?> getWindowingStrategy()
Returns theWindowingStrategy
of thisPCollection
.
-
isBounded
public PCollection.IsBounded isBounded()
-
setTypeDescriptor
public PCollection<T> setTypeDescriptor(TypeDescriptor<T> typeDescriptor)
Sets theTypeDescriptor<T>
for thisPCollection<T>
. This may allow the enclosingPCollectionTuple
,PCollectionList
, orPTransform<?, PCollection<T>>
, etc., to provide more detailed reflective information.
-
setWindowingStrategyInternal
@Internal public PCollection<T> setWindowingStrategyInternal(WindowingStrategy<?,?> windowingStrategy)
For internal use only; no backwards-compatibility guarantees.
-
setIsBoundedInternal
@Internal public PCollection<T> setIsBoundedInternal(PCollection.IsBounded isBounded)
For internal use only; no backwards-compatibility guarantees.
-
createPrimitiveOutputInternal
@Internal public static <T> PCollection<T> createPrimitiveOutputInternal(Pipeline pipeline, WindowingStrategy<?,?> windowingStrategy, PCollection.IsBounded isBounded, @Nullable Coder<T> coder)
For internal use only; no backwards-compatibility guarantees.
-
createPrimitiveOutputInternal
@Internal public static <T> PCollection<T> createPrimitiveOutputInternal(Pipeline pipeline, WindowingStrategy<?,?> windowingStrategy, PCollection.IsBounded isBounded, @Nullable Coder<T> coder, TupleTag<?> tag)
For internal use only; no backwards-compatibility guarantees.
-
-