Type Parameters: T - the type of the elements of this PCollection
public class PCollection<T> extends TypedPValue<T>
PCollection<T> is an immutable collection of values of type
T. A PCollection can contain either a bounded or unbounded
number of elements. Bounded and unbounded PCollections are produced
as the output of PTransforms
(including root PTransforms like
TextIO.Read,
PubsubIO.Read and
Create), and can
be passed as the inputs of other PTransforms.
Some root transforms produce bounded PCollections and others
produce unbounded ones. For example,
TextIO.Read reads a static set
of files, so it produces a bounded PCollection.
PubsubIO.Read, on the other hand,
receives a potentially infinite stream of Pubsub messages, so it produces
an unbounded PCollection.
Each element in a PCollection may have an associated implicit
timestamp. Readers assign timestamps to elements when they create
PCollections, and other PTransforms propagate these
timestamps from their input to their output. For example, PubsubIO.Read
assigns pubsub message timestamps to elements, and TextIO.Read assigns
the default value Long.MIN_VALUE to elements. User code can
explicitly assign timestamps to elements with
DoFn.Context.outputWithTimestamp(O, org.joda.time.Instant).
Additionally, a PCollection has an associated
WindowFn and each element is assigned to a set of windows.
By default, the windowing function is
GlobalWindows
and all elements are assigned into a single default window.
This default can be overridden with the
Window
PTransform. With the default GlobalWindow strategy, and when timestamps
are ignored, Dataflow pipelines run in classic batch MapReduce style.
See the individual PTransform subclasses for specific information
on how they propagate timestamps and windowing.
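The windowing behavior described above can be sketched in plain Java. This is an illustrative model only, not SDK code: WindowAssignmentSketch, WindowAssigner, GLOBAL, fixed and group are hypothetical names introduced for this example, and the model places each element in exactly one window, whereas a real WindowFn may assign an element to a set of windows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only; none of these types belong to the Dataflow SDK.
final class WindowAssignmentSketch {

    /** Hypothetical stand-in for a windowing function: maps a timestamp to a window key. */
    interface WindowAssigner {
        long windowFor(long timestampMillis);
    }

    /** Models the default behavior: every element falls into one global window. */
    static final WindowAssigner GLOBAL = timestampMillis -> 0L;

    /** Models overriding the default with fixed-size windows, keyed by window start time. */
    static WindowAssigner fixed(long sizeMillis) {
        return timestampMillis -> timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
    }

    /** Groups element timestamps by the window the assigner places them in. */
    static Map<Long, List<Long>> group(List<Long> timestamps, WindowAssigner assigner) {
        Map<Long, List<Long>> windows = new TreeMap<>();
        for (long ts : timestamps) {
            windows.computeIfAbsent(assigner.windowFor(ts), k -> new ArrayList<>()).add(ts);
        }
        return windows;
    }

    public static void main(String[] args) {
        List<Long> timestamps = Arrays.asList(5L, 1_200L, 1_900L, 3_100L);
        // Default model: all elements share a single window.
        System.out.println(group(timestamps, GLOBAL));
        // Overridden model: elements are split by one-second windows.
        System.out.println(group(timestamps, fixed(1_000L)));
    }
}
```

With the global assigner every timestamp lands under one key, which is why a pipeline that ignores timestamps behaves like a batch job over the whole collection. Swapping in the fixed-size assigner is the analogue of applying the Window PTransform.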
| Modifier and Type | Method and Description |
|---|---|
| <Output extends POutput> Output | apply(PTransform<? super PCollection<T>,Output> t) Applies the given PTransform to this input PCollection, and returns the PTransform's Output. |
| static <T> PCollection<T> | createPrimitiveOutputInternal(WindowFn<?,?> windowFn) Creates and returns a new PCollection for a primitive output. |
| Coder<T> | getCoder() Returns the Coder used by this PCollection to encode and decode the values stored in it. |
| java.lang.String | getName() Returns the name of this PCollection. |
| WindowFn<?,?> | getWindowFn() Returns the WindowFn of this PCollection. |
| PCollection<T> | setCoder(Coder<T> coder) Sets the Coder used by this PCollection to encode and decode the values stored in it. |
| PCollection<T> | setName(java.lang.String name) Sets the name of this PCollection. |
| PCollection<T> | setPipelineInternal(Pipeline pipeline) Sets the Pipeline for this PCollection. |
| PCollection<T> | setTypeTokenInternal(com.google.common.reflect.TypeToken<T> typeToken) Sets the TypeToken<T> for this PCollection<T>, so that the enclosing PCollectionTuple, PCollectionList<T>, or PTransform<?, PCollection<T>>, etc., can provide more detailed reflective information. |
| PCollection<T> | setWindowFnInternal(WindowFn<?,?> windowFn) Sets the WindowFn of this PCollection. |
Inherited methods, one line per declaring class or interface:
finishSpecifying, getTypeToken, recordAsOutput
expand, getKindString, getPipeline, isFinishedSpecifyingInternal, recordAsOutput, toString
finishSpecifyingOutput, getProducingTransformInternal
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
getProducingTransformInternal
expand, finishSpecifyingOutput, recordAsOutput
expand, getPipeline
public java.lang.String getName()
By default, the name of a PCollection is based on the name of the
PTransform that produces it. It can be specified explicitly by
calling setName(java.lang.String).
Specified by: getName in interface PValue
Overrides: getName in class PValueBase
Throws: java.lang.IllegalStateException - if the name hasn't been set yet
public PCollection<T> setName(java.lang.String name)
Returns this.
Overrides: setName in class PValueBase
Throws: java.lang.IllegalStateException - if this PCollection has already been
finalized and is no longer settable, e.g., by having
apply() called on it
public Coder<T> getCoder()
Overrides: getCoder in class TypedPValue<T>
Throws: java.lang.IllegalStateException - if the Coder hasn't been set, and
couldn't be inferred
public PCollection<T> setCoder(Coder<T> coder)
Returns this.
Overrides: setCoder in class TypedPValue<T>
Throws: java.lang.IllegalStateException - if this PCollection has already
been finalized and is no longer settable, e.g., by having
apply() called on it
public <Output extends POutput> Output apply(PTransform<? super PCollection<T>,Output> t)
public PCollection<T> setTypeTokenInternal(com.google.common.reflect.TypeToken<T> typeToken)
Sets the TypeToken<T> for this PCollection<T>, so that
the enclosing PCollectionTuple, PCollectionList<T>,
or PTransform<?, PCollection<T>>, etc., can provide
more detailed reflective information.
Overrides: setTypeTokenInternal in class TypedPValue<T>
public PCollection<T> setWindowFnInternal(WindowFn<?,?> windowFn)
public PCollection<T> setPipelineInternal(Pipeline pipeline)
Specified by: setPipelineInternal in interface PValue
Overrides: setPipelineInternal in class TypedPValue<T>
public static <T> PCollection<T> createPrimitiveOutputInternal(WindowFn<?,?> windowFn)
For use by primitive transformations only.
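The IllegalStateException behavior documented for getName, setName and setCoder follows a "settable until applied" pattern, which the following plain-Java model tries to illustrate. It is a sketch under assumptions, not the SDK implementation: FinalizeOnApplySketch and SettableValue are hypothetical names, and this apply() only flips a flag instead of running a transform.

```java
// Plain-Java illustration only; SettableValue is a hypothetical stand-in, not an SDK class.
final class FinalizeOnApplySketch {

    static final class SettableValue {
        private String name;
        private boolean finalized;

        /** Sets the name and returns this, so calls can be chained. */
        SettableValue setName(String newName) {
            if (finalized) {
                throw new IllegalStateException("already finalized; name is no longer settable");
            }
            this.name = newName;
            return this;
        }

        /** Returns the name, or fails if no name has been set yet. */
        String getName() {
            if (name == null) {
                throw new IllegalStateException("name has not been set yet");
            }
            return name;
        }

        /** Models apply(): once the value is consumed, its configuration is frozen. */
        void apply() {
            finalized = true;
        }
    }

    /** Returns true if calling setName after apply() is rejected. */
    static boolean renameAfterApplyFails() {
        SettableValue value = new SettableValue().setName("lines");
        value.apply();
        try {
            value.setName("renamed");
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(renameAfterApplyFails());
    }
}
```

In this model, the practical consequence is that all configuration, such as naming, has to happen before the value is passed to a consumer, and setters return the value itself so that configuration can be chained in one expression.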