public class Window
extends java.lang.Object
Window logically divides up or groups the elements of a
PCollection into finite windows according to a WindowFn.
The output of Window contains the same elements as the input, but they
have been logically assigned to windows. Subsequent
GroupByKeys, including those
within composite transforms, will group by the combination of keys and
windows.
See GroupByKey
for more information about how grouping with windows works.
Windowing a PCollection allows chunks of it to be processed
individually, before the entire PCollection is available. This is
especially important for PCollections with unbounded size:
the full PCollection is
never available at once, because more data is continually arriving.
For PCollections with a bounded size (i.e., conventional batch mode),
all data is by default implicitly in a single window, unless
Window is applied.
For example, a simple form of windowing divides up the data into
fixed-width time intervals, using FixedWindows.
The following example demonstrates how to use Window in a pipeline
that counts the number of occurrences of strings each minute:
    PCollection<String> items = ...;
    PCollection<String> windowed_items = items.apply(
        Window.<String>into(FixedWindows.of(1, TimeUnit.MINUTES)));
    PCollection<KV<String, Long>> windowed_counts = windowed_items.apply(
        Count.<String>perElement());
Let (data, timestamp) denote a data element along with its timestamp. Then, if the input to this pipeline consists of {("foo", 15s), ("bar", 30s), ("foo", 45s), ("foo", 1m30s)}, the output will be {(KV("foo", 2), 1m), (KV("bar", 1), 1m), (KV("foo", 1), 2m)}.
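The arithmetic behind this example can be checked with a minimal plain-Java sketch. It does not use the SDK: the class and method names below are hypothetical, timestamps are assumed to be milliseconds, and each result is labeled with the end of its window, following the notation above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java sketch of fixed-window counting; hypothetical names, not SDK code. */
class FixedWindowCountSketch {
    /** Returns the end of the fixed window [start, start + size) containing the timestamp. */
    public static long windowEndMillis(long timestampMillis, long sizeMillis) {
        long start = timestampMillis - Math.floorMod(timestampMillis, sizeMillis);
        return start + sizeMillis;
    }

    /** Counts each value per window; keys look like "foo@60s" (value @ window end). */
    public static Map<String, Long> countPerWindow(
            String[] values, long[] timestampsMillis, long sizeMillis) {
        Map<String, Long> counts = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            long endSeconds = windowEndMillis(timestampsMillis[i], sizeMillis) / 1000;
            counts.merge(values[i] + "@" + endSeconds + "s", 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] values = {"foo", "bar", "foo", "foo"};
        long[] timestamps = {15_000L, 30_000L, 45_000L, 90_000L};
        // prints {foo@60s=2, bar@60s=1, foo@120s=1}
        System.out.println(countPerWindow(values, timestamps, 60_000L));
    }
}
```

The three entries correspond to (KV("foo", 2), 1m), (KV("bar", 1), 1m), and (KV("foo", 1), 2m).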
Several predefined WindowFns are provided:
FixedWindows partitions the timestamps into fixed-width intervals.
SlidingWindows places data into overlapping fixed-width intervals.
Sessions groups data into sessions where each item in a window
is separated from the next by no more than a specified gap.
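To make the difference between these strategies concrete, here is a rough plain-Java sketch of sliding-window assignment and session grouping. It is not SDK code: the class and method names are hypothetical, timestamps are plain long values in arbitrary units, and the session rule follows the description above (consecutive items no more than the gap apart share a session), which may differ in boundary details from the SDK's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java sketch of sliding and session windowing; hypothetical names, not SDK code. */
class WindowAssignmentSketch {
    /** Start times of every sliding window [start, start + size) containing the timestamp. */
    public static List<Long> slidingWindowStarts(long timestamp, long size, long period) {
        List<Long> starts = new ArrayList<>();
        // Latest window start at or before the timestamp, aligned to the period.
        long lastStart = timestamp - Math.floorMod(timestamp, period);
        for (long start = lastStart; start > timestamp - size; start -= period) {
            starts.add(start);
        }
        return starts;
    }

    /** Groups timestamps into sessions, each returned as a {first, last} pair. */
    public static List<long[]> sessions(long[] timestamps, long maxGap) {
        long[] sorted = timestamps.clone();
        Arrays.sort(sorted);
        List<long[]> result = new ArrayList<>();
        for (long t : sorted) {
            long[] last = result.isEmpty() ? null : result.get(result.size() - 1);
            if (last != null && t - last[1] <= maxGap) {
                last[1] = t; // within the gap: extend the current session
            } else {
                result.add(new long[] {t, t}); // gap exceeded: start a new session
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A timestamp of 45 falls in the overlapping windows [30, 90) and [0, 60).
        System.out.println(slidingWindowStarts(45L, 60L, 30L)); // prints [30, 0]
        for (long[] s : sessions(new long[] {100L, 10L, 20L, 105L, 300L}, 30L)) {
            System.out.println(s[0] + ".." + s[1]);
        }
    }
}
```

With fixed windows each element lands in exactly one window; with sliding windows an element can belong to several; with sessions the window boundaries depend on the data itself.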
Additional WindowFns can be created by writing new
subclasses of WindowFn.

| Modifier and Type | Class and Description |
|---|---|
| static class | Window.Bound<T>: A PTransform that windows the elements of a PCollection<T> into finite windows according to a user-specified WindowFn<T, B>. |
| static class | Window.Remerge<T>: A PTransform that does not change assigned windows, but will cause windows to be merged again as part of the next GroupByKey. |
| static class | Window.Unbound: An incomplete Window transform, with unbound input/output type. |

| Constructor and Description |
|---|
| Window() |

| Modifier and Type | Method and Description |
|---|---|
| static <T> Window.Bound<T> | into(WindowFn<? super T,?> fn): Creates a Window PTransform that uses the given WindowFn to window the data. |
| static Window.Unbound | named(java.lang.String name): Creates a Window PTransform with the given name. |
| static <T> Window.Remerge<T> | remerge(): Creates a Window PTransform that does not change assigned windows, but will cause windows to be merged again as part of the next GroupByKey. |

public static Window.Unbound named(java.lang.String name)
Creates a Window PTransform with the given name.
See the discussion of Naming in
ParDo for more explanation.
The resulting PTransform is incomplete, and its input/output
type is not yet bound. Use Window.Unbound.into(WindowFn<? super T, ?>) to specify the
WindowFn to use, which will also bind the input/output type of this
PTransform.
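The two-step pattern described here (name first, bind the element type later) can be illustrated with a small plain-Java sketch. Every class below is a hypothetical stand-in written for this illustration, not the SDK's Window.Unbound or Window.Bound, and the stand-in WindowFn is simplified to a single type parameter.

```java
/** Hypothetical stand-ins illustrating the named-then-into pattern; not SDK classes. */
class NamedWindowSketch {
    /** Simplified stand-in for a WindowFn: maps an element to a window label. */
    public interface WindowFnSketch<T> {
        String assign(T element);
    }

    /** Incomplete transform: it has a name, but no element type yet. */
    public static final class Unbound {
        private final String name;

        Unbound(String name) {
            this.name = name;
        }

        /** Supplying the WindowFn binds the element type T and completes the transform. */
        public <T> Bound<T> into(WindowFnSketch<? super T> fn) {
            return new Bound<>(name, fn);
        }
    }

    /** Complete transform: both the name and the element type are fixed. */
    public static final class Bound<T> {
        private final String name;
        private final WindowFnSketch<? super T> fn;

        Bound(String name, WindowFnSketch<? super T> fn) {
            this.name = name;
            this.fn = fn;
        }

        public String describe(T element) {
            return name + ": " + element + " -> " + fn.assign(element);
        }
    }

    public static Unbound named(String name) {
        return new Unbound(name);
    }

    public static void main(String[] args) {
        // A function over Object is accepted for String elements via "? super T".
        WindowFnSketch<Object> byLength = e -> "len" + e.toString().length();
        Bound<String> windowing = named("ByLength").<String>into(byLength);
        System.out.println(windowing.describe("foo")); // prints ByLength: foo -> len3
    }
}
```

The `? super T` bound is what lets a window function written for a general element type be reused on a more specific one.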
public static <T> Window.Bound<T> into(WindowFn<? super T,?> fn)
Creates a Window PTransform that uses the given
WindowFn to window the data.
The resulting PTransform's types have been bound, with both the
input and output being a PCollection<T>, inferred from the types of
the argument WindowFn<T, B>. It is ready to be applied, or further
properties can be set on it first.
public static <T> Window.Remerge<T> remerge()
Creates a Window PTransform that does not change assigned
windows, but will cause windows to be merged again as part of the next
GroupByKey.