public class View extends Object
Transforms for creating PCollectionViews from PCollections,
for consuming the contents of those PCollections as side inputs
to ParDo transforms. These transforms support viewing a PCollection
as a single value, an iterable, a map, or a multimap.
For a PCollection that contains a single value of type T
per window, such as the output of Combine.globally(SerializableFunction<Iterable<V>, V>),
use asSingleton() to prepare it for use as a side input:
PCollectionView<T> output = someOtherPCollection
.apply(Combine.globally(...))
.apply(View.asSingleton());
For a small PCollection that can fit entirely in memory,
use asList() to prepare it for use as a List.
When read as a side input, the entire list will be cached in memory.
PCollectionView<List<T>> output =
smallPCollection.apply(View.asList());
If a PCollection of KV<K, V> is known to
have a single value for each key, then use asMap()
to view it as a Map<K, V>:
PCollectionView<Map<K, V>> output =
somePCollection.apply(View.asMap());
Otherwise, to access a PCollection of KV<K, V> as a
Map<K, Iterable<V>> side input, use asMultimap():
PCollectionView<Map<K, Iterable<V>>> output =
somePCollection.apply(View.asMultimap());
To iterate over an entire window of a PCollection via
side input, use asIterable():
PCollectionView<Iterable<T>> output =
somePCollection.apply(View.asIterable());
Both asMultimap() and asMap() are useful
for implementing lookup-based "joins" with the main input, when the
side input is small enough to fit into memory.
For example, if you represent a page on a website via some Page object and
have some type UrlVisit logging that a URL was visited, you could convert these
to more fully structured PageVisit objects using a side input, something like the
following:
PCollection<Page> pages = ... // pages fit into memory
PCollection<UrlVisit> urlVisits = ... // very large collection
final PCollectionView<Map<URL, Page>> urlToPage = pages
    .apply(WithKeys.of( ... )) // extract the URL from the page
    .apply(View.asMap());
PCollection<PageVisit> pageVisits = urlVisits
    .apply(ParDo.withSideInputs(urlToPage)
        .of(new DoFn<UrlVisit, PageVisit>() {
            @Override
            public void processElement(ProcessContext context) {
                UrlVisit urlVisit = context.element();
                // Read the map through the context, not from the view itself.
                Page page = context.sideInput(urlToPage).get(urlVisit.getUrl());
                context.output(new PageVisit(page, urlVisit.getVisitData()));
            }
        }));
See ParDo.withSideInputs(PCollectionView<?>...) for details on how to access
this variable inside a ParDo over another PCollection.
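The lookup join described above can be modeled in plain Java without the SDK. The following is a minimal sketch, not SDK code: the Page, UrlVisit, and PageVisit classes, their fields, and the String URLs are all hypothetical stand-ins. The in-memory Map plays the role of the value a DoFn reads from an asMap() side input, and the loop plays the role of the ParDo over the main input. Rejecting duplicate keys and dropping unmatched visits are choices made for this sketch only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of a map-based lookup join; no Dataflow SDK classes are used.
class LookupJoinSketch {
    // Hypothetical stand-ins for the Page, UrlVisit, and PageVisit types in the text.
    static final class Page {
        final String url;
        final String title;
        Page(String url, String title) { this.url = url; this.title = title; }
    }

    static final class UrlVisit {
        final String url;
        final String visitData;
        UrlVisit(String url, String visitData) { this.url = url; this.visitData = visitData; }
    }

    static final class PageVisit {
        final Page page;
        final String visitData;
        PageVisit(Page page, String visitData) { this.page = page; this.visitData = visitData; }
    }

    // Models WithKeys followed by asMap(): key each page by URL, one value per key.
    // Throwing on a duplicate key is this sketch's assumption, not documented SDK behavior.
    static Map<String, Page> indexByUrl(List<Page> pages) {
        Map<String, Page> byUrl = new HashMap<>();
        for (Page page : pages) {
            if (byUrl.putIfAbsent(page.url, page) != null) {
                throw new IllegalArgumentException("duplicate key: " + page.url);
            }
        }
        return byUrl;
    }

    // Models the ParDo: each main-input element looks up its page in the side-input map.
    static List<PageVisit> join(List<UrlVisit> visits, Map<String, Page> urlToPage) {
        List<PageVisit> out = new ArrayList<>();
        for (UrlVisit visit : visits) {
            Page page = urlToPage.get(visit.url);
            if (page != null) { // this sketch drops visits that have no matching page
                out.add(new PageVisit(page, visit.visitData));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Page> urlToPage = indexByUrl(Arrays.asList(
            new Page("/home", "Home"), new Page("/docs", "Docs")));
        List<UrlVisit> visits = Arrays.asList(
            new UrlVisit("/docs", "visit-1"), new UrlVisit("/home", "visit-2"));
        for (PageVisit pv : join(visits, urlToPage)) {
            System.out.println(pv.page.title + " <- " + pv.visitData);
        }
    }
}
```

The small collection is indexed once and every element of the large collection performs a constant-time lookup, which is why the approach only suits side inputs that fit in memory.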
| Modifier and Type | Class and Description |
|---|---|
| static class | View.AsIterable<T>: A PTransform that produces a PCollectionView of a PCollection yielding its elements as an Iterable. |
| static class | View.AsMap<K,V>: A PTransform that produces a PCollectionView of a keyed PCollection yielding a map from each key to its unique associated value. |
| static class | View.AsMultimap<K,V>: A PTransform that produces a PCollectionView of a keyed PCollection yielding a map of keys to all associated values. |
| static class | View.AsSingleton<T>: A PTransform that produces a PCollectionView of a singleton PCollection yielding the single element it contains. |
| static class | View.Concatenate<T>: Combiner that combines Ts into a single List<T> containing all inputs. |
| static class | View.CreatePCollectionView<ElemT,ViewT>: Creates a primitive PCollectionView. |
| Modifier and Type | Method and Description |
|---|---|
| static <T> View.AsIterable<T> | asIterable(): Returns a View.AsIterable that takes a PCollection as input and produces a PCollectionView of the values, to be consumed as an iterable side input. |
| static <T> PTransform<PCollection<T>,PCollectionView<List<T>>> | asList(): Returns a transform that takes a PCollection and returns a PCollectionView of a List containing all of its elements, to be consumed as a side input. |
| static <K,V> View.AsMap<K,V> | asMap(): Returns a View.AsMap transform that takes a PCollection as input and produces a PCollectionView of the values to be consumed as a Map<K, V> side input. |
| static <K,V> View.AsMultimap<K,V> | asMultimap(): Returns a View.AsMultimap transform that takes a PCollection of KV<K, V> pairs as input and produces a PCollectionView of its contents as a Map<K, Iterable<V>> for use as a side input. |
| static <T> View.AsSingleton<T> | asSingleton(): Returns a View.AsSingleton transform that takes a singleton PCollection as input and produces a PCollectionView of the single value, to be consumed as a side input. |
public static <T> View.AsSingleton<T> asSingleton()
Returns a View.AsSingleton transform that takes a singleton
PCollection as input and produces a PCollectionView
of the single value, to be consumed as a side input.
PCollection<InputT> input = ...
CombineFn<InputT, OutputT> yourCombineFn = ...
PCollectionView<OutputT> output = input
.apply(Combine.globally(yourCombineFn))
.apply(View.asSingleton());
If the input PCollection is empty,
throws NoSuchElementException in the consuming
DoFn.
If the input PCollection contains more than one
element, throws IllegalArgumentException in the
consuming DoFn.
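The two failure modes above can be illustrated with a small plain-Java model. This is a sketch of the documented contract only, not the SDK's implementation; the class name SingletonViewSketch and the method onlyElement are hypothetical, and the Iterable argument stands in for the contents of the input PCollection in one window.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Plain-Java model of the singleton contract described above; not SDK code.
class SingletonViewSketch {
    // Returns the only element, mirroring the documented failure modes:
    // empty input -> NoSuchElementException,
    // more than one element -> IllegalArgumentException.
    static <T> T onlyElement(Iterable<T> contents) {
        Iterator<T> it = contents.iterator();
        if (!it.hasNext()) {
            throw new NoSuchElementException("empty input where a singleton was expected");
        }
        T value = it.next();
        if (it.hasNext()) {
            throw new IllegalArgumentException("more than one element where a singleton was expected");
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(onlyElement(Collections.singletonList(42)));
        try {
            onlyElement(Collections.<Integer>emptyList());
        } catch (NoSuchElementException e) {
            System.out.println("empty: " + e.getMessage());
        }
        try {
            onlyElement(Arrays.asList(1, 2));
        } catch (IllegalArgumentException e) {
            System.out.println("too many: " + e.getMessage());
        }
    }
}
```

Note that, per the text, both errors surface in the consuming DoFn rather than where the view is created, so a pipeline can be constructed successfully and still fail when the side input is read.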
public static <T> PTransform<PCollection<T>,PCollectionView<List<T>>> asList()
Returns a transform that takes a PCollection and returns a
PCollectionView of a List containing all of its elements, to be consumed as
a side input.
The resulting list is required to fit in memory.
public static <T> View.AsIterable<T> asIterable()
Returns a View.AsIterable that takes a
PCollection as input and produces a PCollectionView
of the values, to be consumed as an iterable side input. The values of
this Iterable are not guaranteed to be cached; if caching is desired, use
asList().
public static <K,V> View.AsMap<K,V> asMap()
Returns a View.AsMap transform that takes a PCollection as input
and produces a PCollectionView of the values to be consumed
as a Map<K, V> side input. It is required that each key of the input be
associated with a single value. If this is not the case, precede this
view with Combine.perKey, as below, or alternatively use asMultimap().
PCollection<KV<K, V>> input = ...
CombineFn<V, OutputT> yourCombineFn = ...
PCollectionView<Map<K, OutputT>> output = input
.apply(Combine.perKey(yourCombineFn.<K>asKeyedFn()))
.apply(View.asMap());
Currently, the resulting map is required to fit into memory.
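To show why combining per key first satisfies the one-value-per-key requirement, here is a plain-Java model of that step. It is a sketch under stated assumptions, not SDK code: the class CombineThenMapSketch and the helpers kv and combinePerKey are hypothetical, the input list of Map.Entry pairs stands in for a PCollection of KV<K, V>, and the combiner keeps the value type unchanged, whereas a CombineFn<V, OutputT> may produce a different output type.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java model of "combine per key, then view as a map"; not SDK code.
class CombineThenMapSketch {
    // Hypothetical helper that builds a key/value pair.
    static <K, V> Map.Entry<K, V> kv(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Values that share a key are merged with the combiner, so the
    // resulting map holds exactly one value for each key.
    static <K, V> Map<K, V> combinePerKey(List<Map.Entry<K, V>> input, BinaryOperator<V> combiner) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> entry : input) {
            result.merge(entry.getKey(), entry.getValue(), combiner);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input =
            Arrays.asList(kv("a", 1), kv("b", 2), kv("a", 3));
        Map<String, Integer> sums = combinePerKey(input, Integer::sum);
        System.out.println(sums); // prints {a=4, b=2}
    }
}
```

If the duplicate values must all be kept rather than merged, the multimap form is the better fit.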
public static <K,V> View.AsMultimap<K,V> asMultimap()
Returns a View.AsMultimap transform that takes a PCollection
of KV<K, V> pairs as input and produces a PCollectionView of
its contents as a Map<K, Iterable<V>> for use as a side input.
In contrast to asMap(), it is not required that the keys in the
input collection be unique.
PCollection<KV<K, V>> input = ... // may contain more than one occurrence of some keys
PCollectionView<Map<K, Iterable<V>>> output = input.apply(View.asMultimap());
Currently, the resulting map is required to fit into memory.
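The shape of the multimap view can be modeled in plain Java as a grouping of values under each key. This is a sketch, not SDK code: the class MultimapViewSketch and the helpers kv and groupByKey are hypothetical, and a List is used for each group where the view exposes an Iterable. The sketch happens to preserve input order within each key; nothing in the text above promises any ordering for the real view.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of the Map<K, Iterable<V>> shape of a multimap view; not SDK code.
class MultimapViewSketch {
    // Hypothetical helper that builds a key/value pair.
    static <K, V> Map.Entry<K, V> kv(K key, V value) {
        return new SimpleEntry<>(key, value);
    }

    // Collects every value under its key; repeated keys are allowed.
    static <K, V> Map<K, List<V>> groupByKey(List<Map.Entry<K, V>> input) {
        Map<K, List<V>> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> entry : input) {
            result.computeIfAbsent(entry.getKey(), k -> new ArrayList<>()).add(entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input =
            Arrays.asList(kv("a", 1), kv("b", 2), kv("a", 3));
        Map<String, List<Integer>> grouped = groupByKey(input);
        System.out.println(grouped); // prints {a=[1, 3], b=[2]}
    }
}
```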