public class Sample
extends java.lang.Object
PTransforms for taking samples of the elements in a
PCollection, or samples of the values associated with each
key in a PCollection of KVs.

| Modifier and Type | Class and Description |
|---|---|
| static class | Sample.FixedSizedSampleFn<T>: CombineFn that computes a fixed-size sample of a collection of values. |

| Constructor and Description |
|---|
| Sample() |

| Modifier and Type | Method and Description |
|---|---|
| static <T> PTransform<PCollection<T>,PCollection<java.lang.Iterable<T>>> | fixedSizeGlobally(int sampleSize): Returns a PTransform that takes a PCollection<T>, selects sampleSize elements, uniformly at random, and returns a PCollection<Iterable<T>> containing the selected elements. |
| static <K,V> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.lang.Iterable<V>>>> | fixedSizePerKey(int sampleSize): Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, Iterable<V>>> that contains an output element mapping each distinct key in the input PCollection to a sample of sampleSize values associated with that key in the input PCollection, taken uniformly at random. |

public static <T> PTransform<PCollection<T>,PCollection<java.lang.Iterable<T>>> fixedSizeGlobally(int sampleSize)
Returns a PTransform that takes a PCollection<T>,
selects sampleSize elements, uniformly at random, and returns a
PCollection<Iterable<T>> containing the selected elements.
If the input PCollection has fewer than
sampleSize elements, then the output Iterable<T>
will be all the input's elements.
Example of use:
PCollection<String> pc = ...;
PCollection<Iterable<String>> sampleOfSize10 =
    pc.apply(Sample.fixedSizeGlobally(10));
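
To make the contract above concrete, here is a minimal plain-Java sketch of drawing a uniform fixed-size sample from an in-memory Iterable. It uses reservoir sampling (Algorithm R), which is one standard way to get this behavior; it is not taken from the SDK and may differ from what the transform does internally. The class name ReservoirSampleSketch and the method sample are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical illustration only: not part of the SDK and not its implementation.
final class ReservoirSampleSketch {

    /**
     * Returns up to sampleSize elements of input, chosen uniformly at random.
     * If input has fewer than sampleSize elements, all of them are returned.
     */
    public static <T> List<T> sample(Iterable<T> input, int sampleSize, Random rng) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be >= 0");
        }
        List<T> reservoir = new ArrayList<>();
        int seen = 0; // number of elements consumed so far
        for (T element : input) {
            seen++;
            if (reservoir.size() < sampleSize) {
                // The first sampleSize elements always enter the reservoir.
                reservoir.add(element);
            } else {
                // Keep the new element with probability sampleSize / seen,
                // evicting a uniformly chosen current member.
                int slot = rng.nextInt(seen);
                if (slot < sampleSize) {
                    reservoir.set(slot, element);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> input = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            input.add(i);
        }
        // Prints 10 of the 100 inputs; the selection varies from run to run.
        System.out.println(sample(input, 10, new Random()));
    }
}
```

The sketch mirrors the two documented edge cases: a sampleSize of 0 yields an empty result, and an input shorter than sampleSize is returned in full.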
Type Parameters:
T - the type of the elements
Parameters:
sampleSize - the number of elements to select; must be >= 0

public static <K,V> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.lang.Iterable<V>>>> fixedSizePerKey(int sampleSize)
Returns a PTransform that takes an input
PCollection<KV<K, V>> and returns a
PCollection<KV<K, Iterable<V>>> that contains an output
element mapping each distinct key in the input
PCollection to a sample of sampleSize values
associated with that key in the input PCollection, taken
uniformly at random. If a key in the input PCollection
has fewer than sampleSize values associated with it, then
the output Iterable<V> associated with that key will be
all the values associated with that key in the input
PCollection.
Example of use:
PCollection<KV<String, Integer>> pc = ...;
PCollection<KV<String, Iterable<Integer>>> sampleOfSize10PerKey =
    pc.apply(Sample.<String, Integer>fixedSizePerKey(10));
Type Parameters:
K - the type of the keys
V - the type of the values
Parameters:
sampleSize - the number of values to select for each distinct key; must be >= 0
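
The per-key behavior described for fixedSizePerKey can be sketched in plain Java by keeping one reservoir per distinct key. This is an in-memory illustration under assumptions, not the SDK's implementation: the class PerKeySampleSketch and the method samplePerKey are hypothetical names, java.util.Map.Entry stands in for KV, and reservoir sampling is just one standard way to obtain a uniform sample.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical illustration only: not part of the SDK and not its implementation.
final class PerKeySampleSketch {

    /**
     * Maps each distinct key in input to at most sampleSize of its values,
     * chosen uniformly at random. Keys with fewer values keep all of them.
     */
    public static <K, V> Map<K, List<V>> samplePerKey(
            Iterable<? extends Map.Entry<K, V>> input, int sampleSize, Random rng) {
        if (sampleSize < 0) {
            throw new IllegalArgumentException("sampleSize must be >= 0");
        }
        Map<K, List<V>> reservoirs = new HashMap<>();
        Map<K, Integer> seenPerKey = new HashMap<>();
        for (Map.Entry<K, V> entry : input) {
            K key = entry.getKey();
            List<V> reservoir = reservoirs.computeIfAbsent(key, k -> new ArrayList<>());
            // Count how many values have been seen for this key, including this one.
            int seen = seenPerKey.merge(key, 1, Integer::sum);
            if (reservoir.size() < sampleSize) {
                reservoir.add(entry.getValue());
            } else {
                // Keep the new value with probability sampleSize / seen.
                int slot = rng.nextInt(seen);
                if (slot < sampleSize) {
                    reservoir.set(slot, entry.getValue());
                }
            }
        }
        return reservoirs;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> input = new ArrayList<>();
        for (int i = 0; i < 20; i++) {
            input.add(new SimpleEntry<>(i % 2 == 0 ? "even" : "odd", i));
        }
        // Prints 3 values per key; the selection varies from run to run.
        System.out.println(samplePerKey(input, 3, new Random()));
    }
}
```

Every distinct key appears in the result, matching the description above: a key with fewer than sampleSize values maps to all of its values, and a sampleSize of 0 maps each key to an empty list.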