Package org.apache.beam.sdk.transforms
Class Sample
- java.lang.Object
  - org.apache.beam.sdk.transforms.Sample
public class Sample extends java.lang.Object
PTransforms for taking samples of the elements in a PCollection, or samples of the values associated with each key in a PCollection of KVs.

fixedSizeGlobally(int) and fixedSizePerKey(int) compute uniformly random samples. any(long) is faster, but provides no uniformity guarantees. combineFn(int) can also be used manually, in combination with state and with the Combine transform.
-
Nested Class Summary
- static class Sample.FixedSizedSampleFn<T>
  CombineFn that computes a fixed-size sample of a collection of values.
-
Constructor Summary
- Sample()
-
Method Summary
- static <T> PTransform<PCollection<T>,PCollection<T>> any(long limit)
  Sample#any(long) takes a PCollection<T> and a limit, and produces a new PCollection<T> containing up to limit elements of the input PCollection.
- static <T> Combine.CombineFn<T,?,java.lang.Iterable<T>> anyCombineFn(int sampleSize)
  Returns a Combine.CombineFn that computes a fixed-sized potentially non-uniform sample of its inputs.
- static <T> Combine.CombineFn<T,?,T> anyValueCombineFn()
  Returns a Combine.CombineFn that computes a single and potentially non-uniform sample value of its inputs.
- static <T> Combine.CombineFn<T,?,java.lang.Iterable<T>> combineFn(int sampleSize)
  Returns a Combine.CombineFn that computes a fixed-sized uniform sample of its inputs.
- static <T> PTransform<PCollection<T>,PCollection<java.lang.Iterable<T>>> fixedSizeGlobally(int sampleSize)
  Returns a PTransform that takes a PCollection<T>, selects sampleSize elements, uniformly at random, and returns a PCollection<Iterable<T>> containing the selected elements.
- static <K,V> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.lang.Iterable<V>>>> fixedSizePerKey(int sampleSize)
  Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, Iterable<V>>> that contains an output element mapping each distinct key in the input PCollection to a sample of sampleSize values associated with that key in the input PCollection, taken uniformly at random.
-
Method Detail
-
combineFn
public static <T> Combine.CombineFn<T,?,java.lang.Iterable<T>> combineFn(int sampleSize)
Returns a Combine.CombineFn that computes a fixed-sized uniform sample of its inputs.
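To see why a fixed-size uniform sample can be expressed as a combine operation at all, the following is a minimal plain-Java sketch of a mergeable sampling accumulator. It is an illustration under stated assumptions, not Beam code: the class TaggedSampleAccumulator and its methods are hypothetical names that only mirror the add-input, merge, and extract-output steps of a combiner. Each value receives an independent random tag, and the accumulator keeps the sampleSize values with the smallest tags, so merging two partial accumulators yields the same kind of sample as processing all inputs in one place.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Random;

/** Hypothetical accumulator (not a Beam class): keeps the sampleSize values with the smallest random tags. */
final class TaggedSampleAccumulator<T> {
  /** A value paired with the random tag drawn when it was added. */
  private static final class Tagged<T> {
    final double tag;
    final T value;

    Tagged(double tag, T value) {
      this.tag = tag;
      this.value = value;
    }
  }

  private final int sampleSize;
  private final Random random;
  // Max-heap on the tag, so the entry with the largest tag is the first to be evicted.
  private final PriorityQueue<Tagged<T>> heap;

  TaggedSampleAccumulator(int sampleSize, Random random) {
    if (sampleSize < 0) {
      throw new IllegalArgumentException("sampleSize must be >= 0");
    }
    this.sampleSize = sampleSize;
    this.random = random;
    this.heap = new PriorityQueue<>(Comparator.comparingDouble((Tagged<T> t) -> t.tag).reversed());
  }

  /** Adds one input value with a freshly drawn random tag. */
  void addInput(T value) {
    offer(new Tagged<>(random.nextDouble(), value));
  }

  /** Folds another partial accumulator into this one, reusing the tags it already drew. */
  void merge(TaggedSampleAccumulator<T> other) {
    for (Tagged<T> t : other.heap) {
      offer(t);
    }
  }

  /** Returns the sampled values; fewer than sampleSize if fewer inputs were seen. */
  List<T> extractOutput() {
    List<T> out = new ArrayList<>();
    for (Tagged<T> t : heap) {
      out.add(t.value);
    }
    return out;
  }

  private void offer(Tagged<T> t) {
    if (sampleSize == 0) {
      return;
    }
    if (heap.size() < sampleSize) {
      heap.add(t);
    } else if (t.tag < heap.peek().tag) {
      heap.poll(); // evict the entry with the largest tag
      heap.add(t);
    }
  }
}
```

Because the tags are independent and identically distributed, the values holding the smallest tags form a uniformly random subset of the inputs, regardless of how the inputs were split across accumulators before merging.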
-
anyCombineFn
public static <T> Combine.CombineFn<T,?,java.lang.Iterable<T>> anyCombineFn(int sampleSize)
Returns a Combine.CombineFn that computes a fixed-sized potentially non-uniform sample of its inputs.
-
anyValueCombineFn
public static <T> Combine.CombineFn<T,?,T> anyValueCombineFn()
Returns a Combine.CombineFn that computes a single and potentially non-uniform sample value of its inputs.
-
any
public static <T> PTransform<PCollection<T>,PCollection<T>> any(long limit)
Sample#any(long) takes a PCollection<T> and a limit, and produces a new PCollection<T> containing up to limit elements of the input PCollection.

If limit is greater than or equal to the size of the input PCollection, then all the input's elements will be selected.

Example of use:

    PCollection<String> input = ...;
    PCollection<String> output = input.apply(Sample.<String>any(100));

- Type Parameters:
  T - the type of the elements of the input and output PCollections
- Parameters:
  limit - the number of elements to take from the input
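The "up to limit elements" contract can be pictured with an in-memory analogy. The sketch below is plain Java, not Beam code, and the names AnySketch and takeAny are hypothetical. It simply keeps whichever elements it encounters first, which is one way to read "no uniformity guarantees"; which elements a real pipeline selects may differ and is not specified here.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical in-memory analogy for the any(limit) contract; not part of Beam. */
final class AnySketch {
  /** Returns up to limit elements, in whatever order the input happens to yield them. */
  static <T> List<T> takeAny(Iterable<T> input, long limit) {
    if (limit < 0) {
      throw new IllegalArgumentException("limit must be >= 0");
    }
    List<T> out = new ArrayList<>();
    Iterator<T> it = input.iterator();
    // Stop as soon as enough elements were collected; no randomization is involved.
    while (out.size() < limit && it.hasNext()) {
      out.add(it.next());
    }
    return out;
  }
}
```

When limit is at least the input size, the loop ends because the input is exhausted, so every element is returned, matching the behavior described above.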
-
fixedSizeGlobally
public static <T> PTransform<PCollection<T>,PCollection<java.lang.Iterable<T>>> fixedSizeGlobally(int sampleSize)
Returns a PTransform that takes a PCollection<T>, selects sampleSize elements, uniformly at random, and returns a PCollection<Iterable<T>> containing the selected elements. If the input PCollection has fewer than sampleSize elements, then the output Iterable<T> will be all the input's elements.

All of the elements of the output PCollection should fit into main memory of a single worker machine. This operation does not run in parallel.

Example of use:

    PCollection<String> pc = ...;
    PCollection<Iterable<String>> sampleOfSize10 = pc.apply(Sample.fixedSizeGlobally(10));

- Type Parameters:
  T - the type of the elements
- Parameters:
  sampleSize - the number of elements to select; must be >= 0
-
fixedSizePerKey
public static <K,V> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.lang.Iterable<V>>>> fixedSizePerKey(int sampleSize)
Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, Iterable<V>>> that contains an output element mapping each distinct key in the input PCollection to a sample of sampleSize values associated with that key in the input PCollection, taken uniformly at random. If a key in the input PCollection has fewer than sampleSize values associated with it, then the output Iterable<V> associated with that key will be all the values associated with that key in the input PCollection.

Example of use:

    PCollection<KV<String, Integer>> pc = ...;
    PCollection<KV<String, Iterable<Integer>>> sampleOfSize10PerKey = pc.apply(Sample.<String, Integer>fixedSizePerKey(10));

- Type Parameters:
  K - the type of the keys
  V - the type of the values
- Parameters:
  sampleSize - the number of values to select for each distinct key; must be >= 0
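The per-key behavior described above, including the case where a key has fewer than sampleSize values, can be illustrated with a small in-memory sketch. This is plain Java using reservoir sampling (Algorithm R) per key; it is an assumption-laden illustration, not Beam's implementation, and PerKeySampleSketch and samplePerKey are hypothetical names. Map.Entry stands in for KV.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical in-memory illustration of per-key fixed-size sampling; not part of Beam. */
final class PerKeySampleSketch {
  /** Maps each distinct key to a uniform sample of at most sampleSize of its values. */
  static <K, V> Map<K, List<V>> samplePerKey(
      Iterable<Map.Entry<K, V>> input, int sampleSize, Random random) {
    if (sampleSize < 0) {
      throw new IllegalArgumentException("sampleSize must be >= 0");
    }
    Map<K, List<V>> reservoirs = new HashMap<>();
    Map<K, Long> seen = new HashMap<>();
    for (Map.Entry<K, V> e : input) {
      List<V> reservoir = reservoirs.computeIfAbsent(e.getKey(), k -> new ArrayList<>());
      // Number of values seen so far for this key, including the current one.
      long n = seen.merge(e.getKey(), 1L, Long::sum);
      if (reservoir.size() < sampleSize) {
        // Fewer than sampleSize values so far: keep everything.
        reservoir.add(e.getValue());
      } else if (sampleSize > 0) {
        // Algorithm R: the n-th value replaces a random slot with probability sampleSize / n.
        long j = (long) (random.nextDouble() * n);
        if (j < sampleSize) {
          reservoir.set((int) j, e.getValue());
        }
      }
    }
    return reservoirs;
  }
}
```

A key with only two values and sampleSize = 10 ends up mapped to both of its values, which mirrors the "fewer than sampleSize" rule stated for the transform.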
-