Class ApproximateQuantiles
- java.lang.Object
  - org.apache.beam.sdk.transforms.ApproximateQuantiles
public class ApproximateQuantiles extends java.lang.Object

PTransforms for getting an idea of a PCollection's data distribution using approximate N-tiles (e.g. quartiles, percentiles, etc.), either globally or per-key.
Nested Class Summary

Nested Classes (Modifier and Type / Class / Description):
- static class ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable>
  The ApproximateQuantilesCombineFn combiner gives an idea of the distribution of a collection of values using approximate N-tiles.
Method Summary

Static Methods (Modifier and Type / Method / Description):
- static <T extends java.lang.Comparable<T>> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles)
  Like globally(int, Comparator), but sorts using the elements' natural ordering.
- static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles, ComparatorT compareFn)
  Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection.
- static <K,V extends java.lang.Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles)
  Like perKey(int, Comparator), but sorts values using their natural ordering.
- static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles, ComparatorT compareFn)
  Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection.
Method Detail

globally

public static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles, ComparatorT compareFn)

Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection. This gives an idea of the distribution of the input elements.

The computed List is of size numQuantiles and contains the input elements' minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted order, using the given Comparator to order values. To compute traditional N-tiles, use ApproximateQuantiles.globally(N+1, compareFn): N-tiles are delimited by N+1 boundary values once the minimum and maximum are counted, so quartiles (N = 4) require numQuantiles = 5.

If there are fewer input elements than numQuantiles, then the result List will contain all the input elements, in sorted order.

The argument Comparator must be Serializable.

Example of use:

  PCollection<String> pc = ...;
  PCollection<List<String>> quantiles =
      pc.apply(ApproximateQuantiles.globally(11, stringCompareFn));
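The example above relies on a stringCompareFn that this page does not define. Because ComparatorT is bounded by Comparator<T> & Serializable, the argument's static type, not just its runtime object, must implement Serializable, and a named class implementing both interfaces is the most direct way to get that. The plain-Java sketch below (no Beam dependency) shows one hypothetical candidate, StringLengthComparator, and uses ObjectOutputStream to check that it serializes while an ordinary lambda does not. The length-then-lexicographic ordering is an arbitrary illustrative choice.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;

public class SerializableComparators {

  /**
   * Hypothetical comparator for the stringCompareFn role: orders strings by length,
   * breaking ties lexicographically. Implementing both Comparator<String> and
   * Serializable gives it a static type that satisfies Comparator<T> & Serializable.
   */
  static class StringLengthComparator implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(String a, String b) {
      int byLength = Integer.compare(a.length(), b.length());
      return byLength != 0 ? byLength : a.compareTo(b);
    }
  }

  /** Returns true if obj can be written with standard Java serialization. */
  static boolean isJavaSerializable(Object obj) {
    try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
      out.writeObject(obj);
      return true;
    } catch (IOException e) {
      // NotSerializableException is a subclass of IOException.
      return false;
    }
  }

  public static void main(String[] args) {
    Comparator<String> named = new StringLengthComparator();
    // An ordinary lambda is not Serializable unless its target type is.
    Comparator<String> plainLambda = (a, b) -> a.compareTo(b);

    System.out.println(isJavaSerializable(named));       // true
    System.out.println(isJavaSerializable(plainLambda)); // false
  }
}
```

A named class also keeps the ordering logic in one place that can be reused across pipelines, whereas a lambda would additionally need an intersection cast to become serializable.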
Type Parameters:
  T - the type of the elements in the input PCollection
Parameters:
  numQuantiles - the number of elements in the resulting quantile values List
  compareFn - the function to use to order the elements
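To make the output contract concrete, the following plain-Java sketch computes, exactly and in memory, the kind of List that globally approximates: numQuantiles values taken at evenly spaced ranks of the sorted input, starting at the minimum and ending at the maximum, or the whole sorted input when there are not enough elements. The helper name exactQuantiles, its floor-based rank rule, and the numQuantiles >= 2 precondition are assumptions for illustration; ApproximateQuantiles uses an approximation algorithm and is not guaranteed to return these exact elements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ExactQuantiles {

  /**
   * Hypothetical exact counterpart of the globally contract: returns numQuantiles values
   * (minimum, numQuantiles - 2 intermediate values, maximum) picked at evenly spaced ranks
   * of a fully sorted copy of the input. Not Beam's approximation algorithm.
   */
  static <T> List<T> exactQuantiles(List<T> values, int numQuantiles, Comparator<? super T> cmp) {
    if (numQuantiles < 2) {
      // Assumed precondition for this sketch: min and max need at least two slots.
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    List<T> sorted = new ArrayList<>(values);
    sorted.sort(cmp);
    int n = sorted.size();
    if (n <= numQuantiles) {
      // Not enough elements: return all of them, in sorted order.
      return sorted;
    }
    List<T> result = new ArrayList<>(numQuantiles);
    for (int i = 0; i < numQuantiles; i++) {
      // Rank 0 is the minimum, rank n - 1 the maximum; floor division spaces the rest evenly.
      int rank = (int) ((long) i * (n - 1) / (numQuantiles - 1));
      result.add(sorted.get(rank));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> data = new ArrayList<>();
    for (int x = 0; x <= 100; x++) {
      data.add(x);
    }
    // Quartiles (N = 4) correspond to numQuantiles = N + 1 = 5.
    System.out.println(exactQuantiles(data, 5, Comparator.<Integer>naturalOrder()));
    System.out.println(exactQuantiles(Arrays.asList(3, 1, 2), 5, Comparator.<Integer>naturalOrder()));
  }
}
```

Running main prints [0, 25, 50, 75, 100] for the quartiles of 0..100 and [1, 2, 3] for the three-element input, which has fewer elements than numQuantiles.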
globally

public static <T extends java.lang.Comparable<T>> PTransform<PCollection<T>,PCollection<java.util.List<T>>> globally(int numQuantiles)

Like globally(int, Comparator), but sorts using the elements' natural ordering.

Type Parameters:
  T - the type of the elements in the input PCollection
Parameters:
  numQuantiles - the number of elements in the resulting quantile values List
perKey

public static <K,V,ComparatorT extends java.util.Comparator<V> & java.io.Serializable> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles, ComparatorT compareFn)

Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection. This gives an idea of the distribution of the input values for each key.

Each of the computed Lists is of size numQuantiles and contains the input values' minimum value, numQuantiles-2 intermediate values, and maximum value, in sorted order, using the given Comparator to order values. To compute traditional N-tiles, use ApproximateQuantiles.perKey(N+1, compareFn).

If a key has fewer than numQuantiles values associated with it, then that key's output List will contain all the key's input values, in sorted order.

The argument Comparator must be Serializable.

Example of use:

  PCollection<KV<Integer, String>> pc = ...;
  PCollection<KV<Integer, List<String>>> quantilesPerKey =
      pc.apply(ApproximateQuantiles.perKey(11, stringCompareFn));

See Combine.PerKey for how this affects timestamps and windowing.
Type Parameters:
  K - the type of the keys in the input and output PCollections
  V - the type of the values in the input PCollection
Parameters:
  numQuantiles - the number of elements in the resulting quantile values List
  compareFn - the function to use to order the elements
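The per-key contract amounts to grouping values by key and computing N-tiles for each group independently. The plain-Java sketch below illustrates that contract with an exact, in-memory computation; the class ExactQuantilesPerKey and its methods quantiles and perKey are hypothetical, the floor-based rank rule is an assumption, and the sketch models neither Beam's approximation algorithm nor its windowing and timestamp behavior.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ExactQuantilesPerKey {

  /** Hypothetical exact quantiles of one key's values, picked at evenly spaced sorted ranks. */
  static <V> List<V> quantiles(List<V> values, int numQuantiles, Comparator<? super V> cmp) {
    List<V> sorted = new ArrayList<>(values);
    sorted.sort(cmp);
    int n = sorted.size();
    if (n <= numQuantiles) {
      return sorted; // fewer values than numQuantiles: all of them, sorted
    }
    List<V> result = new ArrayList<>(numQuantiles);
    for (int i = 0; i < numQuantiles; i++) {
      result.add(sorted.get((int) ((long) i * (n - 1) / (numQuantiles - 1))));
    }
    return result;
  }

  /** Groups (key, value) pairs by key, then computes quantiles for each key independently. */
  static <K extends Comparable<K>, V> Map<K, List<V>> perKey(
      List<Map.Entry<K, V>> pairs, int numQuantiles, Comparator<? super V> cmp) {
    if (numQuantiles < 2) {
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    Map<K, List<V>> grouped = new TreeMap<>(); // sorted keys only for readable output
    for (Map.Entry<K, V> kv : pairs) {
      grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    Map<K, List<V>> out = new TreeMap<>();
    grouped.forEach((k, vs) -> out.put(k, quantiles(vs, numQuantiles, cmp)));
    return out;
  }

  public static void main(String[] args) {
    List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
    for (int v = 0; v <= 8; v++) {
      pairs.add(new AbstractMap.SimpleEntry<>("a", v));
    }
    pairs.add(new AbstractMap.SimpleEntry<>("b", 7));
    pairs.add(new AbstractMap.SimpleEntry<>("b", 3));
    System.out.println(perKey(pairs, 3, Comparator.<Integer>naturalOrder()));
  }
}
```

Running main prints {a=[0, 4, 8], b=[3, 7]}: key b has only two values, fewer than numQuantiles = 3, so its list holds both of them in sorted order.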
perKey

public static <K,V extends java.lang.Comparable<V>> PTransform<PCollection<KV<K,V>>,PCollection<KV<K,java.util.List<V>>>> perKey(int numQuantiles)

Like perKey(int, Comparator), but sorts values using their natural ordering.

Type Parameters:
  K - the type of the keys in the input and output PCollections
  V - the type of the values in the input PCollection
Parameters:
  numQuantiles - the number of elements in the resulting quantile values List