Class ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable>
- java.lang.Object
-
- org.apache.beam.sdk.transforms.Combine.CombineFn<InputT,AccumT,OutputT>
-
- org.apache.beam.sdk.transforms.Combine.AccumulatingCombineFn<T,org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT>,java.util.List<T>>
-
- org.apache.beam.sdk.transforms.ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT>
-
- Type Parameters:
T
- the type of the values being combined
- All Implemented Interfaces:
java.io.Serializable
,CombineFnBase.GlobalCombineFn<T,org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT>,java.util.List<T>>
,HasDisplayData
- Enclosing class:
- ApproximateQuantiles
public static class ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> extends Combine.AccumulatingCombineFn<T,org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT>,java.util.List<T>>
TheApproximateQuantilesCombineFn
combiner gives an idea of the distribution of a collection of values using approximateN
-tiles. The output of this combiner is aList
of sizenumQuantiles
, containing the input values' minimum value,numQuantiles-2
intermediate values, and maximum value, in sorted order, so for traditionalN
-tiles, one should useApproximateQuantilesCombineFn#create(N+1)
.If there are fewer values to combine than
numQuantiles
, then the resultList
will contain all the values being combined, in sorted order.Values are ordered using either a specified
Comparator
or the values' natural ordering.To evaluate the quantiles we use the "New Algorithm" described here:
[MRL98] Manku, Rajagopalan & Lindsay, "Approximate Medians and other Quantiles in One Pass and with Limited Memory", Proc. 1998 ACM SIGMOD, Vol 27, No 2, p 426-435, June 1998. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.6.6513&rep=rep1&type=pdf
The default error bound is
1 / N
, though in practice the accuracy tends to be much better.See
create(int, Comparator, long, double)
for more information about the meaning ofepsilon
, andwithEpsilon(double)
for a convenient way to adjust it.- See Also:
- Serialized Form
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.beam.sdk.transforms.Combine.AccumulatingCombineFn
Combine.AccumulatingCombineFn.Accumulator<InputT,AccumT,OutputT>
-
-
Field Summary
Fields Modifier and Type Field Description static long
DEFAULT_MAX_NUM_ELEMENTS
The cost (in time and space) to compute quantiles to a given accuracy is a function of the total number of elements in the data set.
-
Method Summary
All Methods Static Methods Instance Methods Concrete Methods Modifier and Type Method Description static <T extends java.lang.Comparable<T>>
ApproximateQuantiles.ApproximateQuantilesCombineFn<T,Top.Natural<T>>create(int numQuantiles)
Likecreate(int, Comparator)
, but sorts values using their natural ordering.static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable>
ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT>create(int numQuantiles, ComparatorT compareFn)
Returns an approximate quantiles combiner with the givencompareFn
and desired number of quantiles.static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable>
ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT>create(int numQuantiles, ComparatorT compareFn, long maxNumElements, double epsilon)
Creates an approximate quantiles combiner with the givencompareFn
and desired number of quantiles.org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT>
createAccumulator()
Returns a new, mutable accumulator value, representing the accumulation of zero input values.java.lang.reflect.TypeVariable<?>
getAccumTVariable()
Returns theTypeVariable
ofAccumT
.Coder<org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT>>
getAccumulatorCoder(CoderRegistry registry, Coder<T> elementCoder)
Returns theCoder
to use for accumulatorAccumT
values, or null if it is not able to be inferred.Coder<OutputT>
getDefaultOutputCoder(CoderRegistry registry, Coder<InputT> inputCoder)
Returns theCoder
to use by default for outputOutputT
values, or null if it is not able to be inferred.java.lang.String
getIncompatibleGlobalWindowErrorMessage()
Returns the error message for not supported default values in Combine.globally().java.lang.reflect.TypeVariable<?>
getInputTVariable()
Returns theTypeVariable
ofInputT
.java.lang.reflect.TypeVariable<?>
getOutputTVariable()
Returns theTypeVariable
ofOutputT
.void
populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT>
withEpsilon(double epsilon)
Returns anApproximateQuantilesCombineFn
that's like this one except that it uses the specifiedepsilon
value.ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT>
withMaxInputSize(long maxNumElements)
Returns anApproximateQuantilesCombineFn
that's like this one except that it uses the specifiedmaxNumElements
value.-
Methods inherited from class org.apache.beam.sdk.transforms.Combine.AccumulatingCombineFn
addInput, extractOutput, mergeAccumulators
-
Methods inherited from class org.apache.beam.sdk.transforms.Combine.CombineFn
apply, compact, defaultValue, getInputType, getOutputType
-
-
-
-
Field Detail
-
DEFAULT_MAX_NUM_ELEMENTS
public static final long DEFAULT_MAX_NUM_ELEMENTS
The cost (in time and space) to compute quantiles to a given accuracy is a function of the total number of elements in the data set. If an estimate is not known or specified, we use this as an upper bound. If this is too low, errors may exceed the requested tolerance; if too high, efficiency may be non-optimal. The impact is logarithmic with respect to this value, so this default should be fine for most uses.- See Also:
- Constant Field Values
-
-
Method Detail
-
create
public static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT> create(int numQuantiles, ComparatorT compareFn)
Returns an approximate quantiles combiner with the givencompareFn
and desired number of quantiles. A total ofnumQuantiles
elements will appear in the output list, including the minimum and maximum.The
Comparator
must beSerializable
.The default error bound is
1 / numQuantiles
, which holds as long as the number of elements is less thanDEFAULT_MAX_NUM_ELEMENTS
.
-
create
public static <T extends java.lang.Comparable<T>> ApproximateQuantiles.ApproximateQuantilesCombineFn<T,Top.Natural<T>> create(int numQuantiles)
Likecreate(int, Comparator)
, but sorts values using their natural ordering.
-
withEpsilon
public ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT> withEpsilon(double epsilon)
Returns anApproximateQuantilesCombineFn
that's like this one except that it uses the specifiedepsilon
value. Does not modify this combiner.See
create(int, Comparator, long, double)
for more information about the meaning ofepsilon
.
-
withMaxInputSize
public ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT> withMaxInputSize(long maxNumElements)
Returns anApproximateQuantilesCombineFn
that's like this one except that it uses the specifiedmaxNumElements
value. Does not modify this combiner.See
create(int, Comparator, long, double)
for more information about the meaning ofmaxNumElements
.
-
create
public static <T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable> ApproximateQuantiles.ApproximateQuantilesCombineFn<T,ComparatorT> create(int numQuantiles, ComparatorT compareFn, long maxNumElements, double epsilon)
Creates an approximate quantiles combiner with the givencompareFn
and desired number of quantiles. A total ofnumQuantiles
elements will appear in the output list, including the minimum and maximum.The
Comparator
must beSerializable
.The default error bound is
epsilon
, which holds as long as the number of elements is less thanmaxNumElements
. Specifically, if one considers the input as a sorted list x_1, ..., x_N, then the distance between the each exact quantile x_c and its approximation x_c' is bounded by|c - c'| < epsilon * N
. Note that these errors are worst-case scenarios; in practice the accuracy tends to be much better.
-
createAccumulator
public org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT> createAccumulator()
Description copied from class:Combine.CombineFn
Returns a new, mutable accumulator value, representing the accumulation of zero input values.- Specified by:
createAccumulator
in classCombine.CombineFn<T,org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable>,java.util.List<T>>
-
getAccumulatorCoder
public Coder<org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT>> getAccumulatorCoder(CoderRegistry registry, Coder<T> elementCoder)
Description copied from interface:CombineFnBase.GlobalCombineFn
Returns theCoder
to use for accumulatorAccumT
values, or null if it is not able to be inferred.By default, uses the knowledge of the
Coder
being used forInputT
values and the enclosingPipeline
'sCoderRegistry
to try to infer the Coder forAccumT
values.This is the Coder used to send data through a communication-intensive shuffle step, so a compact and efficient representation may have significant performance benefits.
- Specified by:
getAccumulatorCoder
in interfaceCombineFnBase.GlobalCombineFn<T,org.apache.beam.sdk.transforms.ApproximateQuantiles.QuantileState<T,ComparatorT extends java.util.Comparator<T> & java.io.Serializable>,java.util.List<T>>
-
populateDisplayData
public void populateDisplayData(DisplayData.Builder builder)
Register display data for the given transform or component.populateDisplayData(DisplayData.Builder)
is invoked by Pipeline runners to collect display data viaDisplayData.from(HasDisplayData)
. Implementations may callsuper.populateDisplayData(builder)
in order to register display data in the current namespace, but should otherwise usesubcomponent.populateDisplayData(builder)
to use the namespace of the subcomponent.By default, does not register any display data. Implementors may override this method to provide their own display data.
- Specified by:
populateDisplayData
in interfaceHasDisplayData
- Parameters:
builder
- The builder to populate with display data.- See Also:
HasDisplayData
-
getDefaultOutputCoder
public Coder<OutputT> getDefaultOutputCoder(CoderRegistry registry, Coder<InputT> inputCoder) throws CannotProvideCoderException
Description copied from interface:CombineFnBase.GlobalCombineFn
Returns theCoder
to use by default for outputOutputT
values, or null if it is not able to be inferred.By default, uses the knowledge of the
Coder
being used for inputInputT
values and the enclosingPipeline
'sCoderRegistry
to try to infer the Coder forOutputT
values.- Specified by:
getDefaultOutputCoder
in interfaceCombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>
- Throws:
CannotProvideCoderException
-
getIncompatibleGlobalWindowErrorMessage
public java.lang.String getIncompatibleGlobalWindowErrorMessage()
Description copied from interface:CombineFnBase.GlobalCombineFn
Returns the error message for not supported default values in Combine.globally().- Specified by:
getIncompatibleGlobalWindowErrorMessage
in interfaceCombineFnBase.GlobalCombineFn<InputT,AccumT,OutputT>
-
getInputTVariable
public java.lang.reflect.TypeVariable<?> getInputTVariable()
Returns theTypeVariable
ofInputT
.
-
getAccumTVariable
public java.lang.reflect.TypeVariable<?> getAccumTVariable()
Returns theTypeVariable
ofAccumT
.
-
getOutputTVariable
public java.lang.reflect.TypeVariable<?> getOutputTVariable()
Returns theTypeVariable
ofOutputT
.
-
-