K - type of the keys mapping the elements
V - type of the values being combined per key

public abstract static class ApproximateDistinct.PerKeyDistinct<K,V>
extends org.apache.beam.sdk.transforms.PTransform<org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,V>>,org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,java.lang.Long>>>
See ApproximateDistinct.perKey().

Constructor and Description |
---|
PerKeyDistinct() |
Modifier and Type | Method and Description |
---|---|
org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,java.lang.Long>> | expand(org.apache.beam.sdk.values.PCollection<org.apache.beam.sdk.values.KV<K,V>> input) |
ApproximateDistinct.PerKeyDistinct<K,V> | withPrecision(int p): Sets the precision p. |
ApproximateDistinct.PerKeyDistinct<K,V> | withSparsePrecision(int sp): Sets the sparse representation's precision sp. |
addAnnotation, compose, compose, getAdditionalInputs, getAnnotations, getDefaultOutputCoder, getDefaultOutputCoder, getDefaultOutputCoder, getKindString, getName, getResourceHints, populateDisplayData, setDisplayData, setResourceHints, toString, validate, validate

public ApproximateDistinct.PerKeyDistinct<K,V> withPrecision(int p)
Sets the precision p.
Keep in mind that p cannot be lower than 4, because the estimation would be too inaccurate.
See ApproximateDistinct.precisionForRelativeError(double) and ApproximateDistinct.relativeErrorForPrecision(int) for more information about the relationship between precision and relative error.
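
To give a feel for how precision trades off against accuracy, here is a minimal standalone sketch. It assumes the textbook HyperLogLog approximation (standard error of about 1.04 / sqrt(m), with m = 2^p registers); the constants used by ApproximateDistinct.relativeErrorForPrecision(int) may differ. The class and method names (PrecisionErrorSketch, approxRelativeError, approxPrecisionFor) are hypothetical and not part of Beam.

```java
public class PrecisionErrorSketch {

    // Number of registers in a sketch of precision p: m = 2^p.
    public static long registers(int p) {
        return 1L << p;
    }

    // Textbook HyperLogLog standard error, about 1.04 / sqrt(m).
    // Assumption: this is an approximation, not Beam's own formula.
    public static double approxRelativeError(int p) {
        if (p < 4) {
            throw new IllegalArgumentException("p cannot be lower than 4");
        }
        return 1.04 / Math.sqrt(registers(p));
    }

    // Smallest precision (between 4 and 32) whose approximate error
    // is at most targetError; returns 32 if none is small enough.
    public static int approxPrecisionFor(double targetError) {
        if (targetError <= 0) {
            throw new IllegalArgumentException("targetError must be positive");
        }
        int p = 4;
        while (p < 32 && approxRelativeError(p) > targetError) {
            p++;
        }
        return p;
    }

    public static void main(String[] args) {
        // Print the approximate error for a few precisions.
        for (int p = 4; p <= 16; p += 4) {
            System.out.printf("p=%d registers=%d approxError=%.4f%n",
                    p, registers(p), approxRelativeError(p));
        }
    }
}
```

Under this approximation, each extra unit of precision doubles the number of registers and shrinks the error by a factor of about sqrt(2), so the precision is a direct memory-versus-accuracy knob.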
p - the precision value for the normal representation

public ApproximateDistinct.PerKeyDistinct<K,V> withSparsePrecision(int sp)
Sets the sparse representation's precision sp.
Values above 32 are not yet supported by the AddThis version of HyperLogLog+.
For more information about the sparse representation, read Google's paper on HyperLogLog+.
sp - the precision of HyperLogLog+'s sparse representation
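
The two limits documented on this page (p not lower than 4, sp not above 32) can be checked before a pipeline is built. The following is a hypothetical helper, not part of Beam or of the AddThis library; it encodes only the two constraints stated here, and the underlying implementation may enforce further rules (for example on how p and sp relate) that this sketch does not cover.

```java
public class PrecisionChecks {

    // Limits taken from the method descriptions on this page.
    static final int MIN_PRECISION = 4;
    static final int MAX_SPARSE_PRECISION = 32;

    // Throws if either documented limit is violated.
    public static void checkPrecisions(int p, int sp) {
        if (p < MIN_PRECISION) {
            throw new IllegalArgumentException(
                    "p must be at least " + MIN_PRECISION + ", got " + p);
        }
        if (sp > MAX_SPARSE_PRECISION) {
            throw new IllegalArgumentException(
                    "sp must be at most " + MAX_SPARSE_PRECISION + ", got " + sp);
        }
    }

    // Convenience wrapper returning a boolean instead of throwing.
    public static boolean isSupported(int p, int sp) {
        try {
            checkPrecisions(p, sp);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Calling checkPrecisions(p, sp) before withPrecision(p) and withSparsePrecision(sp) surfaces a bad configuration early, at pipeline construction time.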