Package org.apache.druid.hll
Class HyperLogLogCollector
- java.lang.Object
-
- org.apache.druid.hll.HyperLogLogCollector
-
- All Implemented Interfaces:
Comparable<HyperLogLogCollector>
- Direct Known Subclasses:
VersionOneHyperLogLogCollector, VersionZeroHyperLogLogCollector
public abstract class HyperLogLogCollector extends Object implements Comparable<HyperLogLogCollector>
Implements the HyperLogLog cardinality estimator described in: http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf
Run this code to see a simple indication of the expected error for different values of m:
for (int i = 1; i < 20; ++i) { System.out.printf("i[%,d], val[%,d] => error[%f%%]%n", i, 2 << i, 104 / Math.sqrt(2 << i)); }
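The error loop in the class description can be wrapped in a small standalone program. This is a minimal sketch: the names `HllErrorTable` and `standardErrorPercent` are illustrative and not part of Druid. It relies on the paper's relative standard error of roughly 1.04 / sqrt(m), expressed as a percentage. Note that `2 << i` equals 2^(i+1), so the first row corresponds to m = 4.

```java
// Illustrative helper, not part of org.apache.druid.hll.
class HllErrorTable {
    // Approximate relative standard error of HyperLogLog with m registers, in percent.
    static double standardErrorPercent(int m) {
        return 104 / Math.sqrt(m);
    }

    public static void main(String[] args) {
        for (int i = 1; i < 20; ++i) {
            int m = 2 << i; // 2^(i+1) registers
            System.out.printf("i[%,d], val[%,d] => error[%f%%]%n", i, m, standardErrorPercent(m));
        }
    }
}
```

Doubling the number of registers four times over halves the error twice, which is the usual space versus accuracy trade-off for this estimator.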
This class is *not* thread-safe. It can be passed among threads, but it is written with the assumption that only one thread calls its methods at any given time. If multiple threads call methods on it concurrently, correct behavior is not guaranteed. Note that despite this, the class is currently used by multiple threads during realtime indexing: HyperUniquesAggregator's "aggregate" and "get" methods can be called simultaneously by OnheapIncrementalIndex, since its "doAggregate" and "getMetricObjectValue" methods are not synchronized. So, watch out for that.
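One common way to cope with an object that is not safe for concurrent use is to funnel every call through a single lock. The sketch below shows that pattern with a stand-in class. `UnsafeCounter` and `LockedCounter` are hypothetical and only mimic a non-thread-safe object; they are not Druid classes, and this is not a description of how Druid handles the issue.

```java
// Stand-in for any object that is not safe for concurrent use (hypothetical).
class UnsafeCounter {
    private long count;
    void add() { count++; }      // read-modify-write, not atomic
    long get() { return count; }
}

// Wrapper that serializes all access through one intrinsic lock.
class LockedCounter {
    private final UnsafeCounter delegate = new UnsafeCounter();

    synchronized void add() { delegate.add(); }
    synchronized long get() { return delegate.get(); }

    // Runs `threads` workers that each call add() `perThread` times and returns the total.
    static long runConcurrently(int threads, int perThread) {
        LockedCounter counter = new LockedCounter();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    counter.add();
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }
}
```

Because both the writer and the reader take the same lock, a reader never observes a half-applied update, at the cost of serializing all calls.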
-
-
Field Summary
Fields
Modifier and Type
Field
Description
static int
BITS_FOR_BUCKETS
static double
CORRECTION_PARAMETER
static int
DENSE_THRESHOLD
static double
HIGH_CORRECTION_THRESHOLD
static double
LOW_CORRECTION_THRESHOLD
static int
NUM_BUCKETS
static int
NUM_BYTES_FOR_BUCKETS
-
Constructor Summary
Constructors
Constructor
Description
HyperLogLogCollector(ByteBuffer byteBuffer)
-
Method Summary
Modifier and Type
Method
Description
void
add(byte[] hashedValue)
void
add(short bucket, byte positionOf1)
static double
applyCorrection(double e, int zeroCount)
int
compareTo(HyperLogLogCollector other)
boolean
equals(Object o)
static double
estimateByteBuffer(ByteBuffer buf)
double
estimateCardinality()
long
estimateCardinalityRound()
HyperLogLogCollector
fold(ByteBuffer buffer)
HyperLogLogCollector
fold(HyperLogLogCollector other)
protected int
getInitPosition()
static int
getLatestNumBytesForDenseStorage()
abstract short
getMaxOverflowRegister()
abstract byte
getMaxOverflowValue()
abstract int
getNumBytesForDenseStorage()
abstract int
getNumHeaderBytes()
abstract short
getNumNonZeroRegisters()
abstract int
getPayloadBytePosition()
abstract int
getPayloadBytePosition(ByteBuffer buffer)
abstract byte
getRegisterOffset()
protected ByteBuffer
getStorageBuffer()
abstract byte
getVersion()
int
hashCode()
static HyperLogLogCollector
makeCollector(ByteBuffer buffer)
Create a wrapper object around an HLL sketch contained within a buffer.
static HyperLogLogCollector
makeCollectorSharingStorage(HyperLogLogCollector otherCollector)
Creates a new collector that shares the other collector's buffer (by using ByteBuffer.duplicate()).
static byte[]
makeEmptyVersionedByteArray()
static HyperLogLogCollector
makeLatestCollector()
abstract void
setMaxOverflowRegister(short register)
abstract void
setMaxOverflowRegister(ByteBuffer buffer, short register)
abstract void
setMaxOverflowValue(byte value)
abstract void
setMaxOverflowValue(ByteBuffer buffer, byte value)
abstract void
setNumNonZeroRegisters(short numNonZeroRegisters)
abstract void
setNumNonZeroRegisters(ByteBuffer buffer, short numNonZeroRegisters)
abstract void
setRegisterOffset(byte registerOffset)
abstract void
setRegisterOffset(ByteBuffer buffer, byte registerOffset)
abstract void
setVersion(ByteBuffer buffer)
byte[]
toByteArray()
ByteBuffer
toByteBuffer()
String
toString()
-
-
-
Field Detail
-
DENSE_THRESHOLD
public static final int DENSE_THRESHOLD
- See Also:
- Constant Field Values
-
BITS_FOR_BUCKETS
public static final int BITS_FOR_BUCKETS
- See Also:
- Constant Field Values
-
NUM_BUCKETS
public static final int NUM_BUCKETS
- See Also:
- Constant Field Values
-
NUM_BYTES_FOR_BUCKETS
public static final int NUM_BYTES_FOR_BUCKETS
- See Also:
- Constant Field Values
-
LOW_CORRECTION_THRESHOLD
public static final double LOW_CORRECTION_THRESHOLD
- See Also:
- Constant Field Values
-
HIGH_CORRECTION_THRESHOLD
public static final double HIGH_CORRECTION_THRESHOLD
-
CORRECTION_PARAMETER
public static final double CORRECTION_PARAMETER
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
HyperLogLogCollector
public HyperLogLogCollector(ByteBuffer byteBuffer)
-
-
Method Detail
-
makeLatestCollector
public static HyperLogLogCollector makeLatestCollector()
-
makeCollector
public static HyperLogLogCollector makeCollector(ByteBuffer buffer)
Create a wrapper object around an HLL sketch contained within a buffer. The position and limit of the buffer may be changed; if you do not want this to happen, you can duplicate the buffer before passing it in. The mark and byte order of the buffer will not be modified.
- Parameters:
buffer - buffer containing an HLL sketch starting at its position and ending at its limit
- Returns:
- HLLC wrapper object
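The advice to duplicate the buffer before passing it in can be illustrated with plain java.nio. In this sketch, `consume` is a hypothetical stand-in for any method that, like makeCollector, may move the position of the buffer it receives; it does not call Druid.

```java
import java.nio.ByteBuffer;

class DuplicateBeforePassing {
    // Hypothetical consumer that moves the position of whatever buffer it is given.
    static void consume(ByteBuffer buf) {
        buf.position(buf.limit());
    }

    // Hands a duplicate to the consumer and returns the caller's own position afterwards.
    static int positionAfterConsumingDuplicate(ByteBuffer original) {
        consume(original.duplicate()); // the duplicate has its own position and limit
        return original.position();
    }
}
```

The duplicate still refers to the same bytes, so this protects only the caller's position and limit, not the content.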
-
makeCollectorSharingStorage
public static HyperLogLogCollector makeCollectorSharingStorage(HyperLogLogCollector otherCollector)
Creates a new collector that shares the other collector's buffer (by using ByteBuffer.duplicate()).
- Parameters:
otherCollector - collector whose buffer will be shared
- Returns:
- collector
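Sharing storage through ByteBuffer.duplicate() means that both views read and write the same bytes. The following sketch shows that behavior with java.nio only; `SharedStorageDemo` is an illustrative name and no Druid class is involved.

```java
import java.nio.ByteBuffer;

class SharedStorageDemo {
    // Writes through a duplicate and reads the same index back from the original.
    static byte writeViaDuplicate(ByteBuffer original, int index, byte value) {
        ByteBuffer view = original.duplicate(); // same bytes, separate position/limit/mark
        view.put(index, value);                 // absolute put, does not move any position
        return original.get(index);
    }
}
```

By the same reasoning, a change made through one collector that shares storage would presumably be visible through the other, so the two should not be treated as independent copies.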
-
getLatestNumBytesForDenseStorage
public static int getLatestNumBytesForDenseStorage()
-
makeEmptyVersionedByteArray
public static byte[] makeEmptyVersionedByteArray()
-
applyCorrection
public static double applyCorrection(double e, int zeroCount)
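This page does not document what correction is applied, so the following is only a sketch of the small-range correction from the cited HyperLogLog paper, which takes the same two inputs (a raw estimate and a count of zero registers). The class name, the method name, and the explicit `m` parameter are assumptions for illustration; the thresholds Druid actually uses (see LOW_CORRECTION_THRESHOLD, HIGH_CORRECTION_THRESHOLD and CORRECTION_PARAMETER) may differ.

```java
// Paper-style correction, not Druid's implementation.
class PaperCorrection {
    // For small estimates with at least one empty register, fall back to linear counting.
    static double smallRangeCorrection(double e, int zeroCount, int m) {
        if (e <= 2.5 * m && zeroCount > 0) {
            return m * Math.log((double) m / zeroCount);
        }
        return e; // raw estimate is kept otherwise
    }
}
```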
-
estimateByteBuffer
public static double estimateByteBuffer(ByteBuffer buf)
-
getVersion
public abstract byte getVersion()
-
setVersion
public abstract void setVersion(ByteBuffer buffer)
-
getRegisterOffset
public abstract byte getRegisterOffset()
-
setRegisterOffset
public abstract void setRegisterOffset(byte registerOffset)
-
setRegisterOffset
public abstract void setRegisterOffset(ByteBuffer buffer, byte registerOffset)
-
getNumNonZeroRegisters
public abstract short getNumNonZeroRegisters()
-
setNumNonZeroRegisters
public abstract void setNumNonZeroRegisters(short numNonZeroRegisters)
-
setNumNonZeroRegisters
public abstract void setNumNonZeroRegisters(ByteBuffer buffer, short numNonZeroRegisters)
-
getMaxOverflowValue
public abstract byte getMaxOverflowValue()
-
setMaxOverflowValue
public abstract void setMaxOverflowValue(byte value)
-
setMaxOverflowValue
public abstract void setMaxOverflowValue(ByteBuffer buffer, byte value)
-
getMaxOverflowRegister
public abstract short getMaxOverflowRegister()
-
setMaxOverflowRegister
public abstract void setMaxOverflowRegister(short register)
-
setMaxOverflowRegister
public abstract void setMaxOverflowRegister(ByteBuffer buffer, short register)
-
getNumHeaderBytes
public abstract int getNumHeaderBytes()
-
getNumBytesForDenseStorage
public abstract int getNumBytesForDenseStorage()
-
getPayloadBytePosition
public abstract int getPayloadBytePosition()
-
getPayloadBytePosition
public abstract int getPayloadBytePosition(ByteBuffer buffer)
-
getInitPosition
protected int getInitPosition()
-
getStorageBuffer
protected ByteBuffer getStorageBuffer()
-
add
public void add(byte[] hashedValue)
-
add
public void add(short bucket, byte positionOf1)
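The parameter names `bucket` and `positionOf1` match the register update in the cited paper: each bucket remembers the largest position of the first 1 bit seen so far. The sketch below shows that generic update on a plain byte array with a 64-bit hash. The hash layout and all names here are assumptions for illustration; Druid's own byte layout and header handling are not shown on this page and are not reproduced.

```java
// Generic HyperLogLog register update, not Druid's storage format.
class RegisterUpdateSketch {
    // Keeps the largest positionOf1 seen for the given bucket.
    static void add(byte[] registers, int bucket, byte positionOf1) {
        if (positionOf1 > registers[bucket]) {
            registers[bucket] = positionOf1;
        }
    }

    // The top `bits` bits of the hash select the bucket (1 <= bits <= 63).
    static int bucketOf(long hash, int bits) {
        return (int) (hash >>> (64 - bits));
    }

    // 1-based position of the first 1 bit among the remaining 64 - bits bits.
    static byte positionOf1(long hash, int bits) {
        long rest = hash << bits;                       // drop the bucket bits
        int rank = Long.numberOfLeadingZeros(rest) + 1;
        return (byte) Math.min(rank, 64 - bits + 1);    // cap when the remainder is all zeros
    }
}
```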
-
fold
public HyperLogLogCollector fold(@Nullable HyperLogLogCollector other)
-
fold
public HyperLogLogCollector fold(ByteBuffer buffer)
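In the HyperLogLog scheme from the cited paper, two sketches built with the same number of registers are merged by taking the maximum of each register pair. The sketch below shows that merge on plain byte arrays. `FoldSketch` is an illustrative name, and treating a null argument as a no-op is an assumption suggested by the @Nullable annotation, not something this page states.

```java
// Generic register-wise merge, not Druid's fold implementation.
class FoldSketch {
    // Merges `other` into `target` in place and returns `target`.
    static byte[] fold(byte[] target, byte[] other) {
        if (other == null) {
            return target; // assumption: nothing to merge
        }
        if (target.length != other.length) {
            throw new IllegalArgumentException("register counts differ");
        }
        for (int i = 0; i < target.length; i++) {
            if (other[i] > target[i]) {
                target[i] = other[i];
            }
        }
        return target;
    }
}
```

Because max is commutative, associative and idempotent, partial sketches can be merged in any order and merging the same sketch twice changes nothing.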
-
toByteBuffer
public ByteBuffer toByteBuffer()
-
toByteArray
public byte[] toByteArray()
-
estimateCardinalityRound
public long estimateCardinalityRound()
-
estimateCardinality
public double estimateCardinality()
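For orientation, the raw estimate in the cited paper is alpha_m * m^2 divided by the sum of 2^(-M[j]) over all m registers. The sketch below computes it on a plain byte array, without any range correction. All names are illustrative, the alpha constants are the ones given in the paper, and rounding with Math.round is an assumption about what a "round" variant would do; none of this is taken from Druid's source.

```java
// Raw estimate from the HyperLogLog paper, not Druid's implementation.
class RawEstimateSketch {
    // Bias-correction constant alpha_m from the paper.
    static double alpha(int m) {
        if (m == 16) return 0.673;
        if (m == 32) return 0.697;
        if (m == 64) return 0.709;
        return 0.7213 / (1 + 1.079 / m); // paper's formula for m >= 128
    }

    // alpha_m * m^2 / sum(2^-register), with no small- or large-range correction.
    static double estimate(byte[] registers) {
        int m = registers.length;
        double sum = 0;
        for (byte r : registers) {
            sum += Math.pow(2, -r);
        }
        return alpha(m) * m * m / sum;
    }

    // Rounds the estimate to the nearest long (assumed behavior).
    static long estimateRound(byte[] registers) {
        return Math.round(estimate(registers));
    }
}
```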
-
compareTo
public int compareTo(HyperLogLogCollector other)
- Specified by:
compareTo in interface Comparable<HyperLogLogCollector>
-
-