object HyperLogLog
Implementation of the HyperLogLog approximate counting as a Monoid
- See also
http://algo.inria.fr/flajolet/Publications/FlFuGaMe07.pdf HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm Philippe Flajolet and Éric Fusy and Olivier Gandouet and Frédéric Meunier
- Alphabetic
- By Inheritance
- HyperLogLog
- AnyRef
- Any
- Hide All
- Show All
- Public
- Protected
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- def alpha(bits: Int): Double
- def approximateSize(bits: Int, size: Int, zeroCnt: Int, z: Double): Approximate[Long]
- def asApprox(bits: Int, v: Double): Approximate[Long]
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def bitsForError(err: Double): Int
This gives you a number of bits to use to have a given standard error
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def error(bits: Int): Double
The true error is distributed like a Gaussian with this standard deviation.
The true error is distributed like a Gaussian with this standard deviation. let m = 2^bits. The size of the HLL is m bytes.
bits | size | error 9 512 0.0460 10 1024 0.0325 11 2048 0.0230 12 4096 0.0163 13 8192 0.0115 14 16384 0.0081 15 32768 0.0057 16 65536 0.0041 17 131072 0.0029 18 262144 0.0020 19 524288 0.0014 20 1048576 0.0010
Keep in mind, to store N distinct longs, you only need 8N bytes. See SetSizeAggregator for an approach that uses an exact set when the cardinality is small, and switches to HLL after we have enough items. Ideally, you would keep an exact set until it is smaller to store the HLL (but actually since we use sparse vectors to store the HLL, a small HLL takes a lot less than the size above).
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def fromByteBuffer(bb: ByteBuffer): HLL
- def fromBytes(bytes: Array[Byte]): HLL
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hash(input: Array[Byte]): Array[Byte]
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- val hashSize: Int
- def initialEstimate(bits: Int, size: Int, zeroCnt: Int, z: Double): Double
- implicit def int2Bytes(i: Int): Array[Byte]
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def j(bytes: Array[Byte], bits: Int): Int
- def jRhoW(in: Array[Byte], bits: Int): (Int, Byte)
We are computing j and \rho(w) from the paper, sorry for the name, but it allows someone to compare to the paper extremely low probability rhow (position of the leftmost one bit) is > 127, so we use a Byte to store it Given a hash <w_0, w_1, w_2 ...
We are computing j and \rho(w) from the paper, sorry for the name, but it allows someone to compare to the paper extremely low probability rhow (position of the leftmost one bit) is > 127, so we use a Byte to store it Given a hash <w_0, w_1, w_2 ... w_n> the value 'j' is equal to <w_0, w_1 ... w_(bits-1)> and the value 'w' is equal to <w_bits ... w_n>. The function rho counts the number of leading zeroes in 'w'. We can calculate rho(w) at once with the method rhoW.
- implicit def long2Bytes(i: Long): Array[Byte]
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- def rhoW(bytes: Array[Byte], bits: Int): Byte
The value 'w' is represented as a bitset (encoding in
bytes
).The value 'w' is represented as a bitset (encoding in
bytes
). This function counts the number of leading zeros in 'w'.Each byte is treated as a set of bits (little-endian). That is, the one bit represents the first value, then the two bit, then four, and so on.
We treat the leading
bits
bits as if they were instead a single zero bit. - final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toBytes(h: HLL): Array[Byte]
- def toString(): String
- Definition Classes
- AnyRef → Any
- def twopow(i: Int): Double
- Annotations
- @inline()
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
Deprecated Value Members
- def j(bsl: BitSetLite, bits: Int): Int
- Annotations
- @deprecated
- Deprecated
(Since version 0.12.3) This is no longer used. Use j(Array[Byte], Int) instead.
- def rhoW(bsl: BitSetLite, bits: Int): Byte
- Annotations
- @deprecated
- Deprecated
(Since version 0.12.3) This is no longer used. Use rhoW(Array[Byte], Int) instead.