java.lang.Object
  it.unimi.dsi.util.IntHyperLogLogCounterArray
public class IntHyperLogLogCounterArray
An array of approximate sets each represented using a HyperLogLog counter.
HyperLogLog counters represent the number of elements of a set in an approximate way. They were introduced by Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Frédéric Meunier in “HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm”, Proceedings of the 13th Conference on Analysis of Algorithms (AofA 07), pages 127−146, 2007. They are an improvement over the basic idea of loglog counting, introduced by Marianne Durand and Philippe Flajolet in “Loglog counting of large cardinalities”, ESA 2003, 11th Annual European Symposium, volume 2832 of Lecture Notes in Computer Science, pages 605−617, Springer, 2003.
Each counter is composed of m registers, and each register is made of registerSize bits. The first number depends on the desired relative standard deviation, and its logarithm can be computed using log2NumberOfRegisters(double), whereas the second number depends on an upper bound on the number of distinct elements to be counted, and it can be computed using registerSize(long).
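To make the relationship between these quantities concrete, the following sketch computes the size in bits of one counter and the total storage for an array of counters. The class CounterFootprint and the sample values are illustrative assumptions, not part of the library.

```java
// Illustrative arithmetic only; CounterFootprint is a hypothetical helper, not a library class.
final class CounterFootprint {
    /** Bits used by one counter: m registers of registerSize bits each, with m = 2^log2m. */
    static long counterBits(int log2m, int registerSize) {
        return (long) registerSize << log2m;
    }

    /** Total bytes needed for arraySize counters, rounded up to a whole byte. */
    static long totalBytes(long arraySize, int log2m, int registerSize) {
        return (arraySize * counterBits(log2m, registerSize) + 7) / 8;
    }

    public static void main(String[] args) {
        // Assumed sample values: 2^10 registers of 5 bits, one hundred million counters.
        System.out.println(counterBits(10, 5));              // 5120
        System.out.println(totalBytes(100_000_000L, 10, 5)); // 64000000000
    }
}
```

With these assumed values, one hundred million counters take about 64 GB, which illustrates why the per-counter layout matters at this scale.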
Actually, this class implements an array of counters. Each counter is completely independent, but they all use the same hash function. The reason for this design is that in our intended applications hundreds of millions of counters are common, and the JVM overhead of creating that many objects would be unbearable. This class allocates an array of LongArrayBitVectors, each containing CHUNK_SIZE registers, and can thus handle billions of billions of registers efficiently (in turn, this means being able to handle an array of millions of billions of high-precision counters).
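The chunked layout can be pictured with a small addressing sketch: a counter index is split into a chunk number and a bit offset inside that chunk. The class ChunkAddressing is hypothetical, and the chunk shift of 30 used in the example is an assumption; the sketch only shows one way the fields described on this page (CHUNK_SHIFT, CHUNK_MASK, counterShift) could fit together.

```java
// A sketch of chunked addressing; ChunkAddressing and the sample chunk shift are assumptions.
final class ChunkAddressing {
    final int log2m;        // logarithm of the number of registers per counter
    final int registerSize; // bits per register
    final long chunkMask;   // selects a register position inside a chunk
    final int counterShift; // selects the chunk of a counter

    ChunkAddressing(int log2m, int registerSize, int chunkShift) {
        if (log2m > chunkShift) throw new IllegalArgumentException("a counter must fit in a chunk");
        this.log2m = log2m;
        this.registerSize = registerSize;
        this.chunkMask = (1L << chunkShift) - 1;
        this.counterShift = chunkShift - log2m;
    }

    /** Index of the chunk that holds the given counter. */
    int chunk(int counter) {
        return counter >>> counterShift;
    }

    /** Bit offset of the first register of the given counter inside its chunk. */
    long offset(int counter) {
        return (((long) counter << log2m) & chunkMask) * registerSize;
    }

    public static void main(String[] args) {
        // Assumed parameters: 2^10 registers of 5 bits per counter, 2^30 registers per chunk.
        ChunkAddressing a = new ChunkAddressing(10, 5, 30);
        System.out.println(a.chunk(1_048_577));  // 1
        System.out.println(a.offset(1_048_577)); // 5120
    }
}
```

Because every counter has a power-of-two number of registers, a counter never straddles two chunks, so a shift and a mask are enough to locate it.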
When creating an instance, you can choose the size of the array (i.e., the number of counters) and the desired relative standard deviation (either explicitly or by choosing the number of registers per counter). Then, you can add an element to a counter. At any time, you can count (approximately) the number of distinct elements that have been added to a counter.
If you need to reuse this class multiple times, you can clear all registers, possibly setting a new seed. The seed is used to compute the hash function used by the HyperLogLog counters.
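The workflow described above (create, add, count, clear with a new seed) can be illustrated with a self-contained sketch of an array of HyperLogLog counters. TinyHllArray is a hypothetical class written for this example: it stores one byte per register instead of packed bit vectors, uses the SplitMix64 mixing function as a stand-in hash, and applies the estimator and small-range correction from the HyperLogLog paper. It is not the implementation of this class.

```java
// A minimal sketch of an array of HyperLogLog counters sharing one seeded hash.
// TinyHllArray is hypothetical and is not how IntHyperLogLogCounterArray is implemented.
final class TinyHllArray {
    private final int log2m;        // logarithm of the number of registers per counter
    private final int m;            // number of registers per counter
    private final byte[] registers; // arraySize * m registers, one byte each
    private long seed;              // seed of the shared hash function

    TinyHllArray(int arraySize, int log2m, long seed) {
        if (log2m < 4 || log2m > 16) throw new IllegalArgumentException("log2m must be in [4..16]");
        this.log2m = log2m;
        this.m = 1 << log2m;
        this.registers = new byte[arraySize * m];
        this.seed = seed;
    }

    /** The SplitMix64 mixing function, used here as a stand-in 64-bit hash. */
    private static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    /** Adds element v to counter k. */
    void add(int k, int v) {
        long h = mix(v * 0x9E3779B97F4A7C15L + seed);
        int j = (int) (h >>> (64 - log2m)); // the top log2m bits select a register
        long rest = h << log2m;             // the remaining bits determine the rank
        int rank = Math.min(Long.numberOfLeadingZeros(rest) + 1, 64 - log2m + 1);
        int i = k * m + j;
        if (rank > registers[i]) registers[i] = (byte) rank; // keep the maximum rank seen
    }

    /** Estimates the number of distinct elements added to counter k. */
    double count(int k) {
        double sum = 0;
        int zeros = 0;
        for (int j = 0; j < m; j++) {
            int r = registers[k * m + j];
            sum += Math.scalb(1.0, -r); // 2^-r
            if (r == 0) zeros++;
        }
        double alpha = m == 16 ? 0.673 : m == 32 ? 0.697 : m == 64 ? 0.709 : 0.7213 / (1 + 1.079 / m);
        double estimate = alpha * m * m / sum;
        // Small-range correction: fall back to linear counting while empty registers remain.
        if (estimate <= 2.5 * m && zeros > 0) estimate = m * Math.log((double) m / zeros);
        return estimate;
    }

    /** Clears all registers and sets a new seed. */
    void clear(long newSeed) {
        java.util.Arrays.fill(registers, (byte) 0);
        seed = newSeed;
    }

    public static void main(String[] args) {
        TinyHllArray a = new TinyHllArray(2, 12, 42L);
        for (int v = 0; v < 100_000; v++) a.add(0, v);
        for (int v = 0; v < 100_000; v++) a.add(0, v); // duplicates leave the registers unchanged
        System.out.println(a.count(0)); // an estimate close to 100000
        System.out.println(a.count(1)); // 0.0, since nothing was added to counter 1
    }
}
```

Note that the two counters share the hash function and the seed but never touch each other's registers, which mirrors the independence property stated above.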
Field Summary | |
---|---|
protected LongArrayBitVector[] | bitVector: An array of bit vectors containing all registers. |
static long | CHUNK_MASK: The mask used to obtain a register offset in a chunk. |
static int | CHUNK_SHIFT: The logarithm of the maximum size in registers of a bit vector. |
static long | CHUNK_SIZE: The maximum size in registers of a bit vector. |
protected int | counterShift: The shift that selects the chunk corresponding to a counter. |
protected int | counterSize: The size in bits of each counter (registerSize * m). |
protected int | log2m: The logarithm of the number of registers per counter. |
protected int | m: The number of registers per counter. |
protected int | mMinus1: The number of registers minus one. |
protected LongBigList[] | registers: registerSize-bit views of bitVector. |
protected int | registerSize: The size in bits of each register. |
protected long | seed: A seed for hashing. |
Constructor Summary | |
---|---|
IntHyperLogLogCounterArray(int arraySize, long n, double rsd) | Creates a new array of counters. |
IntHyperLogLogCounterArray(int arraySize, long n, int log2m) | Creates a new array of counters. |
IntHyperLogLogCounterArray(int arraySize, long n, int log2m, long seed) | Creates a new array of counters. |
Method Summary | |
---|---|
void | add(int k, int v): Adds an element to a counter. |
protected int | chunk(int counter): Returns the chunk of a given counter. |
void | clear(): Clears all registers. |
void | clear(long seed): Clears all registers and sets a new seed (e.g., using Util.randomSeed()). |
double | count(int k): Estimates the number of distinct elements that have been added to a given counter so far. |
protected double | count(long[] bits, long offset): Estimates the number of distinct elements that have been added to a given counter so far. |
static int | log2NumberOfRegisters(double rsd): Returns the logarithm of the number of registers per counter that are necessary to attain a given relative standard deviation. |
protected long | offset(int counter): Returns the bit offset of a given counter in its chunk. |
LongBigList[] | registers(): Returns the array of big lists of registers underlying this array of counters. |
static int | registerSize(long n): Returns the register size in bits, given an upper bound on the number of distinct elements. |
static double | relativeStandardDeviation(int log2m): Returns the relative standard deviation corresponding to a given logarithm of the number of registers per counter. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final int CHUNK_SHIFT
public static final long CHUNK_SIZE
public static final long CHUNK_MASK
protected final LongArrayBitVector[] bitVector
protected final LongBigList[] registers
registerSize-bit views of bitVector.
protected final int log2m
protected final int m
protected final int mMinus1
protected final int registerSize
protected final int counterSize
The size in bits of each counter (registerSize * m).
protected final int counterShift
protected long seed
Constructor Detail |
---|
public IntHyperLogLogCounterArray(int arraySize, long n, double rsd)
arraySize - the number of counters.
n - the expected number of elements.
rsd - the relative standard deviation.
public IntHyperLogLogCounterArray(int arraySize, long n, int log2m)
arraySize - the number of counters.
n - the expected number of elements.
log2m - the logarithm of the number of registers per counter.
public IntHyperLogLogCounterArray(int arraySize, long n, int log2m, long seed)
arraySize - the number of counters.
n - the expected number of elements.
log2m - the logarithm of the number of registers per counter.
seed - the seed used to compute the hash function.
Method Detail |
---|
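public static int log2NumberOfRegisters(double rsd)
rsd - the relative standard deviation to be attained.
Returns the logarithm of the number of registers per counter that are necessary to attain relative standard deviation rsd.
public static double relativeStandardDeviation(int log2m)
log2m - the logarithm of the number of registers.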
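As a rough illustration of how the two methods above relate, the HyperLogLog paper gives an asymptotic relative standard deviation of about 1.04/√m for m registers. The sketch below inverts that formula. RsdSketch is a hypothetical helper, and the library may use more refined constants, especially for small m.

```java
// Sketch based on the asymptotic HyperLogLog error 1.04 / sqrt(m).
// RsdSketch is hypothetical; the library's own constants may differ.
final class RsdSketch {
    /** Approximate relative standard deviation for m = 2^log2m registers. */
    static double rsd(int log2m) {
        return 1.04 / Math.sqrt(1L << log2m);
    }

    /** Smallest log2m whose approximate relative standard deviation is at most rsd. */
    static int log2Registers(double rsd) {
        double m = (1.04 / rsd) * (1.04 / rsd);
        return (int) Math.ceil(Math.log(m) / Math.log(2));
    }

    public static void main(String[] args) {
        System.out.println(log2Registers(0.02)); // 12
        System.out.println(rsd(12));             // 0.01625, which is below the requested 0.02
    }
}
```

Since the number of registers is a power of two, the attained deviation is usually somewhat better than the one requested.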
public static int registerSize(long n)
n - an upper bound on the number of distinct elements.
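The intuition behind the register size is that a register stores the position of the first one bit in a hash value, which for n distinct elements rarely exceeds about log2 n, so roughly log2 log2 n bits suffice. The sketch below encodes that reasoning. RegisterSizeSketch and the floor of five bits are assumptions made for the example, and the exact rule used by registerSize(long) may differ.

```java
// Sketch of a log-log sizing rule; RegisterSizeSketch and the 5-bit floor are assumptions.
final class RegisterSizeSketch {
    /** Bits per register for at most n distinct elements, never fewer than five. */
    static int registerBits(long n) {
        double log2n = Math.log(n) / Math.log(2);
        int bits = (int) Math.ceil(Math.log(log2n) / Math.log(2));
        return Math.max(5, bits);
    }

    public static void main(String[] args) {
        System.out.println(registerBits(1_000_000_000L)); // 5
        System.out.println(registerBits(1L << 40));       // 6
    }
}
```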
protected int chunk(int counter)
counter - a counter.
protected long offset(int counter)
counter - a counter.
public void clear(long seed)
Clears all registers and sets a new seed (e.g., using Util.randomSeed()).
seed - the new seed used to compute the hash function.
public void clear()
public void add(int k, int v)
k - the index of the counter.
v - the element to be added.
public LongBigList[] registers()
The main purpose of this method is debugging, as it makes it easy to compare how the state of two implementations evolves.
protected double count(long[] bits, long offset)
bits - the bit array containing the counter.
offset - the starting bit position of the counter in bits.
public double count(int k)
k - the index of the counter.
Estimates the number of distinct elements that have been added to counter k so far.
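For reference, the estimator defined in the HyperLogLog paper cited in the class description can be written as follows, where m is the number of registers, M_j is the value stored in register j, and V is the number of registers still equal to zero. This page does not state whether the class applies exactly these corrections, so the formulas should be read as the textbook version rather than a description of this implementation.

```latex
E = \alpha_m \, m^2 \left( \sum_{j=1}^{m} 2^{-M_j} \right)^{-1},
\qquad
\alpha_m \approx \frac{0.7213}{1 + 1.079/m} \quad (m \ge 128),
\qquad
E^{*} = m \ln\frac{m}{V} \quad \text{if } E \le \tfrac{5}{2}\, m \text{ and } V > 0 .
```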