it.unimi.dsi.util.IntParallelCounterArray

All Implemented Interfaces:: Serializable

public class IntParallelCounterArray
extends Object
implements Serializable

An array of approximate sets each represented using a Parallel counter.

Parallel counters represent the number of elements of a set in an approximate way. They have been introduced by Philippe Flajolet, Éric Fusy, Olivier Gandouet, and Freédeéric Meunier in “Parallel: the analysis of a near-optimal cardinality estimation algorithm”, Proceedings of the 13th conference on analysis of algorithm (AofA 07), pages 127−146, 2007. They are an improvement over the basic idea of loglog counting, introduced by Marianne Durand and Philippe Flajolet in “Loglog counting of large cardinalities”, ESA 2003, 11th Annual European Symposium, volume 2832 of Lecture Notes in Computer Science, pages 605−617, Springer, 2003.

Each counter is composed by m registers, and each register is made of registerSize bits. The first number depends on the desired relative standard deviation, and its logarithm can be computed using log2NumberOfRegisters(double), whereas the second number depends on an upper bound on the number of distinct elements to be counted, and it can be computed using registerSize(long).

Actually, this class implements an array of counters. Each counter is completely independent, but they all use the same hash function. The reason for this design is that in our intended applications hundred of millions of counters are common, and the JVM overhead to create such a number of objects would be unbearable. This class allocates an array of LongArrayBitVectors, each containing CHUNK_SIZE registers, and can thus handle billions of billions of registers efficiently (in turn, this means being able to handle an array of millions of billions of high-precision counters).

When creating an instance, you can choose the size of the array (i.e., the number of counters) and the desired relative standard deviation (either explicitly or choosing the number of registers per counter). Then, you can add an element to a counter. At any time, you can count count (approximately) the number of distinct elements that have been added to a counter.

Author:: Paolo Boldi, Sebastiano Vigna
See Also:: Serialized Form

Field Summary

Fields
Modifier and Type	Field	Description
`protected double`	`base`
`protected LongArrayBitVector[]`	`bitVector`	A an array of bit vectors containing all registers.
`static long`	`CHUNK_MASK`
`static int`	`CHUNK_SHIFT`	The logarithm of the maximum size in registers of a bit vector.
`static long`	`CHUNK_SIZE`	The maximum size in registers of a bit vector.
`protected int`	`log2m`	The number of registers.
`protected int`	`m`	The number of registers.
`static int`	`MAX_EXPONENT`
`protected int`	`mMinus1`	The number of registers minus one.
`protected int`	`nodeShift`	The shift that selects the chunk corresponding to a node.
`protected int`	`registerMask`	The mask corresponding to a register.
`protected LongBigList[]`	`registers`	`registerSize`-bit views of `bitVector`.
`protected int`	`registerSize`	The size in bits of each register.

Constructor Summary

Constructors
Constructor	Description
`IntParallelCounterArray(int arraySize, long n, double rsd, double floatingPointPrecision)`	Creates a new array of counters.
`IntParallelCounterArray(int arraySize, long n, int log2m, double floatingPointPrecision)`	Creates a new array of counters.
`IntParallelCounterArray(int arraySize, long n, int log2m, double floatingPointPrecision, long seed)`	Creates a new array of counters.

Method Summary

Modifier and Type	Method	Description
`void`	`add(int k, int v)`	Adds an element to a counter.
`double`	`count(int k)`	Estimates the number of distinct elements that have been added to a given counter so far.
`static int`	`log2NumberOfRegisters(double rsd)`	Returns the logarithm of the number of registers per counter that are necessary to attain a given relative standard deviation.
`void`	`printMins()`
`LongBigList[]`	`registers()`	Returns the array of big lists of registers underlying this array of counters.
`static int`	`registerSize(long n)`	Returns the register size in bits, given an upper bound on the number of distinct elements.
`static double`	`relativeStandardDeviation(int log2m)`	Returns the relative standard deviation corresponding to a given logarithm of the number of registers per counter.

Methods inherited from class java.lang.Object

clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- MAX_EXPONENT
  
  public static final int MAX_EXPONENT
  
  See Also:
  
  Constant Field Values
- CHUNK_SHIFT
  
  public static final int CHUNK_SHIFT
  
  The logarithm of the maximum size in registers of a bit vector.
  
  See Also:
  
  Constant Field Values
- CHUNK_SIZE
  
  public static final long CHUNK_SIZE
  
  The maximum size in registers of a bit vector.
  
  See Also:
  
  Constant Field Values
- CHUNK_MASK
  
  public static final long CHUNK_MASK
  
  See Also:
  
  Constant Field Values
- bitVector
  
  protected final LongArrayBitVector[] bitVector
  
  A an array of bit vectors containing all registers.
- registers
  
  protected final LongBigList[] registers
  
  registerSize-bit views of bitVector.
- m
  
  protected final int m
  
  The number of registers.
- log2m
  
  protected final int log2m
  
  The number of registers.
- mMinus1
  
  protected final int mMinus1
  
  The number of registers minus one.
- registerSize
  
  protected final int registerSize
  
  The size in bits of each register.
- registerMask
  
  protected final int registerMask
  
  The mask corresponding to a register.
- nodeShift
  
  protected final int nodeShift
  
  The shift that selects the chunk corresponding to a node.
- base
  
  protected double base
Constructor Details
- IntParallelCounterArray
  
  public IntParallelCounterArray(int arraySize, long n, double rsd, double floatingPointPrecision)
  
  Creates a new array of counters.
  
  Parameters:
  
  arraySize - the number of counters.
  
  n - the expected number of elements.
  
  rsd - the relative standard deviation.
  
  floatingPointPrecision - the precision used for floating-point computations.
- IntParallelCounterArray
  
  public IntParallelCounterArray(int arraySize, long n, int log2m, double floatingPointPrecision)
  
  Creates a new array of counters.
  
  Parameters:
  
  arraySize - the number of counters.
  
  n - the expected number of elements.
  
  log2m - the logarithm of the number of registers per counter.
  
  floatingPointPrecision - the precision used for floating-point computations.
- IntParallelCounterArray
  
  public IntParallelCounterArray(int arraySize, long n, int log2m, double floatingPointPrecision, long seed)
  
  Creates a new array of counters.
  
  Parameters:
  
  arraySize - the number of counters.
  
  n - the expected number of elements.
  
  log2m - the logarithm of the number of registers per counter.
  
  floatingPointPrecision - the precision used for floating-point computations.
  
  seed - the seed used to compute the hash function
Method Details
- log2NumberOfRegisters
  
  public static int log2NumberOfRegisters(double rsd)
  
  Returns the logarithm of the number of registers per counter that are necessary to attain a given relative standard deviation.
  
  Parameters:
  
  rsd - the relative standard deviation to be attained.
  
  Returns:
  
  the logarithm of the number of registers that are necessary to attain relative standard deviation rsd.
- relativeStandardDeviation
  
  public static double relativeStandardDeviation(int log2m)
  
  Returns the relative standard deviation corresponding to a given logarithm of the number of registers per counter.
  
  Parameters:
  
  log2m - the logarithm of the number of registers.
  
  Returns:
  
  the resulting relative standard deviation.
- registerSize
  
  public static int registerSize(long n)
  
  Returns the register size in bits, given an upper bound on the number of distinct elements.
  
  Parameters:
  
  n - an upper bound on the number of distinct elements.
  
  Returns:
  
  the register size in bits.
- add
  
  public void add(int k, int v)
  
  Adds an element to a counter.
  
  Parameters:
  
  k - the index of the counter.
  
  v - the element to be added.
- printMins
  
  public void printMins()
- registers
  
  public LongBigList[] registers()
  
  Returns the array of big lists of registers underlying this array of counters.
  The main purpose of this method is debugging, as it makes comparing the evolution of the state of two implementations easy.
  
  Returns:
  
  the array of big lists of registers underlying this array of counters.
- count
  
  public double count(int k)
  
  Estimates the number of distinct elements that have been added to a given counter so far.
  
  Parameters:
  
  k - the index of the counter.
  
  Returns:
  
  an approximation of the number of distinct elements that have been added to counter k so far.

Class IntParallelCounterArray

Field Summary

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Field Details

Constructor Details

Method Details