Create a new HeavyHitters instance.
number of heavy hitters to keep track of
one-sided bound on the error of each point query, i.e. of each frequency estimate
a bound on the probability that a query estimate does not lie within some small interval (an interval that depends on eps) around the truth
a seed to initialize the random number generator used to create the pairwise independent hash functions
Create a new HeavyHitters from a settings object
Settings object
Transform a collection of categorical features to two columns, one for rank and one for count. Only the top heavyHittersCount items are tracked: the most frequent item has rank 1.0, the second most frequent has rank 2.0, and so on. All other items are transformed to [0.0, 0.0].
Ranks and frequencies are estimated with Algebird's SketchMap data structure. With probability at least 1 - delta, this estimate is within eps * N of the true frequency (i.e., true frequency <= estimate <= true frequency + eps * N), where N is the total size of the input collection.
Missing values are transformed to [0.0, 0.0].
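
The rank and count output described above can be illustrated with a minimal sketch. It is not the transformer's implementation: it assumes exact counting in place of Algebird's SketchMap, and the names HeavyHittersExample and rankAndCount are hypothetical, introduced only for this example.

```scala
object HeavyHittersExample {
  // Hypothetical helper: exact counting stands in for Algebird's SketchMap,
  // so the counts here carry no estimation error.
  def rankAndCount(
    items: Seq[Option[String]],
    heavyHittersCount: Int
  ): Seq[(Double, Double)] = {
    // Exact frequency of every value that is present (missing values are skipped).
    val counts: Map[String, Int] =
      items.flatten.groupBy(identity).map { case (k, v) => k -> v.size }

    // Keep the top heavyHittersCount values; rank 1.0 is the most frequent.
    // Ties are broken by key so the result is deterministic.
    val ranked: Map[String, (Double, Double)] = counts.toSeq
      .sortBy { case (k, c) => (-c, k) }
      .take(heavyHittersCount)
      .zipWithIndex
      .map { case ((k, c), i) => k -> ((i + 1).toDouble, c.toDouble) }
      .toMap

    // Missing values and untracked values both become (0.0, 0.0).
    items.map(_.flatMap(ranked.get).getOrElse((0.0, 0.0)))
  }
}
```

For example, with the input a, b, a, (missing), c, a, b and heavyHittersCount = 2, "a" maps to (1.0, 3.0), "b" maps to (2.0, 2.0), and both "c" and the missing value map to (0.0, 0.0). With a SketchMap in place of exact counts, each reported count could exceed the true count by up to eps * N, which may also change the ranks of items with similar frequencies.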