sealed abstract class TopCMS[K] extends Serializable with CMSCounting[K, TopCMS] with CMSHeavyHitters[K]
A Count-Min sketch data structure that allows for (a) counting and frequency estimation of elements in a data stream and (b) tracking the heavy hitters among these elements.
The logic of how heavy hitters are computed is pluggable, see HeavyHittersLogic.
Tip: If you do not need to track heavy hitters, take a look at CMS, which is more efficient in this case.
Usage
This example demonstrates how to count Long
elements with TopCMS, i.e. K=Long
.
Note that the actual counting is always performed with a Long
, regardless of your choice of K
. That is,
the counting table behind the scenes is backed by Long
values (at least in the current implementation), and thus
the returned frequency estimates are always instances of Approximate[Long]
.
- K
The type used to identify the elements to be counted.
// Creates a monoid for a CMS that can count `Long` elements. val topPctCMSMonoid: TopPctCMSMonoid[Long] = { val eps = 0.001 val delta = 1E-10 val seed = 1 val heavyHittersPct = 0.1 TopPctCMS.monoid[Long](eps, delta, seed, heavyHittersPct) } // Creates a TopCMS instance that has counted the element `1L`. val topCMS: TopCMS[Long] = topPctCMSMonoid.create(1L) // Estimates the frequency of `1L` val estimate: Approximate[Long] = topCMS.frequency(1L) // What are the heavy hitters so far? val heavyHitters: Set[Long] = topCMS.heavyHitters
- Alphabetic
- By Inheritance
- TopCMS
- CMSHeavyHitters
- CMSCounting
- Serializable
- AnyRef
- Any
- Hide All
- Show All
- Public
- Protected
Abstract Value Members
- abstract def +(item: K, count: Long): TopCMS[K]
Counts the item
count
times and returns the result as a new sketch.Counts the item
count
times and returns the result as a new sketch.- Definition Classes
- CMSCounting
- abstract def ++(other: TopCMS[K]): TopCMS[K]
Returns a new sketch that is the combination of this sketch and the other sketch.
Returns a new sketch that is the combination of this sketch and the other sketch.
- Definition Classes
- CMSCounting
- abstract def heavyHitters: Set[K]
Returns the set of heavy hitters.
Returns the set of heavy hitters.
- Definition Classes
- CMSHeavyHitters
Concrete Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- def +(item: K): TopCMS[K]
Counts the item and returns the result as a new sketch.
Counts the item and returns the result as a new sketch.
- Definition Classes
- CMSCounting
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- val cms: CMS[K]
- val delta: Double
Returns the bound on the probability that a query estimate does NOT lie within some small interval (an interval that depends on
eps
) around the truth.Returns the bound on the probability that a query estimate does NOT lie within some small interval (an interval that depends on
eps
) around the truth.- Definition Classes
- TopCMS → CMSCounting
- def depth: Int
Number of hash functions (also: number of rows in the counting table).
Number of hash functions (also: number of rows in the counting table). This number is derived from
delta
.- Definition Classes
- CMSCounting
- val eps: Double
Returns the one-sided error bound on the error of each point query, i.e.
Returns the one-sided error bound on the error of each point query, i.e. frequency estimate.
- Definition Classes
- TopCMS → CMSCounting
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def f1: Long
The first frequency moment is the total number of elements in the stream.
The first frequency moment is the total number of elements in the stream.
- Definition Classes
- CMSCounting
- def f2: Approximate[Long]
The second frequency moment is
\sum a_i^2
, where
a_iis the count of the i-th element.
The second frequency moment is
\sum a_i^2
, where
a_iis the count of the i-th element.
- Definition Classes
- TopCMS → CMSCounting
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def frequency(item: K): Approximate[Long]
Returns an estimate of the total number of times this item has been seen in the stream so far.
Returns an estimate of the total number of times this item has been seen in the stream so far. This estimate is an upper bound.
It is always true that
estimatedFrequency >= trueFrequency
. With probabilityp >= 1 - delta
, it also holds thatestimatedFrequency <= trueFrequency + eps * totalCount
.- Definition Classes
- TopCMS → CMSCounting
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- def heavyHittersLogic: HeavyHittersLogic[K]
The pluggable logic with which heavy hitters are being tracked.
The pluggable logic with which heavy hitters are being tracked.
- Definition Classes
- TopCMS → CMSHeavyHitters
- def innerProduct(other: TopCMS[K]): Approximate[Long]
Returns an estimate of the inner product against another data stream.
Returns an estimate of the inner product against another data stream.
In other words, let a_i denote the number of times element i has been seen in the data stream summarized by this CMS, and let b_i denote the same for the other CMS. Then this returns an estimate of
<a, b> = \sum a_i b_i
.Note: This can also be viewed as the join size between two relations.
It is always true that actualInnerProduct <= estimatedInnerProduct. With probability
p >= 1 - delta
, it also holds thatestimatedInnerProduct <= actualInnerProduct + eps * thisTotalCount * otherTotalCount
.- Definition Classes
- TopCMS → CMSCounting
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def maxExactCount: Int
Number of exact counts a sparse CMS wants to keep.
Number of exact counts a sparse CMS wants to keep. This number is derived from
maxExactCountOpt
.- Definition Classes
- CMSCounting
- val maxExactCountOpt: Option[Int]
An Option parameter about how many exact counts a sparse CMS wants to keep
An Option parameter about how many exact counts a sparse CMS wants to keep
- Definition Classes
- TopCMS → CMSCounting
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- AnyRef → Any
- val totalCount: Long
Total number of elements counted (i.e.
Total number of elements counted (i.e. seen in the data stream) so far.
- Definition Classes
- TopCMS → CMSCounting
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- def width: Int
Number of counters per hash function (also: number of columns in the counting table).
Number of counters per hash function (also: number of columns in the counting table). This number is derived from
eps
.- Definition Classes
- CMSCounting