Packages

  • package root
    Definition Classes
    root
  • package com
    Definition Classes
    root
  • package twitter
    Definition Classes
    com
  • package algebird
    Definition Classes
    twitter
  • final class DecayingCMS[K] extends Serializable

    DecayingCMS is a module to build count-min sketch instances whose counts decay exponentially.

    DecayingCMS is a module to build count-min sketch instances whose counts decay exponentially.

    Similar to a Map[K, com.twitter.algebird.DecayedValue], each key is associated with a single count value that decays over time. Unlike a map, the decyaing CMS is an approximate count -- in exchange for the possibility of over-counting, we can bound its size in memory.

    The intended use case is for metrics or machine learning where exact values aren't needed.

    You can expect the keys with the biggest values to be fairly accurate but the very small values (rare keys or very old keys) to be lost in the noise. For both metrics and ML this should be fine: you can't learn too much from very rare values.

    We recommend depth of at least 5, and width of at least 100, but you should do some experiments to determine the smallest parameters that will work for your use case.

    Definition Classes
    algebird
  • CMS
  • DoubleAt

final class CMS extends Serializable

The idealized formula for the updating current value for a key (y0 -> y1) is given as:

delta = (t1 - t0) / halflife y1 = y0 * 2^(-delta) + n

However, we want to avoid having to rescale every single cell every time we update; i.e. a cell with a zero value should continue to have a zero value when n=0.

Therefore, we introduce a change of variable to cell values (z) along with a scale factor (scale), and the following formula:

(1) zN = yN * scaleN

Our constraint is expressed as:

(2) If n=0, z1 = z0

In that case:

(3) If n=0, (y1 * scale1) = (y0 * scale0) (4) Substituting for y1, (y0 * 2(-delta) + 0) * scale1 = y0 * scale0 (5) 2(-delta) * scale1 = scale0 (6) scale1 = scale0 * 2^(delta)

Also, to express z1 in terms of z0, we say:

(7) z1 = y1 * scale1 (8) z1 = (y0 * 2(-delta) + n) * scale1 (9) z1 = ((z0 / scale0) * 2(-delta) + n) * scale1 (10) z1 / scale1 = (z0 / (scale1 * 2(-delta))) * 2(-delta) + n (11) z1 / scale1 = z0 / scale1 + n (12) z1 = z0 + n * scale1

So, for cells where n=0, we just update scale0 to scale1, and for cells where n is non-zero, we update z1 in terms of z0 and scale1.

If we convert scale to logscale, we have:

(13) logscale1 = logscale0 + delta * log(2) (14) z1 = z0 + n * exp(logscale1)

When logscale1 gets big, we start to distort z1. For example, exp(36) is close to 2^53. We can measure when n * exp(logscale1) gets big, and in those cases we can rescale all our cells (set each z to its corresponding y) and set the logscale to 0.

(15) y1 = z1 / scale1 (16) y1 = z1 / exp(logscale1) (17) y1 = z1 * exp(-logscale1)

Linear Supertypes
Serializable, AnyRef, Any
Ordering
  1. Alphabetic
  2. By Inheritance
Inherited
  1. CMS
  2. Serializable
  3. AnyRef
  4. Any
  1. Hide All
  2. Show All
Visibility
  1. Public
  2. Protected

Instance Constructors

  1. new CMS(cells: Array[Vector[Double]], logScale: Double, timeInHL: Double)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. def +(other: CMS): CMS
  4. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  5. def add(t: Long, k: K, n: Double): CMS
  6. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  7. def bulkAdd(items: Iterable[(Long, K, Double)]): CMS
  8. val cells: Array[Vector[Double]]
  9. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  10. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  11. def equals(any: Any): Boolean
    Definition Classes
    CMS → AnyRef → Any
  12. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  13. def get(k: K): DoubleAt
  14. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  15. def getScale(t: Double): Double
  16. def hashCode(): Int
    Definition Classes
    CMS → AnyRef → Any
  17. def innerProductRoot(that: CMS): DoubleAt

    Returns the square-root of the inner product of two decaying CMSs.

    Returns the square-root of the inner product of two decaying CMSs.

    We want the result to decay at the same rate as the CMS for this method to be valid. Taking the square root ensures that this is true. Without it, we would violate the following equality (assuming we had at() on a CMS):

    x.innerProduct(y).at(t) = x.at(t).innerProduct(y.at(t))

    This is why we don't support innerProduct, only innerProductRoot.

  18. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  19. def l2Norm: DoubleAt
  20. def lastUpdateTime: Long
  21. val logScale: Double
  22. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  23. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  24. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  25. def range: (DoubleAt, DoubleAt)

    Provide lower and upper bounds on values returned for any possible key.

    Provide lower and upper bounds on values returned for any possible key.

    The first value is a lower bound: even keys that have never been counted will return this value or greater. This will be zero unless the CMS is saturated.

    The second value is an upper bound: the key with the largest cardinality will not be reported as being larger than this value (though it might be reported as being smaller).

    Together these values indicate how saturated and skewed the CMS might be.

  26. def scale(x: Double): CMS
  27. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  28. val timeInHL: Double
  29. def toString(): String
    Definition Classes
    CMS → AnyRef → Any
  30. def total: DoubleAt

    Get the total count of all items in the CMS.

    Get the total count of all items in the CMS.

    The total is the same as the l1Norm, since we don't allow negative values.

    Total is one of the few non-approximate statistics that DecayingCMS supports. We expect the total to be exact (except for floating-point error).

  31. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  32. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  33. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()

Inherited from Serializable

Inherited from AnyRef

Inherited from Any

Ungrouped