
final class DecayingCMS[K] extends Serializable

DecayingCMS is a module to build count-min sketch instances whose counts decay exponentially.

Similar to a Map[K, com.twitter.algebird.DecayedValue], each key is associated with a single count value that decays over time. Unlike a map, the decaying CMS stores approximate counts: in exchange for the possibility of over-counting, we can bound its size in memory.

The intended use case is for metrics or machine learning where exact values aren't needed.

You can expect the keys with the biggest values to be fairly accurate but the very small values (rare keys or very old keys) to be lost in the noise. For both metrics and ML this should be fine: you can't learn too much from very rare values.

We recommend a depth of at least 5 and a width of at least 100, but you should run some experiments to determine the smallest parameters that work for your use case.
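As a rough guide to what those parameters buy, the sketch below computes the table size and applies the textbook count-min sketch bounds (error factor e / width, failure probability exp(-depth)). It assumes that the standard CMS analysis carries over to decayed counts and that each cell is a single 8-byte Double; neither is stated above. `SizingSketch` and its methods are illustrative names, not part of Algebird.

```scala
object SizingSketch {
  // Number of cells in a depth x width table.
  def totalCells(depth: Int, width: Int): Int = depth * width

  // Approximate payload size, ASSUMING one 8-byte Double per cell.
  def approxBytes(depth: Int, width: Int): Long =
    totalCells(depth, width).toLong * 8L

  // Textbook CMS bound: an estimate over-counts by at most
  // epsilon * (sum of all counts) ...
  def epsilon(width: Int): Double = math.E / width

  // ... with probability at least 1 - delta.
  def delta(depth: Int): Double = math.exp(-depth.toDouble)

  def main(args: Array[String]): Unit = {
    val (depth, width) = (5, 100)
    println(s"cells=${totalCells(depth, width)} bytes~${approxBytes(depth, width)}")
    println(s"epsilon~${epsilon(width)} delta~${delta(depth)}")
  }
}
```

Under these assumptions the recommended minimum (depth 5, width 100) is a very small table, which is why experimenting with larger widths is cheap.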

Self Type
DecayingCMS[K]
Linear Supertypes
Serializable, AnyRef, Any

Instance Constructors

  1. new DecayingCMS(seed: Long, halfLife: Duration, depth: Int, width: Int, hasher: CMSHasher[K])

Type Members

  1. final class CMS extends Serializable

    The idealized formula for updating the current value for a key (y0 -> y1) is given as:

    delta = (t1 - t0) / halflife
    y1 = y0 * 2^(-delta) + n

    However, we want to avoid having to rescale every single cell every time we update; i.e. a cell with a zero value should continue to have a zero value when n=0.

    Therefore, we introduce a change of variable to cell values (z) along with a scale factor (scale), and the following formula:

    (1) zN = yN * scaleN

    Our constraint is expressed as:

    (2) If n=0, z1 = z0

    In that case:

    (3) If n=0, (y1 * scale1) = (y0 * scale0)
    (4) Substituting for y1, (y0 * 2^(-delta) + 0) * scale1 = y0 * scale0
    (5) 2^(-delta) * scale1 = scale0
    (6) scale1 = scale0 * 2^(delta)

    Also, to express z1 in terms of z0, we say:

    (7) z1 = y1 * scale1
    (8) z1 = (y0 * 2^(-delta) + n) * scale1
    (9) z1 = ((z0 / scale0) * 2^(-delta) + n) * scale1
    (10) z1 / scale1 = (z0 / (scale1 * 2^(-delta))) * 2^(-delta) + n
    (11) z1 / scale1 = z0 / scale1 + n
    (12) z1 = z0 + n * scale1

    So, for cells where n=0, we just update scale0 to scale1, and for cells where n is non-zero, we update z1 in terms of z0 and scale1.

    If we convert scale to logscale, we have:

    (13) logscale1 = logscale0 + delta * log(2)
    (14) z1 = z0 + n * exp(logscale1)

    When logscale1 gets big, we start to distort z1. For example, exp(36) is close to 2^53. We can measure when n * exp(logscale1) gets big, and in those cases we can rescale all our cells (set each z to its corresponding y) and set the logscale to 0.

    (15) y1 = z1 / scale1
    (16) y1 = z1 / exp(logscale1)
    (17) y1 = z1 * exp(-logscale1)
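The derivation above can be sketched as a small standalone model. This is a minimal illustration of equations (13), (14) and (17) plus the rescaling step, not the library's implementation: `ScaledCellSketch`, `Cells`, `advance`, `add`, `value` and `rescaled` are hypothetical names, and time is passed in directly as `delta` (elapsed time in half-lives).

```scala
object ScaledCellSketch {
  private val Ln2 = math.log(2.0)

  // Idealized update: y1 = y0 * 2^(-delta) + n
  def idealUpdate(y0: Double, delta: Double, n: Double): Double =
    y0 * math.pow(2.0, -delta) + n

  // Scaled cell values z plus one logscale shared by all cells.
  final case class Cells(z: Vector[Double], logScale: Double) {
    // Equation (13): the passage of time only changes logScale,
    // so no cell is rewritten and zero cells stay zero.
    def advance(delta: Double): Cells =
      copy(logScale = logScale + delta * Ln2)

    // Equation (14): add n to cell i at the current time.
    def add(i: Int, n: Double): Cells =
      copy(z = z.updated(i, z(i) + n * math.exp(logScale)))

    // Equation (17): recover the decayed value y for cell i.
    def value(i: Int): Double = z(i) * math.exp(-logScale)

    // Rescale: set each z to its corresponding y and reset logScale to 0.
    def rescaled: Cells = Cells(z.indices.map(value).toVector, 0.0)
  }

  def main(args: Array[String]): Unit = {
    val start = Cells(Vector(0.0, 0.0), 0.0).add(0, 8.0)
    val later = start.advance(1.0).add(0, 1.0) // one half-life later, add 1
    println(s"scaled=${later.value(0)} ideal=${idealUpdate(8.0, 1.0, 1.0)}")
  }
}
```

The two printed values should agree up to floating-point rounding, while the untouched cell keeps an exact zero without ever being rewritten.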

  2. class DoubleAt extends Serializable

    Represents a decaying scalar value at a particular point in time.

    The value decays according to halfLife. Another way to think about DoubleAt is that it represents a particular decay curve (and in particular, a point along that curve). Two DoubleAt values may be equivalent if they are two points on the same curve.

    The timeToZero and timeToUnit methods can be used to "normalize" DoubleAt values. If two DoubleAt values do not produce the same (approximate) Double values from these methods, they represent different curves.
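To make the "points on the same curve" idea concrete, here is a minimal sketch using a hypothetical stand-in for DoubleAt. `DecayCurveSketch`, `Point`, `at`, `unitTime` and `sameCurve` are illustrative names only; the actual definitions of timeToZero and timeToUnit may differ. The sketch assumes positive values and measures time in half-lives.

```scala
object DecayCurveSketch {
  private val Ln2 = math.log(2.0)

  // A positive value observed at a time measured in half-lives.
  final case class Point(value: Double, timeInHalfLives: Double) {
    // The same curve evaluated at time t: the value halves once per half-life.
    def at(t: Double): Point =
      Point(value * math.pow(2.0, -(t - timeInHalfLives)), t)

    // Time at which this curve passes through 1.0. Every point on a given
    // curve yields the same result, so it works as a normal form.
    def unitTime: Double = timeInHalfLives + math.log(value) / Ln2
  }

  // Two points are equivalent if their normal forms agree approximately.
  def sameCurve(a: Point, b: Point, tol: Double = 1e-9): Boolean =
    math.abs(a.unitTime - b.unitTime) <= tol

  def main(args: Array[String]): Unit = {
    val a = Point(8.0, 0.0)
    val b = a.at(1.0) // same curve, one half-life later
    println(s"b=$b sameCurve=${sameCurve(a, b)}")
  }
}
```

Comparing a normalized time rather than raw values is what lets two observations taken at different times be recognized as the same decaying quantity.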

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##: Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  5. def clone(): AnyRef
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.CloneNotSupportedException]) @native()
  6. val depth: Int
  7. val empty: CMS
  8. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  9. def equals(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit
    Attributes
    protected[lang]
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.Throwable])
  11. def fromTimestamp(t: Long): Double
  12. final def getClass(): Class[_ <: AnyRef]
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  13. val halfLife: Duration
  14. val halfLifeSecs: Double
  15. def hashCode(): Int
    Definition Classes
    AnyRef → Any
    Annotations
    @native()
  16. val hashFns: Array[(K) => Int]
  17. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  18. val monoid: Monoid[CMS]
  19. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  20. final def notify(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  21. final def notifyAll(): Unit
    Definition Classes
    AnyRef
    Annotations
    @native()
  22. final def synchronized[T0](arg0: => T0): T0
    Definition Classes
    AnyRef
  23. def toString(): String
    Definition Classes
    DecayingCMS → AnyRef → Any
  24. def toTimestamp(t: Double): Long
  25. val totalCells: Int
  26. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  27. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException])
  28. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws(classOf[java.lang.InterruptedException]) @native()
  29. val width: Int
  30. object CMS extends Serializable
  31. object DoubleAt extends Serializable
