final class DecayingCMS[K] extends Serializable
DecayingCMS is a module to build count-min sketch instances whose counts decay exponentially.
Similar to a Map[K, com.twitter.algebird.DecayedValue], each key is associated with a single count value that decays over time. Unlike a map, the decaying CMS is an approximate count -- in exchange for the possibility of over-counting, we can bound its size in memory.
The intended use case is for metrics or machine learning where exact values aren't needed.
You can expect the keys with the biggest values to be fairly accurate but the very small values (rare keys or very old keys) to be lost in the noise. For both metrics and ML this should be fine: you can't learn too much from very rare values.
We recommend a depth of at least 5 and a width of at least 100, but you should run experiments to determine the smallest parameters that work for your use case.
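To make the depth/width trade-off concrete, here is a minimal sketch of a plain (non-decaying) count-min sketch. It is not the Algebird implementation: the class name TinyCMS, the String key type, and the use of MurmurHash3 seeded by row index are all assumptions made for illustration.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical, non-decaying count-min sketch; not the Algebird implementation.
final class TinyCMS(depth: Int, width: Int) {
  // depth rows of width cells: memory is fixed at depth * width doubles,
  // no matter how many distinct keys are added.
  private val cells: Array[Array[Double]] = Array.fill(depth, width)(0.0)

  // One hash function per row, derived by seeding MurmurHash3 with the row index.
  private def column(row: Int, key: String): Int =
    java.lang.Math.floorMod(MurmurHash3.stringHash(key, row), width)

  def add(key: String, n: Double): Unit =
    for (row <- 0 until depth) cells(row)(column(row, key)) += n

  // Collisions only ever add to a cell, so the minimum across rows is the
  // tightest estimate and never undercounts (for non-negative n).
  def get(key: String): Double =
    (0 until depth).map(row => cells(row)(column(row, key))).min
}
```

With new TinyCMS(5, 100), every estimate is at least the true count. Over-counting happens only when a key collides with other keys in all five rows, which is why rare keys are the ones most likely to be lost in the noise.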
- Self Type
- DecayingCMS[K]
Instance Constructors
Type Members
- final class CMS extends Serializable
The idealized formula for updating the current value for a key (y0 -> y1) is given as:
delta = (t1 - t0) / halflife
y1 = y0 * 2^(-delta) + n
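As a worked instance of the idealized formula, the following sketch applies it directly. The object and parameter names are illustrative, and all times are assumed to be in the same unit.

```scala
object IdealDecay {
  // y1 = y0 * 2^(-delta) + n, with delta = (t1 - t0) / halflife.
  // t0, t1 and halflife must share one time unit; names are illustrative.
  def update(y0: Double, t0: Double, t1: Double, halflife: Double, n: Double): Double = {
    val delta = (t1 - t0) / halflife
    y0 * math.pow(2.0, -delta) + n
  }
}
```

For example, IdealDecay.update(y0 = 8.0, t0 = 0.0, t1 = 10.0, halflife = 10.0, n = 1.0) spans exactly one half-life, so the old value halves to 4.0 and the result is 5.0.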
However, we want to avoid having to rescale every single cell every time we update; i.e. a cell with a zero value should continue to have a zero value when n=0.
Therefore, we introduce a change of variable to cell values (z) along with a scale factor (scale), and the following formula:
(1) zN = yN * scaleN
Our constraint is expressed as:
(2) If n=0, z1 = z0
In that case:
(3) If n=0, (y1 * scale1) = (y0 * scale0)
(4) Substituting for y1, (y0 * 2^(-delta) + 0) * scale1 = y0 * scale0
(5) 2^(-delta) * scale1 = scale0
(6) scale1 = scale0 * 2^(delta)
Also, to express z1 in terms of z0, we say:
(7) z1 = y1 * scale1
(8) z1 = (y0 * 2^(-delta) + n) * scale1
(9) z1 = ((z0 / scale0) * 2^(-delta) + n) * scale1
(10) z1 / scale1 = (z0 / (scale1 * 2^(-delta))) * 2^(-delta) + n
(11) z1 / scale1 = z0 / scale1 + n
(12) z1 = z0 + n * scale1
So, for cells where n=0, we just update scale0 to scale1, and for cells where n is non-zero, we update z1 in terms of z0 and scale1.
If we convert scale to logscale, we have:
(13) logscale1 = logscale0 + delta * log(2)
(14) z1 = z0 + n * exp(logscale1)
When logscale1 gets big, we start to distort z1. For example, exp(36) is about 4.3e15, within a factor of about two of 2^53 (about 9.0e15), beyond which a Double can no longer represent every integer exactly. We can measure when n * exp(logscale1) gets big, and in those cases we can rescale all our cells (set each z to its corresponding y) and set the logscale to 0.
(15) y1 = z1 / scale1
(16) y1 = z1 / exp(logscale1)
(17) y1 = z1 * exp(-logscale1)
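The scheme above can be sketched as a small array of lazily scaled cells. This is a hedged illustration, not the Algebird code: the class name ScaledCells, the shared time field, and the rescale threshold of 30 on the log scale are assumptions. The text describes measuring n * exp(logscale1), and this sketch simplifies that to a check on the log scale alone.

```scala
// Hypothetical sketch of equations (13), (14) and (17); not the Algebird code.
final class ScaledCells(size: Int, halflife: Double) {
  private val z = new Array[Double](size) // scaled cell values, z = y * scale
  private var logScale = 0.0              // log of the shared scale factor
  private var time = 0.0                  // time of the last update

  // (13): advancing time touches only logScale, never the cells,
  // so a cell holding zero stays zero.
  private def advance(t1: Double): Unit = {
    require(t1 >= time, "time must not go backwards in this sketch")
    logScale += (t1 - time) / halflife * math.log(2.0)
    time = t1
    if (logScale > 30.0) rescale() // arbitrary threshold chosen for this sketch
  }

  // Set each z to its corresponding y via (17), then reset logScale to 0.
  private def rescale(): Unit = {
    val factor = math.exp(-logScale)
    var i = 0
    while (i < z.length) { z(i) *= factor; i += 1 }
    logScale = 0.0
  }

  // (14): z1 = z0 + n * exp(logscale1)
  def add(i: Int, t1: Double, n: Double): Unit = {
    advance(t1)
    z(i) += n * math.exp(logScale)
  }

  // (17): y = z * exp(-logscale), read as of time t.
  def get(i: Int, t: Double): Double = {
    advance(t)
    z(i) * math.exp(-logScale)
  }
}
```

Adding 8.0 at time 0 and 1.0 at time 10 with a half-life of 10 yields 5.0 when read at time 10, matching the idealized formula, while only the touched cell and the shared log scale were ever written.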
- class DoubleAt extends Serializable
Represents a decaying scalar value at a particular point in time.
The value decays according to halfLife. Another way to think about DoubleAt is that it represents a particular decay curve (and in particular, a point along that curve). Two DoubleAt values may be equivalent if they are two points on the same curve.
The timeToZero and timeToUnit methods can be used to "normalize" DoubleAt values. If two DoubleAt values do not produce the same (approximate) Double values from these methods, they represent different curves.
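The idea of normalizing points on a decay curve can be sketched with a hypothetical stand-in for DoubleAt. DecayPoint, its fields, and the at method are invented for illustration. The sketch assumes that time is measured in half-lives and that timeToUnit means the time at which the absolute value reaches 1. It does not model timeToZero.

```scala
// Hypothetical stand-in for DoubleAt; time is measured in half-lives.
final case class DecayPoint(value: Double, timeInHL: Double) {
  // Value of the same curve at another time t (also in half-lives).
  def at(t: Double): Double = value * math.pow(2.0, -(t - timeInHL))

  // Assumed meaning of timeToUnit: the time at which |value| reaches 1.
  // Solves 1 = |value| * 2^(-(t - timeInHL)) for t; only meaningful for value != 0.
  def timeToUnit: Double = timeInHL + math.log(math.abs(value)) / math.log(2.0)
}
```

Under these assumptions, DecayPoint(8.0, 0.0) and DecayPoint(4.0, 1.0) lie on the same curve and both normalize to a timeToUnit of approximately 3.0, whereas DecayPoint(8.0, 1.0) normalizes to approximately 4.0 and therefore represents a different curve.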
Value Members
- final def !=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def ##: Int
- Definition Classes
- AnyRef → Any
- final def ==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
- final def asInstanceOf[T0]: T0
- Definition Classes
- Any
- def clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.CloneNotSupportedException]) @native()
- val depth: Int
- val empty: CMS
- final def eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- def equals(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef → Any
- def finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.Throwable])
- def fromTimestamp(t: Long): Double
- final def getClass(): Class[_ <: AnyRef]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- val halfLife: Duration
- val halfLifeSecs: Double
- def hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- val hashFns: Array[(K) => Int]
- final def isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- val monoid: Monoid[CMS]
- final def ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
- final def notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- final def synchronized[T0](arg0: => T0): T0
- Definition Classes
- AnyRef
- def toString(): String
- Definition Classes
- DecayingCMS → AnyRef → Any
- def toTimestamp(t: Double): Long
- val totalCells: Int
- final def wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException])
- final def wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws(classOf[java.lang.InterruptedException]) @native()
- val width: Int
- object CMS extends Serializable
- object DoubleAt extends Serializable