Class

org.platanios.tensorflow.api.utilities

Reservoir

case class Reservoir[K, V](maxSize: Int, seed: Long = 0L, alwaysKeepLast: Boolean = true) extends Product with Serializable

A key-value store using deterministic reservoir sampling.

Items are added with an associated key. Items can be retrieved by their key, and the list of keys can also be retrieved. If maxSize is not zero, it sets the maximum number of items stored for each key. Once more than maxSize items have been added for a given key, stored items are replaced via reservoir sampling, such that each item seen for that key has an equal probability of being included in the sample.

Deterministic means that for any given seed and bucket size, the sequence of values that are kept for any given key will always be the same, and that this is independent of any insertions for other keys. That is:

val reservoirA = Reservoir[String, Int](10)
val reservoirB = Reservoir[String, Int](10)
(0 until 100).foreach(i => reservoirA.add("key1", i))
(0 until 100).foreach(i => reservoirA.add("key2", i))
(0 until 100).foreach(i => {
  reservoirB.add("key1", i)
  reservoirB.add("key2", i)
})

After executing this code, reservoirA and reservoirB will be in identical states.
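
The determinism property can be illustrated with a small standalone sketch. It does not use the library: SketchBucket and SketchReservoir are hypothetical names, and giving every key its own generator created from the same seed is an assumption about one way to obtain this behavior, not a description of how Reservoir is implemented.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical stand-in for one per-key bucket (not the library's code).
final class SketchBucket[V](maxSize: Int, seed: Long) {
  private val random = new Random(seed) // each bucket owns its generator
  private val stored = mutable.ArrayBuffer.empty[V]
  private var numItemsSeen = 0

  def add(item: V): Unit = {
    numItemsSeen += 1
    if (maxSize == 0 || stored.size < maxSize) {
      stored += item // unbounded or not yet full: always keep
    } else {
      // Classic reservoir step: the new item is kept with
      // probability maxSize / numItemsSeen.
      val slot = random.nextInt(numItemsSeen)
      if (slot < maxSize) stored(slot) = item
    }
  }

  def items: List[V] = stored.toList
}

// Hypothetical key-value wrapper: one independently seeded bucket per key.
final class SketchReservoir[K, V](maxSize: Int, seed: Long = 0L) {
  private val buckets = mutable.Map.empty[K, SketchBucket[V]]

  def add(key: K, item: V): Unit =
    buckets.getOrElseUpdate(key, new SketchBucket[V](maxSize, seed)).add(item)

  def items(key: K): List[V] = buckets(key).items
}
```

Because no bucket ever draws from another bucket's generator, filling one SketchReservoir key by key and another with interleaved insertions leaves both with the same items for every key.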

For more information on reservoir sampling, refer to [this page](https://en.wikipedia.org/wiki/Reservoir_sampling).

Note that adding an item has amortized O(1) runtime cost.

maxSize

Maximum size of each bucket in this reservoir key-value store.

seed

Seed to use for the random number generator used while sampling.

alwaysKeepLast

Boolean flag indicating whether to always store the last seen item. If set to true and the last seen item was not sampled to be stored, then it replaces the last item in the corresponding bucket.
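
The effect of alwaysKeepLast can be sketched for a single bucket. This is a standalone illustration under stated assumptions: KeepLastBucket is a hypothetical name, the sketch only covers bounded buckets, and removing the sampled-out item before appending the new one is a choice made here to keep items in insertion order, not something this page specifies.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical single-key bucket (not the library's code).
final class KeepLastBucket[V](maxSize: Int, seed: Long, alwaysKeepLast: Boolean) {
  require(maxSize > 0, "this sketch only covers bounded buckets")

  private val random = new Random(seed)
  private val stored = mutable.ArrayBuffer.empty[V]
  private var numItemsSeen = 0

  def add(item: V): Unit = {
    numItemsSeen += 1
    if (stored.size < maxSize) {
      stored += item // not yet full: always keep
    } else {
      val slot = random.nextInt(numItemsSeen)
      if (slot < maxSize) {
        // Sampled in: drop one older item and append the new one.
        stored.remove(slot)
        stored += item
      } else if (alwaysKeepLast) {
        // Not sampled in: overwrite the previous last item instead.
        stored(stored.size - 1) = item
      }
      // Otherwise the new item is discarded.
    }
  }

  def items: List[V] = stored.toList
}
```

With alwaysKeepLast set to true, the most recently added item is always the last element of items in this sketch; with false, it is present only if it was sampled in.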

Linear Supertypes

Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new Reservoir(maxSize: Int, seed: Long = 0L, alwaysKeepLast: Boolean = true)

    maxSize

    Maximum size of each bucket in this reservoir key-value store.

    seed

    Seed to use for the random number generator used while sampling.

    alwaysKeepLast

    Boolean flag indicating whether to always store the last seen item. If set to true and the last seen item was not sampled to be stored, then it replaces the last item in the corresponding bucket.

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def add(key: K, item: V, transformFn: (V) ⇒ V = identity[V]): Unit

    Adds a new item to the reservoir with the provided key.

    If the bucket for the provided key has not yet reached maxSize items, then the new item is guaranteed to be added. If the bucket is full, then the behavior of this method depends on the value of alwaysKeepLast.

    If alwaysKeepLast is set to true, then the new item is guaranteed to be added to the reservoir, and either the previous last item will be replaced, or (with low probability) an older item will be replaced.

    If alwaysKeepLast is set to false, then the new item replaces an old item only with low probability; otherwise, the new item is discarded.

    If transformFn is provided, then it will be applied to transform the provided item (lazily, if and only if the item is going to be included in the reservoir).

    key

    Key for the item to add.

    item

    Item to add.

    transformFn

    Transform function for the item to add.

  5. val alwaysKeepLast: Boolean

    Boolean flag indicating whether to always store the last seen item. If set to true and the last seen item was not sampled to be stored, then it replaces the last item in the corresponding bucket.

  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def filter(filterFn: (V) ⇒ Boolean, key: Option[K] = None): Int

    Filters the items in this reservoir using the provided filtering function.

    When filtering items from a reservoir bucket, we must also update the internal state variable numItemsSeen, which determines the rate of replacement in reservoir sampling. Ideally, numItemsSeen would hold the exact number of items that have ever been passed to the add function of this reservoir and that satisfy the provided filtering function. However, a bucket does not have access to all of the items it has seen; it only has access to the subset of items that have survived sampling (i.e., _items). Therefore, we estimate numItemsSeen by scaling its original value by the fraction of the items currently stored in the bucket that were not filtered out.

    filterFn

    Filtering function that returns true for the items to be kept in the reservoir.

    key

    Optional key for which to filter the values. If None (the default), then the values for all keys in the reservoir are filtered.

    returns

    Number of items filtered from this reservoir.

  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  13. def items(key: K): List[V]

    Returns all the items stored for the provided key, or throws an exception if the key does not exist.

  14. def keys: Iterable[K]

    Returns all the keys in the reservoir.

  15. val maxSize: Int

    Maximum size of each bucket in this reservoir key-value store.

  16. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  17. final def notify(): Unit

    Definition Classes
    AnyRef
  18. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  19. val seed: Long

    Seed to use for the random number generator used while sampling.

  20. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  21. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  23. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
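
The add and filter entries above describe two pieces of per-bucket bookkeeping: transformFn is applied only to items that end up being stored, and filter rescales numItemsSeen by the fraction of stored items that survive. The following standalone sketch illustrates both for a single bucket. FilterableBucket is a hypothetical name, the sketch only covers bounded buckets, and rounding the rescaled count to the nearest integer is an assumption; this page does not say how the library rounds.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical single-key bucket (not the library's code).
final class FilterableBucket[V](maxSize: Int, seed: Long = 0L) {
  require(maxSize > 0, "this sketch only covers bounded buckets")

  private val random = new Random(seed)
  private val stored = mutable.ArrayBuffer.empty[V]
  private var seen = 0

  def numItemsSeen: Int = seen
  def items: List[V] = stored.toList

  // transformFn runs only on the branches that actually store the item.
  def add(item: V, transformFn: V => V = identity[V]): Unit = {
    seen += 1
    if (stored.size < maxSize) {
      stored += transformFn(item)
    } else {
      val slot = random.nextInt(seen)
      if (slot < maxSize) stored(slot) = transformFn(item)
      // Otherwise the item is discarded and transformFn is never called.
    }
  }

  // Keeps the items for which filterFn is true; returns how many were removed.
  def filter(filterFn: V => Boolean): Int = {
    val sizeBefore = stored.size
    val kept = stored.filter(filterFn)
    stored.clear()
    stored ++= kept
    // Rescale the seen-count by the fraction of stored items that survived.
    val keptRatio = if (sizeBefore == 0) 0.0 else kept.size.toDouble / sizeBefore
    seen = math.round(seen * keptRatio).toInt
    sizeBefore - kept.size
  }
}
```

For example, a bucket holding four of four seen items that loses half of them to a filter ends up with numItemsSeen equal to 2 in this sketch, so later insertions are sampled as if only two matching items had been seen.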
