A key-value store using deterministic reservoir sampling.
Items are added with an associated key. Items may be retrieved by the corresponding key, and a list of keys can also be retrieved. If `maxSize` is not zero, it dictates the maximum number of items that will be stored for each key. Once a key has received more items than that, stored items are replaced via reservoir sampling, such that each item has an equal probability of being included in the sample.
Deterministic means that for any given seed and bucket size, the sequence of values that are kept for any given key will always be the same, and that this is independent of any insertions for other keys. That is:
    val reservoirA = ReservoirKVStore(10)
    val reservoirB = ReservoirKVStore(10)
    (0 until 100).foreach(i => reservoirA.add("key1", i))
    (0 until 100).foreach(i => reservoirA.add("key2", i))
    (0 until 100).foreach(i => {
      reservoirB.add("key1", i)
      reservoirB.add("key2", i)
    })
After executing this code, `reservoirA` and `reservoirB` will be in identical states.
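One way to obtain this per-key independence is to give every key its own random number generator, created from the same seed. The following is a minimal sketch under that assumption, not the library's implementation; `SketchKVStore` and its members are hypothetical names chosen for illustration.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical sketch; the names below are not the library's API.
final class SketchKVStore[K, T](maxSize: Int, seed: Long = 0L) {
  // Per-key state: its own generator, the kept items, and the number of items seen.
  private final class Bucket {
    val random = new Random(seed)
    val items = mutable.ArrayBuffer.empty[T]
    var numSeen = 0
  }

  private val buckets = mutable.LinkedHashMap.empty[K, Bucket]

  def add(key: K, item: T): Unit = {
    val bucket = buckets.getOrElseUpdate(key, new Bucket)
    bucket.numSeen += 1
    if (maxSize == 0 || bucket.items.size < maxSize) {
      // No limit, or the bucket still has room: keep the item.
      bucket.items += item
    } else {
      // Classic reservoir step: the new item is kept with probability maxSize / numSeen.
      val index = bucket.random.nextInt(bucket.numSeen)
      if (index < maxSize) bucket.items(index) = item
    }
  }

  def keys: List[K] = buckets.keys.toList

  def items(key: K): List[T] = buckets.get(key).map(_.items.toList).getOrElse(Nil)
}
```

Because each bucket draws only from its own generator, interleaving insertions for `"key1"` and `"key2"` consumes the same random sequence per key as inserting them one key at a time, so both orderings end in the same state.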
For more information on reservoir sampling, refer to [this page](https://en.wikipedia.org/wiki/Reservoir_sampling).
Note that adding items has amortized O(1) runtime cost.
Maximum size of each bucket in this reservoir key-value store.
Seed to use for the random number generator used while sampling.
Boolean flag indicating whether to always store the last seen item. If set to `true` and the last seen item was not sampled to be stored, then it replaces the last item in the corresponding bucket.
Container for items coming from a stream, that implements reservoir sampling so that its size never exceeds `maxSize`.
Maximum size of this bucket.
Random number generator to use while sampling.
Boolean flag indicating whether to always store the last seen item. If set to `true` and the last seen item was not sampled to be stored, then it replaces the last item in this bucket.
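The behaviour described by these three parameters can be sketched as a small bucket class. This is an illustrative sketch, not the library's code: the class name `SketchBucket`, the parameter name `alwaysKeepLast`, and the choice to evict a random item and append the new one (so that items stay in arrival order) are all assumptions.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical sketch; SketchBucket is not the library's class.
final class SketchBucket[T](maxSize: Int, random: Random, alwaysKeepLast: Boolean) {
  private val kept = mutable.ArrayBuffer.empty[T]
  private var numSeen = 0

  def add(item: T): Unit = {
    numSeen += 1
    if (maxSize == 0 || kept.size < maxSize) {
      // No limit, or the bucket still has room: keep the item.
      kept += item
    } else {
      val index = random.nextInt(numSeen)
      if (index < maxSize) {
        // Sampled: evict a uniformly chosen item and append, preserving arrival order.
        kept.remove(index)
        kept += item
      } else if (alwaysKeepLast) {
        // Not sampled: overwrite the most recently kept item instead.
        kept(kept.size - 1) = item
      }
    }
  }

  def items: List[T] = kept.toList
}
```

With the flag set to `true`, the last element of `items` is always the most recently added item. The trade-off is that the final slot no longer follows the uniform sampling guarantee, since it is overwritten whenever a new item is not sampled.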
Contains helper functions for manipulating collections.
Contains helper functions for working with ProtoBuf.