Maximum size of each bucket in this reservoir key-value store.
Seed to use for the random number generator used while sampling.
Boolean flag indicating whether to always store the last seen item. If set to `true` and the last seen item was not sampled to be stored, then it replaces the last item in the corresponding bucket.
Adds a new item to the reservoir with the provided key.
If the corresponding reservoir has not yet reached full size, then the new item is guaranteed to be added. If the reservoir is full, then the behavior of this method depends on the value of `alwaysKeepLast`.
If `alwaysKeepLast` is set to `true`, then the new item is guaranteed to be added to the reservoir, and either the previous last item will be replaced, or (with low probability) an older item will be replaced.
If `alwaysKeepLast` is set to `false`, then the new item may replace an old item with low probability.
If `transformFn` is provided, then it will be applied to transform the provided item (lazily, if and only if the item is going to be included in the reservoir).
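The behavior described above can be illustrated with a minimal Python sketch of a single bucket. This is not the library's implementation: the class `ReservoirBucket`, its fields, and the helper `record` are hypothetical names chosen for illustration, and the replacement rule shown is the standard reservoir-sampling rule, assumed here rather than taken from the source.

```python
import random


class ReservoirBucket:
    """Hypothetical single-key bucket; all names here are illustrative assumptions."""

    def __init__(self, max_size, seed=0, always_keep_last=True):
        self.max_size = max_size
        self.always_keep_last = always_keep_last
        self.items = []
        self.num_items_seen = 0
        self.rng = random.Random(seed)

    def add(self, item, transform_fn=lambda x: x):
        if self.max_size == 0 or len(self.items) < self.max_size:
            # Bucket not yet full (or unbounded): the item is always stored.
            self.items.append(transform_fn(item))
        else:
            # The new item is sampled with probability
            # max_size / (num_items_seen + 1) and evicts the slot at index r.
            r = self.rng.randint(0, self.num_items_seen)
            if r < self.max_size:
                self.items.pop(r)
                self.items.append(transform_fn(item))
            elif self.always_keep_last:
                # Not sampled, but the newest item still overwrites the last slot.
                self.items[-1] = transform_fn(item)
            # Otherwise the item is dropped and transform_fn is never called.
        self.num_items_seen += 1


calls = []


def record(x):
    calls.append(x)  # remember which items were actually transformed
    return x * 10


keep_last = ReservoirBucket(max_size=3, seed=0, always_keep_last=True)
sampled_only = ReservoirBucket(max_size=3, seed=0, always_keep_last=False)
for i in range(100):
    keep_last.add(i)
    sampled_only.add(i, transform_fn=record)
```

In this sketch, `keep_last.items` always ends with the most recently added item, while `sampled_only` only calls `record` for items that actually enter the bucket.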
Key for the item to add.
Item to add.
Transform function for the item to add.
Filters the items in this reservoir using the provided filtering function.
When filtering items from each reservoir bucket, we must update the internal state variable `numItemsSeen`, which is used for determining the rate of replacement in reservoir sampling. Ideally, `numItemsSeen` would contain the exact number of items that have ever been seen by the `add` function of this reservoir, and that satisfy the provided filtering function. However, the reservoir bucket does not have access to all of the items it has seen; it only has access to the subset of items that have survived sampling (i.e., `_items`). Therefore, we estimate `numItemsSeen` by scaling its original value by the fraction of the items currently stored in this reservoir bucket that were not filtered out.
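The estimate can be sketched in a few lines of Python. The class and method names below are hypothetical, and rounding the scaled count to the nearest integer is an assumption; the source only states that the original value is scaled by the surviving ratio.

```python
class ReservoirBucket:
    """Hypothetical bucket holding only the sampled items and a seen-count."""

    def __init__(self, items, num_items_seen):
        self.items = list(items)
        self.num_items_seen = num_items_seen

    def filter(self, filter_fn):
        size_before = len(self.items)
        self.items = [x for x in self.items if filter_fn(x)]
        # Fraction of the stored sample that passed the filter.
        kept_ratio = len(self.items) / size_before if size_before > 0 else 0.0
        # Scale the seen-count by that fraction (rounding is an assumption).
        self.num_items_seen = int(round(self.num_items_seen * kept_ratio))
        # Report how many stored items were removed.
        return size_before - len(self.items)


# A bucket that has seen 50 items and currently stores 5 of them.
bucket = ReservoirBucket(items=[3, 8, 15, 22, 40], num_items_seen=50)
removed = bucket.filter(lambda x: x % 2 == 0)
```

Here 3 of the 5 stored items pass the filter, so the seen-count of 50 is scaled by 3/5.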
Filtering function that returns `true` for the items to be kept in the reservoir.
Optional key for which to filter the values. If `None` (the default), then the values for all keys in the reservoir are filtered.
Number of items filtered from this reservoir.
Returns all the items stored for the provided key and throws an exception if the key does not exist.
Returns all the keys in the reservoir.
A key-value store using deterministic reservoir sampling.
Items are added with an associated key. Items may be retrieved by the corresponding key, and a list of keys can also be retrieved. If `maxSize` is not zero, then it dictates the maximum number of items that will be stored for each key. Once there are more items for a given key, they are replaced via reservoir sampling, such that each item has an equal probability of being included in the sample.
Deterministic means that for any given seed and bucket size, the sequence of values that are kept for any given key will always be the same, and that this is independent of any insertions for other keys. That is:
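The following is a minimal Python sketch of this property, not the library's own example. The toy `ReservoirKVStore` class, its method signatures, and the choice of one identically seeded generator per key are assumptions made for illustration; the `alwaysKeepLast` option is left out for brevity.

```python
import random


class ReservoirKVStore:
    """Toy stand-in; names and signatures are assumptions, not the library's API."""

    def __init__(self, max_size, seed=0):
        self.max_size = max_size
        self.seed = seed
        self.buckets = {}  # key -> [stored items, number of items seen, rng]

    def add(self, key, item):
        if key not in self.buckets:
            # One generator per key, so other keys never consume its random draws.
            self.buckets[key] = [[], 0, random.Random(self.seed)]
        bucket = self.buckets[key]
        stored, seen, rng = bucket
        if self.max_size == 0 or len(stored) < self.max_size:
            stored.append(item)
        else:
            # Standard reservoir step: keep the new item with probability
            # max_size / (seen + 1), evicting the item at index r.
            r = rng.randint(0, seen)
            if r < self.max_size:
                stored.pop(r)
                stored.append(item)
        bucket[1] = seen + 1

    def items(self, key):
        return list(self.buckets[key][0])  # raises KeyError for unknown keys

    def keys(self):
        return list(self.buckets)


reservoirA = ReservoirKVStore(max_size=10, seed=42)
reservoirB = ReservoirKVStore(max_size=10, seed=42)

# reservoirA receives all of "key1" first, then all of "key2".
for i in range(100):
    reservoirA.add("key1", i)
for i in range(100):
    reservoirA.add("key2", i)

# reservoirB receives the same items, but interleaved across the two keys.
for i in range(100):
    reservoirB.add("key1", i)
    reservoirB.add("key2", i)
```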
After executing this code, `reservoirA` and `reservoirB` will be in identical states.
For more information on reservoir sampling, refer to [this page](https://en.wikipedia.org/wiki/Reservoir_sampling).
Note that adding items has amortized `O(1)` runtime cost.