Maximum number of counters to keep (parameter "m" in the research paper).
Map from item to counter, where each counter consists of an observed count and its possible over-estimation (error).
Current lowest value for count
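
The three members above (the capacity m, the item-to-counter map, and the lowest count) are enough to express the update rule from the Space-Saving paper. The following is a minimal Python sketch of that rule, not the SpaceSaver implementation itself: the function name `space_saving_add`, the `(count, error)` tuple layout, and the linear scan for the lowest count are assumptions made for illustration.

```python
def space_saving_add(counters, capacity, item):
    """Record one occurrence of `item` in `counters` (updated in place).

    `counters` maps item -> (count, error); `capacity` plays the role of
    the paper's parameter m. Both names are hypothetical.
    """
    if item in counters:
        # Already tracked: bump the observed count, keep the error.
        count, error = counters[item]
        counters[item] = (count + 1, error)
    elif len(counters) < capacity:
        # Free slot: start an exact counter with no over-estimation.
        counters[item] = (1, 0)
    else:
        # Full: evict an item with the lowest count. The newcomer takes
        # over that count plus one, and the evicted count becomes its
        # possible over-estimation.
        victim = min(counters, key=lambda k: counters[k][0])
        min_count = counters.pop(victim)[0]
        counters[item] = (min_count + 1, min_count)
    return counters


counters = {}
for x in ["a", "b", "a", "c", "a"]:
    space_saving_add(counters, 2, x)
print(counters)  # {'a': (3, 0), 'c': (2, 1)}
```

With a capacity of 2, "c" evicts "b" and inherits its count of 1 as error, so the counts still sum to the stream length of 5.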
Check consistency with another SpaceSaver; useful for testing. Returns a boolean indicating whether the two are consistent.
Returns the frequency estimate for the item.
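
A frequency estimate in this scheme can be read as a range: the observed count is an upper bound, and the count minus the error is a lower bound. The sketch below shows one way to derive such bounds from the counter map. It is an illustration under assumptions: the function name `frequency_bounds`, the plain `(lower, upper)` tuple result, and the treatment of untracked items (bounded above by the lowest tracked count once the map is full) are not taken from the source.

```python
def frequency_bounds(counters, capacity, item):
    """Return (lower, upper) bounds on how often `item` was seen.

    `counters` maps item -> (count, error); `capacity` (assumed >= 1) is
    the maximum number of counters. Names are hypothetical.
    """
    # An untracked item can have appeared at most `floor` times: 0 if
    # nothing was ever evicted, otherwise the lowest tracked count.
    if len(counters) >= capacity:
        floor = min(count for count, _ in counters.values())
    else:
        floor = 0
    count, error = counters.get(item, (floor, floor))
    return (count - error, count)


state = {"a": (3, 0), "c": (2, 1)}
print(frequency_bounds(state, 2, "a"))  # (3, 3)
print(frequency_bounds(state, 2, "c"))  # (1, 2)
print(frequency_bounds(state, 2, "z"))  # (0, 2)
```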
Get the elements that show up more than thres times. Returns them sorted in descending order as (item, Approximate[Long], guaranteed).
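
The guaranteed flag in the threshold query can be understood as follows: an item is reported when its observed count exceeds the threshold, and it is guaranteed only when its lower bound (count minus error) exceeds the threshold as well. This Python sketch illustrates that reading. The function name `most_frequent`, the use of a plain integer count in place of Approximate[Long], and the strict comparison are assumptions, not the library's API.

```python
def most_frequent(counters, threshold):
    """Return (item, count, guaranteed) for counts above `threshold`.

    `counters` maps item -> (count, error). Results are sorted by count
    in descending order. Names and comparisons are hypothetical.
    """
    hits = [
        # Guaranteed only if even the lower bound clears the threshold.
        (item, count, count - error > threshold)
        for item, (count, error) in counters.items()
        if count > threshold
    ]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits


state = {"a": (5, 0), "b": (4, 3), "c": (2, 0)}
print(most_frequent(state, 2))  # [('a', 5, True), ('b', 4, False)]
```

Here "b" is reported but not guaranteed, because up to 3 of its 4 counted occurrences may be inherited over-estimation.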
Get the top-k elements. Returns them sorted in descending order as (item, Approximate[Long], guaranteed).
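
For the top-k query, the paper's guarantee condition compares each reported item's lower bound (count minus error) with the observed count of the item ranked k+1: if the lower bound is at least that count, the item is certain to belong in the top k. A minimal sketch of that check follows. The function name `top_k` and the plain integer count in the result are assumptions for illustration.

```python
def top_k(counters, k):
    """Return the k highest-count entries as (item, count, guaranteed).

    `counters` maps item -> (count, error). Names are hypothetical.
    """
    ranked = sorted(counters.items(), key=lambda kv: kv[1][0], reverse=True)
    # Count of the first item that did not make the cut (0 if none).
    next_count = ranked[k][1][0] if len(ranked) > k else 0
    return [
        (item, count, count - error >= next_count)
        for item, (count, error) in ranked[:k]
    ]


state = {"a": (5, 0), "b": (4, 3), "c": (2, 0)}
print(top_k(state, 1))  # [('a', 5, True)]
print(top_k(state, 2))  # [('a', 5, True), ('b', 4, False)]
```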
Data structure used in the Space-Saving algorithm to find the approximate most frequent and top-k elements. The algorithm is described in "Efficient Computation of Frequent and Top-k Elements in Data Streams"; see www.cs.ucsb.edu/research/tech_reports/reports/2005-23.pdf. In the paper the data structure is called StreamSummary, but we chose to call it SpaceSaver instead. Note that the adaptation to Hadoop and the parallelization were not described in the paper and have not been proven to be mathematically correct or to preserve the guarantees or benefits of the algorithm.