Class Histogram


  • public class Histogram
    extends Object
    This class encapsulates a histogram and provides histogram data. While the number of distinct values fits within maxCardinality, the class simply counts them in a map (the Cardinality Map) and generates the histogram values from that map. Once the cardinality exceeds maxCardinality, further data is tracked using an algorithm based on Yael Ben-Haim and Elad Tom-Tov, "A streaming parallel decision tree algorithm", J. Machine Learning Research 11 (2010), pp. 849-872. All data is stored in the Cardinality Map until it is exhausted; from that point on, values not captured in the Cardinality Map are added (via accept) to the underlying Histogram Sketch. When a histogram is requested, it is generated either directly from the Cardinality Map or, if maxCardinality has been exceeded, from the Sketch after all entries captured in the Cardinality Map have been added to it.
    • Method Detail

      • setCardinality

        public void setCardinality(Map<String,Long> map)
      • setCardinalityOverflow

        public void setCardinalityOverflow(HistogramSPDT histogramSPDT)
      • getHistogram

        public Histogram.Entry[] getHistogram(int buckets)
        Gets the histogram with the supplied number of buckets.
        Parameters:
        buckets - the number of buckets in the Histogram
        Returns:
        An array of length 'buckets' that constitutes the Histogram (or null if cardinality is zero).
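
The two-phase strategy described in the class summary (exact counts first, an approximate sketch after overflow) can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: the class TwoPhaseHistogramSketch, the maxCentroids parameter, the summary method, and the use of double keys instead of the String keys seen in setCardinality are all assumptions made for illustration. The overflow part follows the basic update step of the Ben-Haim and Tom-Tov paper: insert a centroid, then merge the two closest centroids when the size limit is exceeded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not the library's Histogram or HistogramSPDT.
class TwoPhaseHistogramSketch {
    private final int maxCardinality; // limit on distinct values counted exactly
    private final int maxCentroids;   // assumed size limit of the approximate sketch
    private final TreeMap<Double, Long> exact = new TreeMap<>();
    // Each centroid is {position, count}; the list is kept sorted by position.
    private final List<double[]> centroids = new ArrayList<>();

    TwoPhaseHistogramSketch(int maxCardinality, int maxCentroids) {
        if (maxCentroids < 1) {
            throw new IllegalArgumentException("maxCentroids must be >= 1");
        }
        this.maxCardinality = maxCardinality;
        this.maxCentroids = maxCentroids;
    }

    // Count exactly while the map has room; otherwise send new values to the sketch.
    void accept(double value) {
        Long seen = exact.get(value);
        if (seen != null) {
            exact.put(value, seen + 1);
        } else if (exact.size() < maxCardinality) {
            exact.put(value, 1L);
        } else {
            addToSketch(value, 1L);
        }
    }

    // Insert a centroid, then merge the closest adjacent pair if over the limit.
    private void addToSketch(double value, long count) {
        int i = 0;
        while (i < centroids.size() && centroids.get(i)[0] < value) {
            i++;
        }
        if (i < centroids.size() && centroids.get(i)[0] == value) {
            centroids.get(i)[1] += count;
            return;
        }
        centroids.add(i, new double[] {value, count});
        if (centroids.size() <= maxCentroids) {
            return;
        }
        int best = 0;
        for (int j = 1; j < centroids.size() - 1; j++) {
            if (gap(j) < gap(best)) {
                best = j;
            }
        }
        double[] a = centroids.get(best);
        double[] b = centroids.remove(best + 1);
        double total = a[1] + b[1];
        a[0] = (a[0] * a[1] + b[0] * b[1]) / total; // count-weighted mean position
        a[1] = total;
    }

    private double gap(int j) {
        return centroids.get(j + 1)[0] - centroids.get(j)[0];
    }

    // Returns {position, count} pairs: exact if never overflowed, otherwise the
    // sketch with all exactly counted entries folded in (on a copy).
    List<double[]> summary() {
        if (centroids.isEmpty()) {
            List<double[]> out = new ArrayList<>();
            for (Map.Entry<Double, Long> e : exact.entrySet()) {
                out.add(new double[] {e.getKey(), e.getValue().doubleValue()});
            }
            return out;
        }
        TwoPhaseHistogramSketch merged = new TwoPhaseHistogramSketch(0, maxCentroids);
        for (double[] c : centroids) {
            merged.addToSketch(c[0], (long) c[1]);
        }
        for (Map.Entry<Double, Long> e : exact.entrySet()) {
            merged.addToSketch(e.getKey(), e.getValue());
        }
        return merged.centroids;
    }
}
```

In this sketch, values already present in the exact map keep being counted there after overflow, and only previously unseen values reach the centroid list, which mirrors the "all values not captured in the Cardinality Map" wording above. Turning the resulting centroids into a fixed number of buckets, as getHistogram does, is left out because the source does not describe how the buckets are derived.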