Package com.cobber.fta
Class Histogram
- Object
  - Histogram
public class Histogram extends Object
This class encapsulates a histogram and provides histogram data. While the number of distinct values fits within the cardinality set, the class simply uses a map to generate the histogram values. Once the cardinality exceeds maxCardinality, the data is tracked using an algorithm based on Yael Ben-Haim and Elad Tom-Tov, "A streaming parallel decision tree algorithm", J. Machine Learning Research 11 (2010), pp. 849-872.
All data is stored in the Cardinality Map until it is exhausted. From that point on, all values not captured in the Cardinality Map are added (via accept) to the underlying Histogram Sketch. When a Histogram is requested, it is generated directly from the Cardinality Map or, if maxCardinality has been exceeded, all the entries captured in the Cardinality Map are first added to the Sketch.
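The two-phase flow described above can be illustrated with a minimal, self-contained sketch. It is not the com.cobber.fta implementation: the class TwoPhaseTracker, its method names, the use of numeric keys, and the maxBins limit are all hypothetical. The overflow phase uses a simplified form of the bin-merging idea from the cited paper, where the two closest centroids are replaced by their weighted mean whenever the bin limit is exceeded.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; names and structure are assumptions.
class TwoPhaseTracker {
    private final int maxCardinality;  // limit on distinct values counted exactly
    private final int maxBins;         // limit on centroids kept by the sketch
    private final TreeMap<Double, Long> exact = new TreeMap<>(); // value -> count
    private final TreeMap<Double, Long> bins = new TreeMap<>();  // centroid -> count

    TwoPhaseTracker(int maxCardinality, int maxBins) {
        if (maxBins < 1) {
            throw new IllegalArgumentException("maxBins must be at least 1");
        }
        this.maxCardinality = maxCardinality;
        this.maxBins = maxBins;
    }

    void accept(double value) {
        // Phase 1: count exactly while the value is known or there is room.
        if (exact.containsKey(value) || exact.size() < maxCardinality) {
            exact.merge(value, 1L, Long::sum);
        } else {
            // Phase 2: values not captured in the map go to the sketch.
            addToSketch(value, 1L);
        }
    }

    private void addToSketch(double centroid, long count) {
        bins.merge(centroid, count, Long::sum);
        if (bins.size() > maxBins) {
            mergeClosestPair();
        }
    }

    // Replace the two adjacent centroids with the smallest gap by their weighted mean.
    private void mergeClosestPair() {
        Double left = null;
        Double previous = null;
        double smallestGap = Double.POSITIVE_INFINITY;
        for (Double centroid : bins.keySet()) {
            if (previous != null && centroid - previous < smallestGap) {
                smallestGap = centroid - previous;
                left = previous;
            }
            previous = centroid;
        }
        Double right = bins.higherKey(left);
        long leftCount = bins.remove(left);
        long rightCount = bins.remove(right);
        long total = leftCount + rightCount;
        bins.merge((left * leftCount + right * rightCount) / total, total, Long::sum);
    }

    // If the sketch is in use, fold the exact counts into it before building a histogram.
    Map<Double, Long> binsForHistogram() {
        if (bins.isEmpty()) {
            return Collections.unmodifiableMap(exact);
        }
        exact.forEach(this::addToSketch);
        exact.clear();
        return Collections.unmodifiableMap(bins);
    }

    Map<Double, Long> exactCounts() { return Collections.unmodifiableMap(exact); }

    Map<Double, Long> sketchBins() { return Collections.unmodifiableMap(bins); }
}
```

With maxCardinality set to 3 and maxBins set to 2, feeding 1, 2, 3, 1, 10, 11, 50 keeps the first three distinct values in the exact map and routes 10, 11 and 50 to the sketch, where 10 and 11 are merged into a single centroid at 10.5.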
-
-
Nested Class Summary
class Histogram.Entry
    A Histogram Entry captures the low and high bounds for each bucket along with the number of entries in the bucket.
-
Method Summary
static Histogram.Entry getBucket(Histogram.Entry[] buckets, double value)
    Given a value and a set of buckets, locate the bucket holding this value.
Histogram.Entry[] getHistogram(int buckets)
    Get the histogram with the supplied number of buckets.
Histogram merge(Histogram other)
void setCardinality(Map<String,Long> map)
void setCardinalityOverflow(HistogramSPDT histogramSPDT)
static void tagClusters(Histogram.Entry[] buckets)
    Given a Histogram analysis, mark each bucket as part of a cluster and then attach the count and percent for the cluster to all buckets in the cluster.
-
-
-
Method Detail
-
setCardinality
public void setCardinality(Map<String,Long> map)
-
setCardinalityOverflow
public void setCardinalityOverflow(HistogramSPDT histogramSPDT)
-
getHistogram
public Histogram.Entry[] getHistogram(int buckets)
Get the histogram with the supplied number of buckets.
Parameters:
buckets - the number of buckets in the Histogram
Returns:
An array of length 'buckets' that constitutes the Histogram (or null if cardinality is zero).
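As a rough illustration of turning a cardinality map into a fixed number of buckets, the sketch below assumes equal-width buckets between the smallest and largest key, and assumes every key parses as a double. Neither assumption is confirmed by this page; the class EqualWidthSketch and the method bucketCounts are hypothetical and return plain counts rather than Histogram.Entry objects.

```java
import java.util.Map;

// Hypothetical illustration only; the library's bucketing scheme may differ.
class EqualWidthSketch {
    static long[] bucketCounts(Map<String, Long> cardinality, int buckets) {
        if (cardinality.isEmpty()) {
            return null; // mirrors the documented null result for zero cardinality
        }
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (String key : cardinality.keySet()) {
            double v = Double.parseDouble(key); // assumption: keys are numeric
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double width = (max - min) / buckets;
        long[] counts = new long[buckets];
        for (Map.Entry<String, Long> e : cardinality.entrySet()) {
            double v = Double.parseDouble(e.getKey());
            int index = width == 0 ? 0 : (int) ((v - min) / width);
            // The maximum value would land one past the end, so clamp it into the last bucket.
            counts[Math.min(index, buckets - 1)] += e.getValue();
        }
        return counts;
    }
}
```

The returned array always has length 'buckets', matching the documented contract for the non-empty case.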
-
tagClusters
public static void tagClusters(Histogram.Entry[] buckets)
Given a Histogram analysis, mark each bucket as part of a cluster and then attach the count and percent for the cluster to all buckets in the cluster. For example, with the following distribution:
1, 1, 0, 0, 0, 0, 10, 20, 30, 30, 8
we would declare two clusters, the first having 2% and the second having 98%, so the percentages would look as follows:
2, 2, 0, 0, 0, 0, 98, 98, 98, 98, 98
Parameters:
buckets - The set of Histogram buckets for this analysis
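The worked example can be reproduced with a short sketch. It assumes a cluster is a maximal run of adjacent non-empty buckets, which fits the example but is not confirmed as the library's rule. The class ClusterPercentSketch and the method clusterPercents are hypothetical and operate on plain counts rather than Histogram.Entry objects.

```java
// Hypothetical illustration only; the cluster definition is an assumption.
class ClusterPercentSketch {
    // Returns, for each bucket, the rounded percent of the cluster it belongs to.
    static long[] clusterPercents(long[] counts) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        long[] percents = new long[counts.length];
        if (total == 0) {
            return percents;
        }
        int i = 0;
        while (i < counts.length) {
            if (counts[i] == 0) {
                i++; // empty buckets separate clusters and stay at 0
                continue;
            }
            int start = i;
            long clusterCount = 0;
            while (i < counts.length && counts[i] != 0) {
                clusterCount += counts[i++];
            }
            long percent = Math.round(100.0 * clusterCount / total);
            for (int j = start; j < i; j++) {
                percents[j] = percent; // every bucket in the run gets the cluster's percent
            }
        }
        return percents;
    }
}
```

For the distribution above the total is 100, so the first run (1 + 1) yields 2 and the second run (10 + 20 + 30 + 30 + 8) yields 98.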
-
getBucket
public static Histogram.Entry getBucket(Histogram.Entry[] buckets, double value)
Given a value and a set of buckets, locate the bucket holding this value.
Parameters:
buckets - The set of Histogram buckets for this analysis
value - The value we are searching for
Returns:
The bucket containing the supplied value.
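One plausible way to perform such a lookup is a binary search over bucket bounds. The sketch below assumes the buckets are sorted and contiguous, that each bucket includes its low bound and excludes its high bound (except the last, which includes both), and that null is returned when no bucket matches. None of these details are stated on this page, and the Bucket class and find method are hypothetical stand-ins for Histogram.Entry and getBucket.

```java
// Hypothetical illustration only; bound semantics are assumptions.
class BucketLookupSketch {
    static final class Bucket {
        final double low;
        final double high;
        final long count;

        Bucket(double low, double high, long count) {
            this.low = low;
            this.high = high;
            this.count = count;
        }
    }

    static Bucket find(Bucket[] buckets, double value) {
        int lo = 0;
        int hi = buckets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            Bucket b = buckets[mid];
            boolean isLast = mid == buckets.length - 1;
            if (value < b.low) {
                hi = mid - 1;
            } else if (value > b.high || (value == b.high && !isLast)) {
                lo = mid + 1; // a shared boundary belongs to the bucket on the right
            } else {
                return b;
            }
        }
        return null; // value lies outside all buckets
    }
}
```

With buckets [0, 10), [10, 20) and [20, 30], the value 10 resolves to the second bucket and 30 to the third, while 31 and -1 return null.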
-
-