public final class ReqSketch extends Object
Reference: https://arxiv.org/abs/2004.01668
This implementation differs from the algorithm described in the paper in the following:
This implementation provides a number of capabilities not discussed in the paper or provided in the Python prototype.
See Also: QuantilesAPI
Nested classes/interfaces inherited from interface QuantilesFloatsAPI: QuantilesFloatsAPI.FloatsPartitionBoundaries
Modifier and Type | Method and Description |
---|---|
static ReqSketchBuilder | builder(): Returns a new ReqSketchBuilder. |
double[] | getCDF(float[] splitPoints, QuantileSearchCriteria searchCrit): Returns an approximation to the Cumulative Distribution Function (CDF) of the input stream as a monotonically increasing array of double ranks (or cumulative probabilities) on the interval [0.0, 1.0], given a set of splitPoints. |
boolean | getHighRankAccuracyMode(): If true, the high ranks are prioritized for better accuracy. |
int | getK(): Gets the user-configured parameter k, which controls the accuracy of the sketch and its memory space usage. |
float | getMaxItem(): Returns the maximum item of the stream. |
float | getMinItem(): Returns the minimum item of the stream. |
long | getN(): Gets the length of the input stream. |
int | getNumRetained(): Gets the number of quantiles retained by the sketch. |
QuantilesFloatsAPI.FloatsPartitionBoundaries | getPartitionBoundaries(int numEquallyWeighted, QuantileSearchCriteria searchCrit): This method returns an instance of FloatsPartitionBoundaries, which provides sufficient information for the user to create the given number of equally weighted partitions. |
double[] | getPMF(float[] splitPoints, QuantileSearchCriteria searchCrit): Returns an approximation to the Probability Mass Function (PMF) of the input stream as an array of probability masses as doubles on the interval [0.0, 1.0], given a set of splitPoints. |
float | getQuantile(double normRank, QuantileSearchCriteria searchCrit): Gets the approximate quantile of the given normalized rank and the given search criterion. |
float | getQuantileLowerBound(double rank): Gets the lower bound of the quantile confidence interval in which the true quantile of the given rank exists. |
float | getQuantileLowerBound(double rank, int numStdDev) |
float[] | getQuantiles(double[] normRanks, QuantileSearchCriteria searchCrit): Gets an array of quantiles from the given array of normalized ranks. |
float | getQuantileUpperBound(double rank): Gets the upper bound of the quantile confidence interval in which the true quantile of the given rank exists. |
float | getQuantileUpperBound(double rank, int numStdDev) |
double | getRank(float quantile, QuantileSearchCriteria searchCrit): Gets the normalized rank corresponding to the given quantile. |
double | getRankLowerBound(double rank): Gets the lower bound of the rank confidence interval in which the true rank corresponding to the given normalized rank lies. |
double | getRankLowerBound(double rank, int numStdDev): Gets an approximate lower bound rank of the given normalized rank. |
double[] | getRanks(float[] quantiles, QuantileSearchCriteria searchCrit): Gets an array of normalized ranks corresponding to the given array of quantiles and the given search criterion. |
double | getRankUpperBound(double rank): Gets the upper bound of the rank confidence interval in which the true rank corresponding to the given normalized rank lies. |
double | getRankUpperBound(double rank, int numStdDev): Gets an approximate upper bound rank of the given normalized rank. |
static double | getRSE(int k, double rank, boolean hra, long totalN): Returns an a priori estimate of the relative standard error (RSE, expressed as a number in [0,1]). |
int | getSerializedSizeBytes(): Returns the current number of bytes this sketch would require if serialized. |
FloatsSortedView | getSortedView(): Gets the sorted view of this sketch. |
boolean | hasMemory(): Returns true if this sketch's data structure is backed by Memory or WritableMemory. |
static ReqSketch | heapify(org.apache.datasketches.memory.Memory mem): Returns a ReqSketch on the heap from a Memory image of the sketch. |
boolean | isDirect(): Returns true if this sketch's data structure is off-heap (a.k.a. Direct or Native memory). |
boolean | isEmpty(): Returns true if this sketch is empty. |
boolean | isEstimationMode(): Returns true if this sketch is in estimation mode. |
boolean | isReadOnly(): Returns true if this sketch is read-only. |
QuantilesFloatsSketchIterator | iterator(): Gets the iterator for this sketch, which is not sorted. |
ReqSketch | merge(ReqSketch other): Merges the other sketch into this one. |
void | reset(): Resets this sketch to the empty state. |
byte[] | toByteArray(): Returns a byte array representation of this sketch. |
String | toString(): Returns a summary of the key parameters of the sketch. |
void | update(float item): Updates this sketch with the given item. |
String | viewCompactorDetail(String fmt, boolean allData): A detailed, human-readable view of the sketch compactors and their data. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface QuantilesFloatsAPI: getCDF, getPartitionBoundaries, getPMF, getQuantile, getQuantiles, getRank, getRanks
public static final ReqSketchBuilder builder()
public static ReqSketch heapify(org.apache.datasketches.memory.Memory mem)
mem - the Memory object holding a valid image of a ReqSketch.
public int getK()
Description copied from interface: QuantilesAPI
Specified by: getK in interface QuantilesAPI
public double[] getCDF(float[] splitPoints, QuantileSearchCriteria searchCrit)
Description copied from interface: QuantilesFloatsAPI
The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(false) function.
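As a point of reference, the sketch below computes the exact CDF that getCDF approximates, by scanning a small, fully retained array of items rather than a sketch. It is an illustration under that assumption, not the ReqSketch implementation: the class ExactCdf and its cdf method are hypothetical names, and the boolean flag stands in for QuantileSearchCriteria (true for INCLUSIVE, false for EXCLUSIVE).

```java
import java.util.Arrays;

/** Hypothetical exact reference for getCDF semantics; not part of DataSketches. */
final class ExactCdf {
    private ExactCdf() {}

    /**
     * Returns m+1 cumulative fractions for m unique, increasing split points.
     * Entry i (i < m) is the fraction of items <= splitPoints[i] when inclusive is true,
     * or < splitPoints[i] when inclusive is false. The last entry is always 1.0.
     */
    static double[] cdf(float[] items, float[] splitPoints, boolean inclusive) {
        if (items.length == 0) {
            throw new IllegalArgumentException("items must not be empty");
        }
        for (int i = 1; i < splitPoints.length; i++) {
            // Also rejects NaN split points, since NaN comparisons are false.
            if (!(splitPoints[i] > splitPoints[i - 1])) {
                throw new IllegalArgumentException("split points must be unique and increasing");
            }
        }
        double[] result = new double[splitPoints.length + 1];
        for (int i = 0; i < splitPoints.length; i++) {
            int count = 0;
            for (float item : items) {
                if (inclusive ? item <= splitPoints[i] : item < splitPoints[i]) {
                    count++;
                }
            }
            result[i] = (double) count / items.length;
        }
        // The (m+1)th interval covers the whole distribution.
        result[splitPoints.length] = 1.0;
        return result;
    }

    public static void main(String[] args) {
        float[] items = {1f, 2f, 2f, 3f, 4f};
        float[] splitPoints = {2f, 3f};
        System.out.println(Arrays.toString(cdf(items, splitPoints, true)));  // [0.6, 0.8, 1.0]
        System.out.println(Arrays.toString(cdf(items, splitPoints, false))); // [0.2, 0.6, 1.0]
    }
}
```

With the inclusive flag, the two items equal to the split point 2 are counted in the first entry; with the exclusive flag they are not. A real sketch would return approximations of these values computed from the items it retains.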
Specified by: getCDF in interface QuantilesFloatsAPI
splitPoints - an array of m unique, monotonically increasing items (of the same type as the input items) that divide the item input domain into m+1 overlapping intervals.
The start of each interval is below the lowest item retained by the sketch, corresponding to a zero rank or zero probability, and the end of the interval is the rank or cumulative probability corresponding to the split point.
The (m+1)th interval represents 100% of the distribution represented by the sketch and, consistent with the definition of a cumulative probability distribution, the (m+1)th rank or probability in the returned array is always 1.0.
If a split point exactly equals a retained item of the sketch and the search criterion is:
INCLUSIVE, the resulting cumulative probability will include that item.
EXCLUSIVE, the resulting cumulative probability will not include that item.
It is not recommended to include either the minimum or maximum items of the input stream.
searchCrit - the desired search criteria.
public boolean getHighRankAccuracyMode()
public float getMaxItem()
Description copied from interface: QuantilesFloatsAPI
Specified by: getMaxItem in interface QuantilesFloatsAPI
public float getMinItem()
Description copied from interface: QuantilesFloatsAPI
Specified by: getMinItem in interface QuantilesFloatsAPI
public long getN()
Description copied from interface: QuantilesAPI
Specified by: getN in interface QuantilesAPI
public double[] getPMF(float[] splitPoints, QuantileSearchCriteria searchCrit)
Description copied from interface: QuantilesFloatsAPI
The resulting approximations have a probabilistic guarantee that can be obtained from the getNormalizedRankError(true) function.
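The interval rules for getPMF can likewise be illustrated exactly on a small, fully retained array. The hypothetical ExactPmf helper below (not part of DataSketches) assigns each item to one of the m+1 intervals, closing intervals on the right when inclusive is true (standing in for INCLUSIVE) and on the left when it is false (standing in for EXCLUSIVE). It is a sketch of the semantics under that assumption, not the library's code.

```java
import java.util.Arrays;

/** Hypothetical exact reference for getPMF semantics; not part of DataSketches. */
final class ExactPmf {
    private ExactPmf() {}

    /**
     * Returns m+1 probability masses for m unique, increasing split points s[0..m-1].
     * inclusive == true : interval i holds items with s[i-1] <  item <= s[i]
     * inclusive == false: interval i holds items with s[i-1] <= item <  s[i]
     * (the first interval has no lower split point, the last has no upper one).
     */
    static double[] pmf(float[] items, float[] splitPoints, boolean inclusive) {
        if (items.length == 0) {
            throw new IllegalArgumentException("items must not be empty");
        }
        int[] counts = new int[splitPoints.length + 1];
        for (float item : items) {
            // Move to the next interval while the item lies beyond the current upper boundary.
            int interval = 0;
            while (interval < splitPoints.length
                    && (inclusive ? item > splitPoints[interval] : item >= splitPoints[interval])) {
                interval++;
            }
            counts[interval]++;
        }
        // Divide integer counts once, so each mass is an exact count / n.
        double[] masses = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            masses[i] = (double) counts[i] / items.length;
        }
        return masses;
    }

    public static void main(String[] args) {
        float[] items = {1f, 2f, 2f, 3f, 3f, 3f, 4f, 5f};
        float[] splitPoints = {2f, 3f};
        System.out.println(Arrays.toString(pmf(items, splitPoints, true)));  // [0.375, 0.375, 0.25]
        System.out.println(Arrays.toString(pmf(items, splitPoints, false))); // [0.125, 0.25, 0.625]
    }
}
```

In both cases the masses sum to 1.0, and each mass equals the difference between consecutive entries of the corresponding exact CDF (for example, 0.75 - 0.375 = 0.375 for the second inclusive interval).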
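Specified by: getPMF in interface QuantilesFloatsAPI
splitPoints - an array of m unique, monotonically increasing items (of the same type as the input items) that divide the item input domain into m+1 consecutive, non-overlapping intervals.
Each interval except for the end intervals starts with a split point and ends with the next split point in sequence.
The first interval starts below the lowest item retained by the sketch, corresponding to a zero rank or zero probability, and ends with the first split point.
The last (m+1)th interval starts with the last split point and ends after the last item retained by the sketch, corresponding to a rank or probability of 1.0.
The sum of the probability masses of all (m+1) intervals is 1.0.
If the search criterion is:
INCLUSIVE, and the upper boundary of an interval equals an item retained by the sketch, the interval will include that item. If the lower boundary of an interval equals an item retained by the sketch, the interval will exclude that item.
EXCLUSIVE, and the upper boundary of an interval equals an item retained by the sketch, the interval will exclude that item. If the lower boundary of an interval equals an item retained by the sketch, the interval will include that item.
It is not recommended to include either the minimum or maximum items of the input stream.
searchCrit - the desired search criteria.
public float getQuantile(double normRank, QuantileSearchCriteria searchCrit)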
Description copied from interface: QuantilesFloatsAPI
Specified by: getQuantile in interface QuantilesFloatsAPI
normRank - the given normalized rank, a double in the range [0.0, 1.0].
searchCrit - if INCLUSIVE, the given rank includes all quantiles ≤ the quantile directly corresponding to the given rank. If EXCLUSIVE, the given rank includes all quantiles < the quantile directly corresponding to the given rank.
See Also: QuantileSearchCriteria
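To make the two criteria concrete, here is a minimal exact lookup over a sorted, fully retained array, assuming every item has weight one (a sketch would instead work from its approximate ranks). The class ExactQuantile, its quantile method, and the boolean flag standing in for QuantileSearchCriteria are hypothetical; the clamping at the ends of the rank range is a choice made only for this illustration.

```java
/** Hypothetical exact reference for getQuantile semantics; not part of DataSketches. */
final class ExactQuantile {
    private ExactQuantile() {}

    /**
     * Returns an item of the ascending, non-empty array at the given normalized rank.
     * inclusive == true : smallest item q with (fraction of items <= q) >= normRank
     * inclusive == false: smallest item q with (fraction of items <= q) >  normRank
     */
    static float quantile(float[] sorted, double normRank, boolean inclusive) {
        if (sorted.length == 0) {
            throw new IllegalArgumentException("sorted must not be empty");
        }
        if (!(normRank >= 0.0 && normRank <= 1.0)) {
            throw new IllegalArgumentException("normRank must be in [0.0, 1.0]");
        }
        int n = sorted.length;
        int index = inclusive
                ? (int) Math.ceil(normRank * n) - 1  // last item needed to reach the rank
                : (int) Math.floor(normRank * n);    // first item past the rank
        // Clamp the edge cases (0.0 with inclusive, 1.0 with exclusive); illustration-only choice.
        index = Math.max(0, Math.min(n - 1, index));
        return sorted[index];
    }

    public static void main(String[] args) {
        float[] sorted = {10f, 20f, 30f, 40f};
        System.out.println(quantile(sorted, 0.5, true));  // 20.0
        System.out.println(quantile(sorted, 0.5, false)); // 30.0
    }
}
```

The two results are consistent with the rank definitions: 50% of the items are ≤ 20, and 50% of the items are < 30.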
public float[] getQuantiles(double[] normRanks, QuantileSearchCriteria searchCrit)
Description copied from interface: QuantilesFloatsAPI
Specified by: getQuantiles in interface QuantilesFloatsAPI
normRanks - the given array of normalized ranks, each of which must be in the interval [0.0, 1.0].
searchCrit - if INCLUSIVE, the given ranks include all quantiles ≤ the quantile directly corresponding to each rank.
See Also: QuantileSearchCriteria
public float getQuantileLowerBound(double rank)
Although it is possible to estimate the probability that the true quantile exists within the quantile confidence interval specified by the upper and lower quantile bounds, it is not possible to guarantee the width of the quantile confidence interval as an additive or multiplicative percent of the true quantile.
The approximate probability that the true quantile is within the confidence interval specified by the upper and lower quantile bounds for this sketch is 0.95.
Specified by: getQuantileLowerBound in interface QuantilesFloatsAPI
rank - the given normalized rank
public float getQuantileLowerBound(double rank, int numStdDev)
public float getQuantileUpperBound(double rank)
Although it is possible to estimate the probability that the true quantile exists within the quantile confidence interval specified by the upper and lower quantile bounds, it is not possible to guarantee the width of the quantile confidence interval as an additive or multiplicative percent of the true quantile.
The approximate probability that the true quantile is within the confidence interval specified by the upper and lower quantile bounds for this sketch is 0.95.
Specified by: getQuantileUpperBound in interface QuantilesFloatsAPI
rank - the given normalized rank
public float getQuantileUpperBound(double rank, int numStdDev)
public double getRank(float quantile, QuantileSearchCriteria searchCrit)
Description copied from interface: QuantilesFloatsAPI
Specified by: getRank in interface QuantilesFloatsAPI
quantile - the given quantile
searchCrit - if INCLUSIVE, the given quantile is included in the rank.
See Also: QuantileSearchCriteria
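The exact quantity that getRank estimates can be written as a simple count over the stream, assuming every item is kept with weight one. The hypothetical ExactRank class below (not part of DataSketches) shows the difference between the criteria, with a boolean standing in for QuantileSearchCriteria: true counts items ≤ the quantile (INCLUSIVE), false counts items < it (EXCLUSIVE).

```java
/** Hypothetical exact reference for getRank semantics; not part of DataSketches. */
final class ExactRank {
    private ExactRank() {}

    /**
     * Returns the exact normalized rank of quantile among the (unsorted, non-empty) items.
     * inclusive == true : fraction of items <= quantile
     * inclusive == false: fraction of items <  quantile
     */
    static double rank(float[] items, float quantile, boolean inclusive) {
        if (items.length == 0) {
            throw new IllegalArgumentException("items must not be empty");
        }
        int count = 0;
        for (float item : items) {
            if (inclusive ? item <= quantile : item < quantile) {
                count++;
            }
        }
        return (double) count / items.length;
    }

    public static void main(String[] args) {
        float[] items = {40f, 10f, 30f, 20f};
        System.out.println(rank(items, 20f, true));  // 0.5  (10 and 20 are <= 20)
        System.out.println(rank(items, 20f, false)); // 0.25 (only 10 is < 20)
    }
}
```

The two criteria differ only when the queried quantile equals one or more items; for a value between items (say 25) both return the same rank.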
public double getRankLowerBound(double rank)
rank - the given normalized rank.
public double getRankLowerBound(double rank, int numStdDev)
rank - the given rank, a number between 0 and 1.0.
numStdDev - the number of standard deviations. Must be 1, 2, or 3.
public double[] getRanks(float[] quantiles, QuantileSearchCriteria searchCrit)
Description copied from interface: QuantilesFloatsAPI
Specified by: getRanks in interface QuantilesFloatsAPI
quantiles - the given array of quantiles
searchCrit - if INCLUSIVE, the given quantiles include the rank directly corresponding to each quantile.
See Also: QuantileSearchCriteria
public double getRankUpperBound(double rank)
rank - the given normalized rank.
public double getRankUpperBound(double rank, int numStdDev)
rank - the given rank, a number between 0 and 1.0.
numStdDev - the number of standard deviations. Must be 1, 2, or 3.
public int getNumRetained()
Description copied from interface: QuantilesAPI
Specified by: getNumRetained in interface QuantilesAPI
public int getSerializedSizeBytes()
Description copied from interface: QuantilesFloatsAPI
Specified by: getSerializedSizeBytes in interface QuantilesFloatsAPI
public FloatsSortedView getSortedView()
Description copied from interface: QuantilesFloatsAPI
Specified by: getSortedView in interface QuantilesFloatsAPI
public boolean isEmpty()
Description copied from interface: QuantilesAPI
Specified by: isEmpty in interface QuantilesAPI
public boolean isEstimationMode()
Description copied from interface: QuantilesAPI
Specified by: isEstimationMode in interface QuantilesAPI
public QuantilesFloatsSketchIterator iterator()
Description copied from interface: QuantilesFloatsAPI
Specified by: iterator in interface QuantilesFloatsAPI
public ReqSketch merge(ReqSketch other)
other - sketch to be merged into this one.
public void reset()
Resets this sketch to the empty state. The parameters k, highRankAccuracy, and reqDebug will not change.
Specified by: reset in interface QuantilesAPI
public byte[] toByteArray()
Description copied from interface: QuantilesFloatsAPI
Specified by: toByteArray in interface QuantilesFloatsAPI
public String toString()
Description copied from interface: QuantilesAPI
Specified by: toString in interface QuantilesAPI
public void update(float item)
Description copied from interface: QuantilesFloatsAPI
Specified by: update in interface QuantilesFloatsAPI
item - an item from a stream of quantiles. NaNs are ignored.
public String viewCompactorDetail(String fmt, boolean allData)
fmt - the format string for the quantiles; example: "%4.0f".
allData - if true, all the retained quantiles for the sketch will be output by compactor level. Otherwise, just a summary will be output.
public QuantilesFloatsAPI.FloatsPartitionBoundaries getPartitionBoundaries(int numEquallyWeighted, QuantileSearchCriteria searchCrit)
Description copied from interface: QuantilesFloatsAPI
This method returns an instance of FloatsPartitionBoundaries, which provides sufficient information for the user to create the given number of equally weighted partitions.
Specified by: getPartitionBoundaries in interface QuantilesFloatsAPI
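The INCLUSIVE case of equally weighted partitions can be illustrated exactly on a sorted array in which every item is retained with weight one. In this sketch (the class ExactPartitions and its inclusiveBoundaries method are hypothetical, not DataSketches code), the first boundary is the minimum item, the lower boundary of the lowest partition, and every later boundary i is the INCLUSIVE quantile at rank i / numEquallyWeighted, the upper boundary of partition i. The EXCLUSIVE variant is not shown.

```java
import java.util.Arrays;

/** Hypothetical exact reference for INCLUSIVE partition boundaries; not part of DataSketches. */
final class ExactPartitions {
    private ExactPartitions() {}

    /** Returns numEquallyWeighted + 1 boundaries for an ascending, non-empty array. */
    static float[] inclusiveBoundaries(float[] sorted, int numEquallyWeighted) {
        if (numEquallyWeighted < 1) {
            throw new IllegalArgumentException("numEquallyWeighted must be a positive integer");
        }
        if (sorted.length == 0) {
            throw new IllegalArgumentException("sorted must not be empty");
        }
        long n = sorted.length;
        float[] boundaries = new float[numEquallyWeighted + 1];
        boundaries[0] = sorted[0]; // lowest boundary of the lowest partition
        for (int i = 1; i <= numEquallyWeighted; i++) {
            // Integer form of ceil(i * n / numEquallyWeighted) - 1, avoiding floating-point error.
            long index = (i * n + numEquallyWeighted - 1) / numEquallyWeighted - 1;
            boundaries[i] = sorted[(int) index];
        }
        return boundaries;
    }

    public static void main(String[] args) {
        float[] sorted = new float[100];
        for (int i = 0; i < sorted.length; i++) {
            sorted[i] = i + 1; // items 1, 2, ..., 100
        }
        System.out.println(Arrays.toString(inclusiveBoundaries(sorted, 4)));
        // [1.0, 25.0, 50.0, 75.0, 100.0]
    }
}
```

Each partition (boundaries[i-1], boundaries[i]] then holds 25 of the 100 items, with the lowest partition also including its lower boundary 1. With a real ReqSketch the ranks are approximate, so the resulting partitions would be only approximately equally weighted.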
numEquallyWeighted - an integer that specifies the number of equally weighted partitions between getMinItem() and getMaxItem(). This must be a positive integer.
searchCrit - if INCLUSIVE, all the returned quantiles are the upper boundaries of the equally weighted partitions, with the exception of the lowest returned quantile, which is the lowest boundary of the lowest ranked partition.
If EXCLUSIVE, all the returned quantiles are the lower boundaries of the equally weighted partitions, with the exception of the highest returned quantile, which is the upper boundary of the highest ranked partition.
Returns: an instance of FloatsPartitionBoundaries.
public static double getRSE(int k, double rank, boolean hra, long totalN)
k - the given value of k
rank - the given normalized rank, a number in [0,1].
hra - if true, High Rank Accuracy mode is being selected; otherwise, Low Rank Accuracy.
totalN - an estimate of the total number of items submitted to the sketch.
public boolean hasMemory()
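Description copied from interface: QuantilesAPI
Specified by: hasMemory in interface QuantilesAPI
public boolean isDirect()
Description copied from interface: QuantilesAPI
Specified by: isDirect in interface QuantilesAPI
public boolean isReadOnly()
Description copied from interface: QuantilesAPI
Specified by: isReadOnly in interface QuantilesAPI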
Copyright © 2015–2022 The Apache Software Foundation. All rights reserved.