T - The type of object held in the sketch.
public final class VarOptItemsSketch<T> extends Object
Using this sketch with uniformly constant item weights (e.g. 1.0) will produce a standard reservoir sample over the stream.
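To make the "standard reservoir sample" remark concrete, the following is a minimal plain-Java sketch of uniform reservoir sampling (Algorithm R). It is an illustration only and is not the library's implementation: the class name `ReservoirDemo`, its constructor, and the seed parameter are hypothetical, and the only names borrowed from this page are `update`, `getN`, and `getNumSamples`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of a uniform reservoir sample; not part of the library. */
final class ReservoirDemo<T> {
    private final int k;                          // maximum number of samples retained
    private final List<T> samples = new ArrayList<>();
    private final Random rng;
    private long n = 0;                           // number of items seen so far

    ReservoirDemo(int k, long seed) {
        if (k < 1) { throw new IllegalArgumentException("k must be at least 1"); }
        this.k = k;
        this.rng = new Random(seed);
    }

    /** Every item has the same implicit weight, so each is kept with probability k/n. */
    void update(T item) {
        n++;
        if (samples.size() < k) {
            samples.add(item);                    // reservoir not yet full: always keep
        } else {
            long j = (long) (rng.nextDouble() * n);   // uniform index in [0, n)
            if (j < k) { samples.set((int) j, item); } // replace a random slot
        }
    }

    long getN() { return n; }
    int getNumSamples() { return samples.size(); }
    List<T> getSamples() { return new ArrayList<>(samples); }
}
```

After feeding 1000 items into a `ReservoirDemo` with `k = 10`, `getN()` reports 1000 while `getNumSamples()` stays at 10, which mirrors how `getN()` and `getNumSamples()` are described for this sketch.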
Modifier and Type | Method and Description |
---|---|
SampleSubsetSummary | estimateSubsetSum(Predicate<T> predicate): Computes an estimated subset sum from the entire stream for objects matching a given predicate. |
int | getK(): Returns the sketch's value of k, the maximum number of samples stored in the sketch. |
long | getN(): Returns the number of items processed from the input stream. |
int | getNumSamples(): Returns the current number of items in the sketch, which may be smaller than the sketch capacity. |
VarOptItemsSamples<T> | getSketchSamples(): Gets a result iterator object. |
static <T> VarOptItemsSketch<T> | heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe): Returns a sketch instance of this class from the given srcMem, which must be a Memory representation of this sketch class. |
static <T> VarOptItemsSketch<T> | newInstance(int k): Construct a varopt sampling sketch with up to k samples using the default resize factor (8). |
static <T> VarOptItemsSketch<T> | newInstance(int k, ResizeFactor rf): Construct a varopt sampling sketch with up to k samples using the specified resize factor. |
void | reset(): Resets this sketch to the empty state, but retains the original value of k. |
byte[] | toByteArray(ArrayOfItemsSerDe<? super T> serDe): Returns a byte array representation of this sketch. |
byte[] | toByteArray(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz): Returns a byte array representation of this sketch. |
String | toString(): Returns a human-readable summary of the sketch. |
static String | toString(byte[] byteArr): Returns a human-readable string of the preamble of a byte array image of a VarOptItemsSketch. |
static String | toString(org.apache.datasketches.memory.Memory mem): Returns a human-readable string of the preamble of a Memory image of a VarOptItemsSketch. |
void | update(T item, double weight): Randomly decide whether or not to include an item in the sample set. |
public static <T> VarOptItemsSketch<T> newInstance(int k)
T - The type of object held in the sketch.
k - Maximum size of sampling. Allocated size may be smaller until sketch fills. Unlike many sketches in this package, this value does not need to be a power of 2.
public static <T> VarOptItemsSketch<T> newInstance(int k, ResizeFactor rf)
T - The type of object held in the sketch.
k - Maximum size of sampling. Allocated size may be smaller until sketch fills. Unlike many sketches in this package, this value does not need to be a power of 2. The maximum size is Integer.MAX_VALUE-1.
rf - See Resize Factor
public static <T> VarOptItemsSketch<T> heapify(org.apache.datasketches.memory.Memory srcMem, ArrayOfItemsSerDe<T> serDe)
T - The type of item this sketch contains
srcMem - a Memory representation of a sketch of this class. See Memory
serDe - An instance of ArrayOfItemsSerDe
public int getK()
public long getN()
public int getNumSamples()
public VarOptItemsSamples<T> getSketchSamples()
public void update(T item, double weight)
item - an item of the set being sampled from
weight - a strictly positive weight associated with the item
public void reset()
public String toString()
public static String toString(byte[] byteArr)
byteArr - the given byte array
public static String toString(org.apache.datasketches.memory.Memory mem)
mem - the given Memory
public byte[] toByteArray(ArrayOfItemsSerDe<? super T> serDe)
serDe - An instance of ArrayOfItemsSerDe
public byte[] toByteArray(ArrayOfItemsSerDe<? super T> serDe, Class<?> clazz)
serDe - An instance of ArrayOfItemsSerDe
clazz - The class represented by <T>
public SampleSubsetSummary estimateSubsetSum(Predicate<T> predicate)
This is technically a heuristic method, and tries to err on the conservative side.
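As a rough illustration of what a subset-sum estimate over a weighted sample looks like, the plain-Java sketch below sums the weights of the retained samples that satisfy a predicate. It rests on an assumption not stated on this page: that each retained sample carries an adjusted weight and that the point estimate is the total adjusted weight of the matching samples. The class `SubsetSumDemo` and its nested `WeightedItem` are hypothetical helpers, and the confidence bounds that make the real method conservative are not modelled here.

```java
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical illustration of a predicate-based subset sum; not the library's code. */
final class SubsetSumDemo {

    /** Hypothetical holder for one retained sample and its (assumed) adjusted weight. */
    static final class WeightedItem<T> {
        final T item;
        final double weight;

        WeightedItem(T item, double weight) {
            this.item = item;
            this.weight = weight;
        }
    }

    /** Point estimate only: adds up the weights of samples accepted by the predicate. */
    static <T> double estimate(List<WeightedItem<T>> samples, Predicate<T> predicate) {
        double sum = 0.0;
        for (WeightedItem<T> s : samples) {
            if (predicate.test(s.item)) {
                sum += s.weight;
            }
        }
        return sum;
    }
}
```

With samples `("a1", 2.0)`, `("b1", 3.0)`, and `("a2", 5.0)`, a predicate such as `x -> x.startsWith("a")` selects the first and third entries, so the sketch above adds 2.0 and 5.0.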
predicate - A predicate to use when identifying items.
Copyright © 2015–2022 The Apache Software Foundation. All rights reserved.