Returns the lower bound of a given percentile, where the percentile is in the range (0, 1]. The items that are iterated over cannot be negative.
Returns the intersection (the lower and upper bounds) of a bounded percentile, where the percentile is in the range (0, 1]. The items that are iterated over cannot be negative.
Using a constant amount of memory, give an approximate unique count (~1% error). This uses an exact set for up to 100 items, then HyperLogLog (HLL) with a 1.2% standard error, which uses at most 8192 bytes for each HLL. For more control, see HyperLogLogAggregator.
This is a trivial aggregator that always returns a single value.
Counts how many items satisfy a predicate.
Checks whether any item satisfies a predicate.
Checks whether all items satisfy a predicate.
Using Aggregator.prepare and Aggregator.present, you can add to this aggregator.
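The prepare / reduce / present shape behind the entries above can be sketched roughly as follows. This is a minimal illustration and not the library's own trait: MiniAggregator, mapOutput, and MiniAggregatorExamples are hypothetical names, and the sketch assumes a non-empty input.

```scala
// Hypothetical minimal aggregator for illustration; not the library's own trait.
trait MiniAggregator[A, B, C] { self =>
  def prepare(a: A): B      // map each input item to an intermediate value
  def reduce(l: B, r: B): B // combine two intermediate values
  def present(b: B): C      // turn the final intermediate value into the result

  // Run over a collection; throws on empty input because reduce needs one item.
  def apply(items: Iterable[A]): C =
    present(items.iterator.map(a => prepare(a)).reduce[B]((l, r) => reduce(l, r)))

  // "Add to" an aggregator by transforming what present returns.
  def mapOutput[D](f: C => D): MiniAggregator[A, B, D] =
    new MiniAggregator[A, B, D] {
      def prepare(a: A): B = self.prepare(a)
      def reduce(l: B, r: B): B = self.reduce(l, r)
      def present(b: B): D = f(self.present(b))
    }
}

object MiniAggregatorExamples {
  // How many items satisfy a predicate: prepare to 0 or 1, then sum.
  def count[A](pred: A => Boolean): MiniAggregator[A, Long, Long] =
    new MiniAggregator[A, Long, Long] {
      def prepare(a: A): Long = if (pred(a)) 1L else 0L
      def reduce(l: Long, r: Long): Long = l + r
      def present(b: Long): Long = b
    }

  // Do any items satisfy a predicate: reuse count and change only the presentation.
  def exists[A](pred: A => Boolean): MiniAggregator[A, Long, Boolean] =
    count(pred).mapOutput(_ > 0L)
}
```

For example, MiniAggregatorExamples.count[Int](_ % 2 == 0)(List(1, 2, 3, 4)) counts the even items in one pass. The real library may implement these checks differently (for instance with a boolean intermediate value); the sketch only shows how the three steps fit together.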
Take the first (leftmost in reduce order) item found.
Take the last (rightmost in reduce order) item found.
Get the maximum item.
Get the minimum item.
Returns the number of items found.
Take the largest count items using a heap (count is the number of items to keep).
Take the smallest count items using a heap (count is the number of items to keep).
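One way to take the largest or smallest count items with a heap is to keep a bounded priority queue of at most count elements. The sketch below uses only the Scala standard library; HeapTake, smallest, and largest are hypothetical names and not the library's API.

```scala
import scala.collection.mutable

object HeapTake {
  // Keep the `count` smallest items using a max-heap bounded to `count` elements.
  // Returns them in ascending order under `ord`.
  def smallest[T](count: Int, items: Iterable[T])(implicit ord: Ordering[T]): List[T] = {
    require(count > 0, "count must be positive")
    // PriorityQueue is a max-heap: head is the largest item currently retained.
    val heap = mutable.PriorityQueue.empty[T](ord)
    items.foreach { x =>
      if (heap.size < count) heap.enqueue(x)
      else if (ord.lt(x, heap.head)) {
        heap.dequeue() // evict the largest retained item
        heap.enqueue(x)
      }
    }
    // Dequeue yields largest first, so prepending builds an ascending list.
    var out = List.empty[T]
    while (heap.nonEmpty) out = heap.dequeue() :: out
    out
  }

  // The `count` largest items are the smallest ones under the reversed ordering.
  // Returns them in descending order under `ord`.
  def largest[T](count: Int, items: Iterable[T])(implicit ord: Ordering[T]): List[T] =
    smallest(count, items)(ord.reverse)
}
```

Memory stays proportional to count rather than to the input size, which is the usual reason to prefer a heap over sorting everything.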
Put everything in a List. Note that this could exhaust memory if the List is very large.
Put everything in a Set. Note that this could exhaust memory if the Set is very large.
This builds an in-memory Set, and then finally gets the size of that set. This may not scale if the number of unique items is very large. Consider approximateUniqueCount or HyperLogLogAggregator for an approximate version that is scalable.
Aggregators compose well.
To create an aggregator that applies two aggregators to a single input in parallel, use: GeneratedTupleAggregator.from2((agg1, agg2))
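The idea of running two aggregators over one input in a single pass can be sketched without the library as follows. Agg and its join method are hypothetical stand-ins used for illustration; they are not the library's Aggregator type or the from2 implementation, and the sketch assumes a non-empty input.

```scala
// Hypothetical stand-in for an aggregator; not the library's own type.
final case class Agg[A, B, C](prepare: A => B, reduce: (B, B) => B, present: B => C) {
  // Run over a collection; throws on empty input because reduce needs one item.
  def apply(items: Iterable[A]): C =
    present(items.iterator.map(prepare).reduce[B](reduce))

  // Pair this aggregator with another one over the same input type.
  // Each item is prepared once per aggregator and both states are reduced together.
  def join[B2, C2](that: Agg[A, B2, C2]): Agg[A, (B, B2), (C, C2)] =
    Agg[A, (B, B2), (C, C2)](
      a => (prepare(a), that.prepare(a)),
      (l, r) => (reduce(l._1, r._1), that.reduce(l._2, r._2)),
      b => (present(b._1), that.present(b._2))
    )
}

object JoinExample {
  val minAgg: Agg[Int, Int, Int] = Agg[Int, Int, Int](x => x, (l, r) => math.min(l, r), x => x)
  val maxAgg: Agg[Int, Int, Int] = Agg[Int, Int, Int](x => x, (l, r) => math.max(l, r), x => x)

  // One pass over the input yields both results as a tuple.
  val minMax: Agg[Int, (Int, Int), (Int, Int)] = minAgg.join(maxAgg)
  val result: (Int, Int) = minMax(List(3, 1, 4, 1, 5))
}
```

The composed aggregator carries a tuple of intermediate values, so the input is traversed once no matter how many aggregators are combined.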