Illustrates the use of the HyperLogLog algorithm, from Twitter's Algebird library, to compute
a windowed and global estimate of the unique user IDs occurring in a Twitter stream.
This
blog post and this
blog post
have good overviews of HyperLogLog (HLL). HLL is a memory-efficient datastructure for
estimating the cardinality of a data stream, i.e. the number of unique elements.
Algebird's implementation is a monoid, so we can succinctly merge two HLL instances in the
reduce operation.
Illustrates the use of the HyperLogLog algorithm, from Twitter's Algebird library, to compute a windowed and global estimate of the unique user IDs occurring in a Twitter stream.
This blog post and this blog post have good overviews of HyperLogLog (HLL). HLL is a memory-efficient datastructure for estimating the cardinality of a data stream, i.e. the number of unique elements.
Algebird's implementation is a monoid, so we can succinctly merge two HLL instances in the reduce operation.