This helper class generates an append-only vectorized hash map that acts as a 'cache' for
extremely fast key-value lookups while evaluating aggregates, falling back to the
BytesToBytesMap if a given key isn't found. It is code-generated in TungstenAggregate to speed
up aggregates with keys.
It is backed by a power-of-2-sized array for index lookups and a columnar batch that stores the
key-value pairs. Index lookups in the array use linear probing (with a small maximum number of
tries) and an inexpensive hash function, which makes the majority of lookups very efficient.
However, linear probing with an inexpensive hash function also makes the map less robust than
the BytesToBytesMap (especially for a large number of keys, or for certain distributions of
keys), so we must fall back on the latter for correctness. We
also use a secondary columnar batch that logically projects over the original columnar batch and
is equivalent to the BytesToBytesMap aggregate buffer.
NOTE: This vectorized hash map currently doesn't support nullable keys and falls back to the
BytesToBytesMap to store them.
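The lookup path described above can be illustrated with a minimal hand-written sketch. This is not the code the generator emits. It assumes a single long key and a single long sum as the aggregate buffer, uses plain arrays in place of the columnar batch, and uses a java.util.HashMap as a stand-in for the BytesToBytesMap. The class name VectorizedCacheSketch, its methods, the hash function, and the 0.5 load factor are all hypothetical choices made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a bounded linear-probing cache with a fallback map.
final class VectorizedCacheSketch {
    private final int capacity;   // maximum number of rows in the append-only "batch"
    private final int[] buckets;  // power-of-2-sized index array; -1 marks an empty slot
    private final long[] keys;    // key column (stand-in for the columnar batch)
    private final long[] sums;    // aggregate buffer column
    private final int maxSteps;   // small cap on the number of probing tries
    private int numRows = 0;
    // Stand-in for the BytesToBytesMap fallback (assumption for this sketch).
    private final Map<Long, Long> fallback = new HashMap<>();

    VectorizedCacheSketch(int capacity, int maxSteps) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of 2");
        }
        this.capacity = capacity;
        this.maxSteps = maxSteps;
        this.buckets = new int[capacity * 2]; // assumed load factor of 0.5
        Arrays.fill(buckets, -1);
        this.keys = new long[capacity];
        this.sums = new long[capacity];
    }

    // Deliberately cheap hash; the real generated hash may differ.
    private static int hash(long key) {
        return (int) (key ^ (key >>> 32));
    }

    // Returns the row for key, appending a new row if allowed, or -1 when the
    // probe budget or the batch capacity is exhausted.
    private int findOrInsert(long key) {
        int mask = buckets.length - 1;
        int idx = hash(key) & mask;
        for (int step = 0; step < maxSteps; step++) {
            int row = buckets[idx];
            if (row == -1) {
                if (numRows == capacity) return -1; // batch is full
                keys[numRows] = key;
                sums[numRows] = 0L;
                buckets[idx] = numRows;
                return numRows++;
            }
            if (keys[row] == key) return row;
            idx = (idx + 1) & mask; // linear probing
        }
        return -1; // too many collisions: caller must fall back
    }

    // Read-only probe: returns the row for key, or -1 if it is not cached.
    private int find(long key) {
        int mask = buckets.length - 1;
        int idx = hash(key) & mask;
        for (int step = 0; step < maxSteps; step++) {
            int row = buckets[idx];
            if (row == -1) return -1;
            if (keys[row] == key) return row;
            idx = (idx + 1) & mask;
        }
        return -1;
    }

    void add(long key, long value) {
        int row = findOrInsert(key);
        if (row >= 0) {
            sums[row] += value;                    // fast path
        } else {
            fallback.merge(key, value, Long::sum); // slow path, kept for correctness
        }
    }

    long get(long key) {
        int row = find(key);
        return row >= 0 ? sums[row] : fallback.getOrDefault(key, 0L);
    }

    int cachedRows() { return numRows; }
    int fallbackSize() { return fallback.size(); }

    public static void main(String[] args) {
        VectorizedCacheSketch m = new VectorizedCacheSketch(4, 2);
        // Keys 0, 8 and 16 all map to bucket 0 (8 buckets, mask 7).
        m.add(0L, 1L);
        m.add(8L, 2L);
        m.add(16L, 3L); // both probe slots are taken, so this goes to the fallback
        m.add(16L, 4L);
        m.add(0L, 10L);
        System.out.println(m.get(0L));        // prints 11
        System.out.println(m.get(16L));       // prints 7
        System.out.println(m.fallbackSize()); // prints 1
    }
}
```

Because the map is append-only, a slot that is occupied stays occupied, so a key that once exceeded the probe limit will always do so again. In this sketch that means each key lives in exactly one of the two structures, which is what lets the fallback preserve correctness.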