Transform a collection of categorical features to binary columns, with at most a single
one-value per feature. Similar to OneHotEncoder, but uses MurmurHash3 to hash features into
buckets, which reduces CPU and memory overhead.
Missing values are transformed to [0.0, 0.0, ...].
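A minimal sketch of the idea in Python, not this transformer's actual implementation: each value is hashed to a bucket index, and that index is the only position set to 1.0. The names `hash_one_hot` and `num_buckets` are hypothetical, and `zlib.crc32` stands in for MurmurHash3 so that the example needs only the standard library.

```python
import zlib
from typing import List, Optional


def hash_one_hot(value: Optional[str], num_buckets: int) -> List[float]:
    """Encode one categorical value as a vector with at most one 1.0."""
    vector = [0.0] * num_buckets
    if value is None:
        # Missing values stay as the all-zero vector.
        return vector
    # Stand-in hash: crc32 replaces MurmurHash3 for illustration only.
    bucket = zlib.crc32(value.encode("utf-8")) % num_buckets
    vector[bucket] = 1.0
    return vector
```

Two different values can land in the same bucket, which is the collision cost paid for not keeping a full vocabulary.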
If hashBucketSize is inferred with HyperLogLog (HLL), the cardinality estimate is scaled by
sizeScalingFactor to reduce the number of collisions.
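To illustrate why scaling helps, the sketch below assumes the bucket count is the cardinality estimate multiplied by the scaling factor and rounded up (the exact rounding is an assumption), then measures the fraction of distinct values that land in an already-occupied bucket. `scaled_bucket_size` and `collision_rate` are hypothetical helpers, and `zlib.crc32` stands in for MurmurHash3.

```python
import math
import zlib
from typing import Iterable


def scaled_bucket_size(estimated_cardinality: int, size_scaling_factor: float) -> int:
    # Assumption: scale the estimate and round up; the real rounding may differ.
    return max(1, math.ceil(estimated_cardinality * size_scaling_factor))


def collision_rate(values: Iterable[str], num_buckets: int) -> float:
    """Fraction of distinct values that share a bucket with another value."""
    distinct = set(values)
    if not distinct:
        return 0.0
    # Stand-in hash: crc32 replaces MurmurHash3 for illustration only.
    occupied = {zlib.crc32(v.encode("utf-8")) % num_buckets for v in distinct}
    return (len(distinct) - len(occupied)) / len(distinct)
```

Calling `collision_rate(words, scaled_bucket_size(len(set(words)), factor))` for a few values of `factor` gives a rough picture of the trade-off between vector width and collisions on a given word list.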
Rough table of the relationship between scaling factor and collision rate (%), measured on a
corpus of 466,544 English words: