core/com.spotify.featran.transformers/Bucketizer

Bucketizer

Transform a column of continuous features to n columns of feature buckets.

With n+1 splits, there are n buckets. A bucket defined by splits x and y holds values in the
range [x, y), except the last bucket, which also includes y. Splits must be strictly increasing.
To cover all double values, -inf and inf must be provided explicitly as the first and last
splits; otherwise, a FeatureRejection.OutOfBound rejection is reported for values outside the
specified splits. Two examples of splits are
Array(Double.NegativeInfinity, 0.0, 1.0, Double.PositiveInfinity) and Array(0.0, 1.0, 2.0).

Note that if the lower and upper bounds of the target column are unknown, you should add
Double.NegativeInfinity and Double.PositiveInfinity as the outermost splits to prevent a
potential FeatureRejection.OutOfBound rejection.

Note also that the splits you provide must be in strictly increasing order, i.e.
s0 < s1 < s2 < ... < sn.

Missing values are transformed to zero vectors.
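
The bucketing rules above can be illustrated with a small standalone sketch. It does not use featran and is not the library's implementation: the object BucketizerSketch and its methods bucketIndex and encode are hypothetical names, and modelling an out-of-bound rejection as a Left value is an assumption made only for this illustration.

```scala
object BucketizerSketch {

  /** Index of the bucket holding x, or None when x lies outside [splits.head, splits.last]. */
  def bucketIndex(splits: Array[Double], x: Double): Option[Int] = {
    require(splits.length >= 2, "need at least two splits to form one bucket")
    require(
      splits.sliding(2).forall(p => p(0) < p(1)),
      "splits must be strictly increasing"
    )
    val n = splits.length - 1 // n+1 splits give n buckets
    if (x.isNaN || x < splits.head || x > splits.last) None
    else if (x == splits.last) Some(n - 1) // the last bucket also includes its upper split
    else Some(splits.lastIndexWhere(_ <= x)) // every other bucket is [s_i, s_{i+1})
  }

  /**
   * One-hot vector with one column per bucket.
   * A missing value yields a zero vector; an out-of-bound value is modelled as a Left
   * (an assumption of this sketch, standing in for a rejection).
   */
  def encode(splits: Array[Double], value: Option[Double]): Either[String, Vector[Double]] = {
    val n = splits.length - 1
    value match {
      case None => Right(Vector.fill(n)(0.0))
      case Some(x) =>
        bucketIndex(splits, x) match {
          case Some(i) => Right(Vector.tabulate(n)(j => if (j == i) 1.0 else 0.0))
          case None    => Left(s"out of bound: $x")
        }
    }
  }
}
```

With splits Array(0.0, 1.0, 2.0), encode maps Some(0.5) to Right(Vector(1.0, 0.0)), both Some(1.0) and Some(2.0) to Right(Vector(0.0, 1.0)), None to Right(Vector(0.0, 0.0)), and Some(2.5) to a Left.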

trait SettingsBuilder

class Object

trait Matchable

class Any

Value members

Methods

Create a new Bucketizer instance.

Value Params

splits: parameter for mapping continuous features into buckets
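
As a rough illustration of how this factory method might be wired into a feature specification, the sketch below assumes featran is on the classpath. The leading name argument ("price_bucket") does not appear in the parameter list above and is an assumption here, as are the record type (plain Double) and the sample values; check the method signature before relying on it.

```scala
import com.spotify.featran._
import com.spotify.featran.transformers._

object BucketizerUsage {
  def main(args: Array[String]): Unit = {
    // Outermost splits at -inf and +inf so that no value is rejected as out of bound.
    val splits = Array(Double.NegativeInfinity, 0.0, 1.0, Double.PositiveInfinity)

    // "price_bucket" is a hypothetical feature name chosen for this sketch.
    val spec = FeatureSpec
      .of[Double]
      .required(identity)(Bucketizer("price_bucket", splits))

    // Each input value becomes a row with one column per bucket.
    val extracted = spec.extract(Seq(-3.0, 0.5, 42.0))
    extracted.featureValues[Seq[Double]].foreach(println)
  }
}
```

Four splits give three buckets, so each extracted row should have three columns.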

Create a new Bucketizer from a settings object.

Value Params

setting: Settings object