Bucketizer
Transform a column of continuous features to n columns of feature buckets.
With n+1 splits, there are n buckets. A bucket defined by splits x,y holds values in the range
[x,y) except the last bucket, which also includes y. Splits should be strictly increasing.
Values at -inf, inf must be explicitly provided to cover all double values; otherwise, a
FeatureRejection.OutOfBound rejection will be reported for values outside the splits
specified. Two examples of splits are
Array(Double.NegativeInfinity, 0.0, 1.0, Double.PositiveInfinity)
and Array(0.0, 1.0, 2.0).
Note that if you do not know the upper and lower bounds of the targeted column, you should
add Double.NegativeInfinity and Double.PositiveInfinity as the bounds of your splits to
prevent a potential FeatureRejection.OutOfBound rejection.
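The bucket boundaries described above can be illustrated with a minimal sketch. The object BucketSketch and the helper bucketIndex are hypothetical names introduced here for illustration; they are not part of the library's API and may differ from its actual implementation. A result of None stands in for the case that would be reported as an out-of-bound rejection.

```scala
object BucketSketch {
  // Hypothetical helper, not the library's API: returns the index of the
  // bucket that contains `value`, or None when the value lies outside the splits.
  def bucketIndex(splits: Array[Double], value: Double): Option[Int] = {
    val n = splits.length - 1 // n + 1 splits define n buckets
    if (n < 1 || value.isNaN || value < splits.head || value > splits.last) None
    else if (value == splits.last) Some(n - 1) // the last bucket also includes its upper bound
    else Some(splits.lastIndexWhere(_ <= value)) // every other bucket is [x, y)
  }
}
```

With Array(0.0, 1.0, 2.0), the value 2.0 falls in the last bucket while 2.5 is out of bounds; with infinite bounds added, every non-NaN double maps to some bucket.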
Note also that the splits you provide must be in strictly increasing order, i.e.
s0 < s1 < s2 < ... < sn.
Missing values are transformed to zero vectors.
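As a rough illustration of the ordering requirement and the handling of missing values, the sketch below checks that splits are strictly increasing and builds the n output columns. It assumes the output is a one-hot vector (1.0 in the column of the matching bucket), which the text above does not state explicitly; OneHotSketch, strictlyIncreasing and encode are hypothetical names, not the library's API.

```scala
object OneHotSketch {
  // True when s0 < s1 < ... < sn holds for every adjacent pair of splits.
  def strictlyIncreasing(splits: Array[Double]): Boolean =
    splits.zip(splits.drop(1)).forall { case (a, b) => a < b }

  // Hypothetical encoder, assuming a one-hot layout: `bucket` is the bucket
  // index of a value, or None for a missing value, which yields a zero vector.
  def encode(bucket: Option[Int], numBuckets: Int): Array[Double] = {
    val out = Array.fill(numBuckets)(0.0)
    bucket.foreach(i => out(i) = 1.0)
    out
  }
}
```

Here a missing value is modelled as None, so encode(None, 3) produces three zero columns.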