Calculates the following:
- cluster (prediction); already available in the default KMeans algorithm.
- distance to cluster
- probability
- probability by feature (dimension)
Note: The probability by feature algorithm is based on the ideas presented in https://github.com/tupol/naive-ml; see in particular
https://github.com/tupol/naive-ml/blob/master/src/main/scala/tupol/ml/clustering/KMeansGaussian.scala.
Note: The probability by feature output can become useless if a feature/dimension reduction algorithm
is applied before XKMeans2, because the exact feature that contributed to a record being classified
as an anomaly can no longer be traced back.
Note: This is far from a perfect solution, as it assumes that the data follows a
normal distribution, which is not always the case.
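
The normality assumption above can be made concrete with a minimal sketch. This is not the XKMeans implementation: it assumes each cluster is summarised by a per-feature mean and variance, and it evaluates an independent Gaussian density for every dimension. The object and method names (`FeatureProbabilitySketch`, `gaussianDensity`, `densityByFeature`, `density`) are hypothetical and chosen only for illustration.

```scala
// Illustrative only: a per-feature Gaussian score under the assumption that
// every feature of a cluster is independently normally distributed.
object FeatureProbabilitySketch {

  // Gaussian density of x for one feature; the variance is floored to avoid
  // division by zero for constant features (an assumption of this sketch).
  def gaussianDensity(x: Double, mean: Double, variance: Double): Double = {
    val v = math.max(variance, 1e-12)
    math.exp(-(x - mean) * (x - mean) / (2 * v)) / math.sqrt(2 * math.Pi * v)
  }

  // One density value per feature (dimension) of the point.
  def densityByFeature(point: Array[Double],
                       means: Array[Double],
                       variances: Array[Double]): Array[Double] = {
    require(point.length == means.length && means.length == variances.length,
      "point, means and variances must have the same length")
    point.indices.map(i => gaussianDensity(point(i), means(i), variances(i))).toArray
  }

  // Overall score for the point: the product of the per-feature densities.
  def density(point: Array[Double],
              means: Array[Double],
              variances: Array[Double]): Double =
    densityByFeature(point, means, variances).product
}
```

Strictly speaking these values are densities rather than probabilities; a low value in one dimension indicates that the record is unusual for its cluster in that feature. When the data is far from normal, the scores lose their interpretation, which is the limitation the note describes.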
Annotations
@Experimental()
class XKMeansModel extends KMeansModel with XKMeansParams
Extended KMeans algorithm.
Calculates the following:
- cluster (prediction); already available in the default KMeans algorithm.
- distance to cluster
- probability
- probability by feature (dimension)
Note: The probability by feature algorithm is based on the ideas presented in https://github.com/tupol/naive-ml; see in particular https://github.com/tupol/naive-ml/blob/master/src/main/scala/tupol/ml/clustering/KMeansGaussian.scala.
Note: The probability by feature output can become useless if a feature/dimension reduction algorithm is applied before XKMeans2, because the exact feature that contributed to a record being classified as an anomaly can no longer be traced back.
Note: This is far from a perfect solution, as it assumes that the data follows a normal distribution, which is not always the case.
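
To illustrate why tracing an anomaly back to a feature depends on the original columns being preserved, here is a hedged sketch that picks the dimension deviating most from its cluster. It assumes per-feature means and variances are available for the assigned cluster and uses the absolute z-score as the deviation measure; this is a stand-in chosen for clarity, not necessarily what the model computes. `AnomalyFeatureSketch`, `zScores`, `mostDeviantFeature` and the feature names in the usage note are all hypothetical.

```scala
// Illustrative only: find the feature that deviates most from its cluster,
// measured in standard deviations.
object AnomalyFeatureSketch {

  // Absolute z-score per feature; the variance is floored to avoid division by zero.
  def zScores(point: Array[Double],
              means: Array[Double],
              variances: Array[Double]): Array[Double] = {
    require(point.length == means.length && means.length == variances.length,
      "point, means and variances must have the same length")
    point.indices.map { i =>
      math.abs(point(i) - means(i)) / math.sqrt(math.max(variances(i), 1e-12))
    }.toArray
  }

  // Name and z-score of the most deviant feature.
  def mostDeviantFeature(featureNames: Array[String],
                         point: Array[Double],
                         means: Array[Double],
                         variances: Array[Double]): (String, Double) = {
    require(featureNames.nonEmpty && featureNames.length == point.length,
      "one non-empty name per feature is required")
    val z = zScores(point, means, variances)
    val i = z.indices.maxBy(z(_))
    (featureNames(i), z(i))
  }
}
```

For example, calling `mostDeviantFeature(Array("cpu", "memory", "latency"), Array(1.0, 2.0, 10.0), Array(1.0, 2.5, 4.0), Array(1.0, 1.0, 4.0))` singles out `latency`. If the input had first been projected onto derived components, the winning index would refer to a component rather than to a named input column, which is the loss of traceability the note warns about.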