Package org.apache.spark.ml.clustering.tupol

package tupol

Type Members

class XKMeans extends KMeans with XKMeansParams

Extended KMeans algorithm.
Calculates the following:
- cluster (prediction), already available in the default KMeans algorithm
- distance to the cluster center
- probability
- probability by feature (dimension)
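As a rough illustration of the first two outputs, the sketch below assigns a point to its nearest cluster center and reports the Euclidean distance to that center. It is a hypothetical, self-contained example, not the XKMeans implementation: plain Scala arrays stand in for Spark's `linalg.Vector`, and the names `NearestCenter` and `predict` are illustrative only.

```scala
// Hypothetical sketch; plain arrays stand in for Spark's linalg.Vector.
object NearestCenter {
  type Point = Array[Double]

  // Euclidean distance between two points of equal dimension.
  def distance(a: Point, b: Point): Double = {
    require(a.length == b.length, "dimension mismatch")
    math.sqrt(a.indices.map(i => math.pow(a(i) - b(i), 2)).sum)
  }

  // Returns (index of the nearest center, distance to that center).
  def predict(centers: IndexedSeq[Point], p: Point): (Int, Double) =
    centers.indices.map(i => (i, distance(centers(i), p))).minBy(_._2)

  def main(args: Array[String]): Unit = {
    val centers = IndexedSeq(Array(0.0, 0.0), Array(10.0, 10.0))
    val (cluster, dist) = predict(centers, Array(3.0, 4.0))
    println(s"cluster=$cluster distance=$dist") // prints cluster=0 distance=5.0
  }
}
```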
Note: The probability-by-feature algorithm is based on the ideas presented in https://github.com/tupol/naive-ml, in particular https://github.com/tupol/naive-ml/blob/master/src/main/scala/tupol/ml/clustering/KMeansGaussian.scala.
Note: The probability-by-feature output can become meaningless if a feature or dimensionality reduction algorithm is applied before XKMeans, because the reduced dimensions can no longer be traced back to the original feature that caused a record to be classified as an anomaly.
Note: This is far from a perfect solution, as it assumes that the data follows a normal distribution, which is not always the case.
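Under the normality assumption noted above, a per-feature probability could be sketched as follows: for one cluster, estimate a mean and standard deviation for each feature from the points assigned to it, then score each feature of a new point with the two-sided normal tail probability P(|X - mean| >= |x - mean|). This is a hypothetical illustration, not the algorithm from KMeansGaussian.scala or XKMeans; the object name, the use of the population standard deviation, and the choice of a tail probability (rather than a density) are assumptions. `erfc` uses the Abramowitz and Stegun 7.1.26 approximation.

```scala
// Hypothetical sketch of per-feature probabilities under a normal assumption.
// Plain arrays stand in for Spark's linalg.Vector; this is not the XKMeans code.
object FeatureProbability {
  type Point = Array[Double]

  // Per-feature (mean, population standard deviation) over one cluster's points.
  def fit(points: Seq[Point]): Array[(Double, Double)] = {
    require(points.nonEmpty, "a cluster needs at least one point")
    val n = points.size.toDouble
    points.head.indices.map { j =>
      val xs = points.map(p => p(j))
      val mean = xs.sum / n
      val variance = xs.map(x => (x - mean) * (x - mean)).sum / n
      (mean, math.sqrt(variance))
    }.toArray
  }

  // Complementary error function for x >= 0
  // (Abramowitz and Stegun 7.1.26, absolute error below 1.5e-7).
  def erfc(x: Double): Double = {
    val t = 1.0 / (1.0 + 0.3275911 * x)
    val poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
      t * (-1.453152027 + t * 1.061405429))))
    poly * math.exp(-x * x)
  }

  // Two-sided tail probability per feature: close to 1.0 at the cluster mean,
  // approaching 0.0 for values far away from it.
  def probabilities(stats: Array[(Double, Double)], p: Point): Array[Double] = {
    require(stats.length == p.length, "dimension mismatch")
    stats.zip(p).map { case ((mean, std), x) =>
      if (std == 0.0) { if (x == mean) 1.0 else 0.0 }
      else erfc(math.abs(x - mean) / (std * math.sqrt(2.0)))
    }
  }
}
```

For a cluster fitted on `Array(1.0, 10.0)` and `Array(3.0, 10.0)`, the first feature has mean 2.0 and standard deviation 1.0, so a value of 5.0 (three standard deviations away) scores about 0.0027.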

Annotations
@Experimental()
class XKMeansModel extends KMeansModel with XKMeansParams

Value Members

object XKMeansModel extends Serializable
object XKMeansReporting

Defines a set of reports generated from a PipelineModel that contains an XKMeansModel, together with the names of the input features.
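Feature names matter for such reports because they let a per-feature probability be traced back to an input column. The sketch below shows one hypothetical way to do that: pair each name with its probability and list the least probable features first. It is not the XKMeansReporting API; `FeatureReport`, `rankFeatures`, and the feature names in `main` are illustrative assumptions.

```scala
// Hypothetical sketch: rank features by how unlikely their values are.
object FeatureReport {
  // Pairs each feature name with its probability, least probable first.
  def rankFeatures(names: Seq[String], probabilities: Seq[Double]): Seq[(String, Double)] = {
    require(names.size == probabilities.size, "one name per feature is required")
    names.zip(probabilities).sortBy(_._2)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical feature names and probabilities for a single record.
    val report = rankFeatures(Seq("bytes_in", "bytes_out", "duration"), Seq(0.92, 0.003, 0.41))
    report.foreach { case (name, p) => println(f"$name%-10s $p%.3f") }
  }
}
```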
package evaluation
package implicits
object vectorops

Additional operations for linalg.Vector
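Helper operations of this kind are commonly added to an existing type through an implicit class. The sketch below is a hypothetical example of that pattern, not the contents of `vectorops`: `Array[Double]` stands in for Spark's `linalg.Vector` so the code runs without Spark, and the method names `plus`, `minus`, `dot`, and `squaredDistance` are assumptions.

```scala
// Hypothetical sketch: an implicit class adding element-wise operations.
// Array[Double] stands in for Spark's linalg.Vector to keep this self-contained.
object VectorOpsSketch {
  implicit class RichVector(val v: Array[Double]) extends AnyVal {
    // Applies f pairwise to the elements of v and other (same dimension required).
    def pairwise(other: Array[Double])(f: (Double, Double) => Double): Array[Double] = {
      require(v.length == other.length, "dimension mismatch")
      v.zip(other).map { case (a, b) => f(a, b) }
    }

    def plus(other: Array[Double]): Array[Double] = pairwise(other)(_ + _)
    def minus(other: Array[Double]): Array[Double] = pairwise(other)(_ - _)
    def dot(other: Array[Double]): Double = pairwise(other)(_ * _).sum

    // Squared Euclidean distance, as used when comparing points to centers.
    def squaredDistance(other: Array[Double]): Double =
      pairwise(other)((a, b) => (a - b) * (a - b)).sum
  }
}
```

After `import VectorOpsSketch._`, an expression such as `Array(1.0, 2.0).dot(Array(3.0, 4.0))` evaluates to 11.0.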
