Package org.apache.spark.ml.clustering.tupol.evaluation

package evaluation

Type Members

  1. case class ClusteringDistanceSummary(predictions: RDD[(Int, Double)]) extends Product with Serializable

    Calculate distance statistics, by cluster and for the entire model.

    predictions

    an RDD of (cluster id, distance) tuples used to compute the statistics

  2. case class ClusteringFeaturesSummary(predictions: RDD[(Int, Vector)], centroids: Array[Vector]) extends Product with Serializable

    Calculate distance statistics for each feature, by cluster and for the entire model.

    This provides deeper insight into the model itself than ClusteringStats.

    predictions

    an RDD of (cluster id, distance vector) tuples used to compute the statistics

  3. case class DistanceStatsByFeature(count: Long, min: Vector, avg: Vector, max: Vector, variance: Vector) extends Product with Serializable

    Basic statistics for each dimension (minimum, mean, maximum and variance), together with the sample count.
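The types above summarize distances as count, minimum, mean, maximum and variance, computed per cluster and over all predictions. The following is a minimal local sketch of that idea, not the library's implementation. It assumes plain Scala `Seq`s in place of Spark `RDD`s and `IndexedSeq[Double]` in place of Spark's `Vector`. The names `DistanceStats`, `FeatureStats` and `ClusterSummaries` are hypothetical. The use of population variance (dividing by n) is also an assumption, and the centroids argument of ClusteringFeaturesSummary is ignored here.

```scala
// Local sketch only (hypothetical names, not the library's API):
// Seq stands in for Spark's RDD and IndexedSeq[Double] for ml.linalg.Vector.

/** Count, min, mean, max and variance of a sample of scalar distances. */
final case class DistanceStats(count: Long, min: Double, avg: Double, max: Double, variance: Double)

object DistanceStats {
  def of(values: Seq[Double]): DistanceStats = {
    require(values.nonEmpty, "cannot summarize an empty sample")
    val n = values.size
    val mean = values.sum / n
    // Assumption: population variance (divide by n); the library may differ.
    val variance = values.map(v => (v - mean) * (v - mean)).sum / n
    DistanceStats(n.toLong, values.reduce(_ min _), mean, values.reduce(_ max _), variance)
  }
}

/** Per-dimension statistics with the same fields as DistanceStatsByFeature. */
final case class FeatureStats(
    count: Long,
    min: IndexedSeq[Double],
    avg: IndexedSeq[Double],
    max: IndexedSeq[Double],
    variance: IndexedSeq[Double])

object FeatureStats {
  def of(rows: Seq[IndexedSeq[Double]]): FeatureStats = {
    require(rows.nonEmpty, "cannot summarize an empty sample")
    val dims = rows.head.length
    require(rows.forall(_.length == dims), "all vectors must have the same dimension")
    // Summarize each dimension (column) independently.
    val perDim = (0 until dims).map(d => DistanceStats.of(rows.map(_(d))))
    FeatureStats(rows.size.toLong, perDim.map(_.min), perDim.map(_.avg), perDim.map(_.max), perDim.map(_.variance))
  }
}

object ClusterSummaries {
  /** Like ClusteringDistanceSummary: stats per cluster id and over all predictions. */
  def distanceSummary(predictions: Seq[(Int, Double)]): (Map[Int, DistanceStats], DistanceStats) = {
    val byCluster = predictions.groupBy(_._1).map { case (id, ps) => id -> DistanceStats.of(ps.map(_._2)) }
    (byCluster, DistanceStats.of(predictions.map(_._2)))
  }

  /** Like ClusteringFeaturesSummary (centroids omitted): per-feature stats per cluster and overall. */
  def featuresSummary(predictions: Seq[(Int, IndexedSeq[Double])]): (Map[Int, FeatureStats], FeatureStats) = {
    val byCluster = predictions.groupBy(_._1).map { case (id, ps) => id -> FeatureStats.of(ps.map(_._2)) }
    (byCluster, FeatureStats.of(predictions.map(_._2)))
  }
}
```

On a real RDD, a single-pass aggregation such as `aggregateByKey` with Spark's `org.apache.spark.util.StatCounter` would typically replace `groupBy`, which materializes every distance of a cluster at once.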
