public abstract class PartitionClustering
extends java.lang.Object
implements java.io.Serializable
Modifier and Type | Field and Description |
---|---|
int | k: The number of clusters. |
static int | OUTLIER: Cluster label for outliers or noise. |
int[] | size: The number of observations in each cluster. |
int[] | y: The cluster labels of data. |
Constructor and Description |
---|
PartitionClustering(int k, int[] y): Constructor. |
Modifier and Type | Method and Description |
---|---|
static <T extends PartitionClustering & java.lang.Comparable<? super T>> T | run(int runs, java.util.function.Supplier<T> clustering): Runs a clustering algorithm multiple times and returns the best one. |
static <T> double[] | seed(T[] data, T[] medoids, int[] y, java.util.function.ToDoubleBiFunction<T,T> distance): Initializes cluster membership of input objects with the K-Means++ algorithm. |
java.lang.String | toString() |
public static final int OUTLIER
public final int k
public final int[] y
public final int[] size
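The relationship between the fields k, y and size can be illustrated with a small sketch. This is not the library's code: the class name `ClusterSizeSketch`, the helper `sizes`, and the value used for `OUTLIER` are hypothetical, and skipping outlier labels when counting is an assumption made for the example.

```java
import java.util.Arrays;

class ClusterSizeSketch {
    // Assumed placeholder value; the documentation above does not state
    // the actual value of PartitionClustering.OUTLIER.
    static final int OUTLIER = Integer.MAX_VALUE;

    /** Counts the observations per cluster from the labels y (hypothetical helper). */
    static int[] sizes(int k, int[] y) {
        int[] size = new int[k];
        for (int label : y) {
            if (label == OUTLIER) {
                continue; // assumption: outliers belong to no cluster
            }
            if (label < 0 || label >= k) {
                throw new IllegalArgumentException("Invalid label: " + label);
            }
            size[label]++;
        }
        return size;
    }

    public static void main(String[] args) {
        int[] y = {0, 2, 2, OUTLIER, 1, 2};
        System.out.println(Arrays.toString(sizes(3, y)));
    }
}
```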
public PartitionClustering(int k, int[] y)
k - the number of clusters.
y - the cluster labels.
public java.lang.String toString()
Overrides: toString in class java.lang.Object
public static <T> double[] seed(T[] data, T[] medoids, int[] y, java.util.function.ToDoubleBiFunction<T,T> distance)
K-Means++ is based on the intuition of spreading the k initial cluster centers away from each other. The first cluster center is chosen uniformly at random from the data points that are being clustered, after which each subsequent cluster center is chosen from the remaining data points with probability proportional to its distance squared to the point's closest cluster center.
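The exact algorithm is as follows:
1. Choose the first cluster center uniformly at random from the data points.
2. For each remaining data point, compute its distance to the closest center chosen so far.
3. Choose the next center from the remaining data points with probability proportional to that distance squared.
4. Repeat steps 2 and 3 until k centers have been chosen.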
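The seeding procedure described above can be sketched in Java as follows. This is a minimal illustration, not the library's implementation: the class name `KMeansPlusPlusSketch` and the extra `Random` parameter are hypothetical, the sketch assumes `distance` returns a plain (unsquared) distance and squares it itself, and it assumes the returned array holds each point's squared distance to its nearest medoid.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.ToDoubleBiFunction;

class KMeansPlusPlusSketch {
    /**
     * Hypothetical K-Means++ seeding. Fills medoids (size k) and y (size n)
     * and returns the squared distance of each point to its nearest medoid.
     */
    static <T> double[] seed(T[] data, T[] medoids, int[] y,
                             ToDoubleBiFunction<T, T> distance, Random rng) {
        int n = data.length;
        int k = medoids.length;
        if (k < 1 || k > n) {
            throw new IllegalArgumentException("Invalid number of medoids: " + k);
        }

        double[] d = new double[n];
        Arrays.fill(d, Double.MAX_VALUE);

        // First center: uniformly at random from the data points.
        medoids[0] = data[rng.nextInt(n)];

        for (int j = 1; j <= k; j++) {
            // Update nearest-center squared distances using the latest center.
            T latest = medoids[j - 1];
            double total = 0.0;
            for (int i = 0; i < n; i++) {
                double dist = distance.applyAsDouble(data[i], latest);
                double sq = dist * dist;
                if (sq < d[i]) {
                    d[i] = sq;
                    y[i] = j - 1;
                }
                total += d[i];
            }
            if (j == k) {
                break; // all k centers chosen; labels are final
            }

            // Sample the next center with probability proportional to d[i].
            double cutoff = rng.nextDouble() * total;
            double cumulative = 0.0;
            int chosen = n - 1; // fallback when every distance is zero
            for (int i = 0; i < n; i++) {
                cumulative += d[i];
                if (cumulative > cutoff) {
                    chosen = i;
                    break;
                }
            }
            medoids[j] = data[chosen];
        }
        return d;
    }

    public static void main(String[] args) {
        Double[] data = {0.0, 0.1, 10.0, 10.1};
        Double[] medoids = new Double[2];
        int[] y = new int[data.length];
        seed(data, medoids, y, (a, b) -> Math.abs(a - b), new Random(42));
        System.out.println(Arrays.toString(medoids) + " " + Arrays.toString(y));
    }
}
```

Because a point that already coincides with a chosen center has squared distance zero, it contributes nothing to the cumulative sum and is not selected again unless every remaining distance is zero.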
T - the type of input object.
data - data objects array of size n.
medoids - an array of size k to store cluster medoids on output.
y - an array of size n to store cluster labels on output.
public static <T extends PartitionClustering & java.lang.Comparable<? super T>> T run(int runs, java.util.function.Supplier<T> clustering)
runs - the number of runs.
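The pattern behind run can be sketched as below: call the supplier several times and keep the result that compares lowest. This is a hedged illustration rather than the library's code. The class `BestOfRunsSketch` and the stand-in type `Result` are hypothetical, the sketch drops the `PartitionClustering` bound on T to stay self-contained, and it assumes that "best" means smallest under `compareTo`.

```java
import java.util.Random;
import java.util.function.Supplier;

class BestOfRunsSketch {
    /** Hypothetical stand-in for a clustering result, ordered by distortion. */
    static final class Result implements Comparable<Result> {
        final double distortion;

        Result(double distortion) {
            this.distortion = distortion;
        }

        @Override
        public int compareTo(Result other) {
            return Double.compare(distortion, other.distortion);
        }
    }

    /** Runs the supplier `runs` times and returns the smallest result. */
    static <T extends Comparable<? super T>> T run(int runs, Supplier<T> clustering) {
        if (runs <= 0) {
            throw new IllegalArgumentException("Invalid number of runs: " + runs);
        }
        T best = clustering.get();
        for (int i = 1; i < runs; i++) {
            T candidate = clustering.get();
            if (candidate.compareTo(best) < 0) {
                best = candidate; // assumption: lower compares as better
            }
        }
        return best;
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        // Each call simulates one randomly initialized clustering run.
        Result best = run(10, () -> new Result(rng.nextDouble()));
        System.out.println("best distortion: " + best.distortion);
    }
}
```

Repeating a randomly seeded algorithm and keeping the best result is a common way to reduce sensitivity to a poor initialization.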