public class KMeansClustering<EV,Message> extends Object implements ComputeFunction<Long,KMeansVertexValue,EV,Message>
Partitions N data points (observations) into k clusters.
The input consists of data points, each with a pointID and a vector of coordinates.
The algorithm is iterative and works as follows.
In the initialization phase, k cluster centers are chosen at random from the input points.
In each iteration:
1. Each data point is assigned to the cluster center closest to it, measured by Euclidean distance.
2. The cluster centers are recomputed as the arithmetic mean of the points assigned to them.
Convergence is reached when the positions of the cluster centers no longer change.
http://en.wikipedia.org/wiki/K-means_clustering
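The iteration described above can be sketched in plain Java. This is a minimal single-machine illustration, not the vertex-centric implementation of this class. The class name `KMeansSketch`, its method names, and the choice to leave the center of an empty cluster unchanged are assumptions made for the example.

```java
import java.util.Arrays;

class KMeansSketch {

    /** Returns the index of the center closest to the point (squared Euclidean distance). */
    static int closestCenter(double[] point, double[][] centers) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = point[d] - centers[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    /** One iteration: assign every point, then recompute each center as the mean of its points. */
    static double[][] iterate(double[][] points, double[][] centers) {
        int k = centers.length;
        int dims = centers[0].length;
        double[][] sums = new double[k][dims];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = closestCenter(p, centers);
            counts[c]++;
            for (int d = 0; d < dims; d++) {
                sums[c][d] += p[d];
            }
        }
        double[][] next = new double[k][dims];
        for (int c = 0; c < k; c++) {
            for (int d = 0; d < dims; d++) {
                // Assumption: a center with no assigned points keeps its previous position.
                next[c][d] = counts[c] == 0 ? centers[c][d] : sums[c][d] / counts[c];
            }
        }
        return next;
    }

    /** Repeats the iteration until the centers stop changing or maxIterations is reached. */
    static double[][] cluster(double[][] points, double[][] initialCenters, int maxIterations) {
        double[][] centers = initialCenters;
        for (int i = 0; i < maxIterations; i++) {
            double[][] next = iterate(points, centers);
            if (Arrays.deepEquals(next, centers)) {
                break; // converged: center positions did not change
            }
            centers = next;
        }
        return centers;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 2}, {10, 0}, {10, 2}};
        double[][] initial = {{0, 0}, {10, 0}};
        // prints [[0.0, 1.0], [10.0, 1.0]]
        System.out.println(Arrays.deepToString(cluster(points, initial, 10)));
    }
}
```

Comparing squared distances avoids a square root without changing which center is closest.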
Modifier and Type | Class and Description |
---|---|
class | KMeansClustering.RandomCentersInitialization |
Nested classes/interfaces inherited from interface ComputeFunction: ComputeFunction.Aggregators, ComputeFunction.Callback<K,VV,EV,Message>, ComputeFunction.InitCallback, ComputeFunction.MasterCallback, ComputeFunction.ReadAggregators, ComputeFunction.ReadWriteAggregators
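The random initialization step from the class description (choosing k centers from the input points) could look roughly like the sketch below. It is not the code of `KMeansClustering.RandomCentersInitialization`, whose internals are not documented on this page. The class `RandomCentersSketch`, the method `chooseInitialCenters`, and the `seed` parameter are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class RandomCentersSketch {

    /**
     * Picks k distinct input points at random to serve as initial centers.
     * A fixed seed makes the choice reproducible (an assumption of this sketch).
     */
    static List<double[]> chooseInitialCenters(List<double[]> points, int k, long seed) {
        if (k < 1 || k > points.size()) {
            throw new IllegalArgumentException("k must be between 1 and the number of points");
        }
        // Shuffle a copy so the caller's list keeps its order.
        List<double[]> shuffled = new ArrayList<>(points);
        Collections.shuffle(shuffled, new Random(seed));
        List<double[]> centers = new ArrayList<>(k);
        for (double[] p : shuffled.subList(0, k)) {
            centers.add(p.clone()); // copy so later center updates do not alter the input point
        }
        return centers;
    }
}
```

Shuffling and taking the first k entries guarantees that no input point is chosen twice, which repeated independent draws would not.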
Modifier and Type | Field | Description |
---|---|---|
static String | ASSIGNED_POINTS_PREFIX | The prefix for the aggregators used to store the number of points assigned to each cluster center |
static String | CENTER_AGGR_PREFIX | The prefix for the cluster center aggregators, used to store the cluster center coordinates |
static String | CLUSTER_CENTERS_COUNT | Number of cluster centers |
static int | CLUSTER_CENTERS_COUNT_DEFAULT | Default number of cluster centers |
static String | DIMENSIONS | Dimensions of the input points |
static String | INITIAL_CENTERS | The initial centers aggregator |
static int | ITERATIONS_DEFAULT | Default value for iterations |
static String | MAX_ITERATIONS | Maximum number of iterations |
static String | POINTS_COUNT | Total number of input points |
static String | PRINT_FINAL_CENTERS | Parameter that enables printing the final center coordinates |
static boolean | PRINT_FINAL_CENTERS_DEFAULT | False by default |
static String | TEST_INITIAL_CENTERS | False by default |
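To illustrate how per-center aggregators such as those named by CENTER_AGGR_PREFIX and ASSIGNED_POINTS_PREFIX might work together, the sketch below accumulates coordinate sums and point counts under prefixed keys and then derives a new center from them. This page does not say how the real aggregators combine values, so the sum-then-divide scheme is an assumption. The prefix strings, the `Map`-based storage, and the class `CenterAggregationSketch` are hypothetical stand-ins, not the library's aggregator API.

```java
import java.util.HashMap;
import java.util.Map;

class CenterAggregationSketch {

    // Hypothetical prefix values; the actual constant values are not given on this page.
    static final String CENTER_PREFIX = "center.";
    static final String ASSIGNED_PREFIX = "assigned.";

    /** Adds one point's coordinates and a count of 1 to the entries of its cluster. */
    static void aggregatePoint(Map<String, double[]> sums, Map<String, Long> counts,
                               int cluster, double[] point) {
        double[] sum = sums.computeIfAbsent(CENTER_PREFIX + cluster,
                key -> new double[point.length]);
        for (int d = 0; d < point.length; d++) {
            sum[d] += point[d];
        }
        counts.merge(ASSIGNED_PREFIX + cluster, 1L, Long::sum);
    }

    /** Divides the coordinate sum by the assigned count; returns null if no point was assigned. */
    static double[] newCenter(Map<String, double[]> sums, Map<String, Long> counts, int cluster) {
        double[] sum = sums.get(CENTER_PREFIX + cluster);
        Long count = counts.get(ASSIGNED_PREFIX + cluster);
        if (sum == null || count == null || count == 0L) {
            return null;
        }
        double[] center = new double[sum.length];
        for (int d = 0; d < sum.length; d++) {
            center[d] = sum[d] / count;
        }
        return center;
    }

    public static void main(String[] args) {
        Map<String, double[]> sums = new HashMap<>();
        Map<String, Long> counts = new HashMap<>();
        aggregatePoint(sums, counts, 0, new double[] {1.0, 2.0});
        aggregatePoint(sums, counts, 0, new double[] {3.0, 6.0});
        double[] center = newCenter(sums, counts, 0);
        System.out.println(center[0] + ", " + center[1]); // prints 2.0, 4.0
    }
}
```

Keeping a sum and a count per center means each point contributes independently, so the mean can be formed in a single sequential step between supersteps.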
Constructor and Description |
---|
KMeansClustering() |
Modifier and Type | Method | Description |
---|---|---|
void | compute(int superstep, VertexWithValue<Long,KMeansVertexValue> vertex, Iterable<Message> messages, Iterable<EdgeWithValue<Long,EV>> edges, ComputeFunction.Callback<Long,KMeansVertexValue,EV,Message> cb) | The function for computing a new vertex value or sending messages to the next superstep. |
void | init(Map<String,?> configs, ComputeFunction.InitCallback cb) | Initializes the ComputeFunction; this is the place to register aggregators. |
void | masterCompute(int superstep, ComputeFunction.MasterCallback cb) | A function for performing sequential computations between supersteps. |
void | superstepCompute(int superstep, VertexWithValue<Long,KMeansVertexValue> vertex, Iterable<Message> messages, Iterable<EdgeWithValue<Long,EV>> edges, ComputeFunction.Callback<Long,KMeansVertexValue,EV,Message> cb) | Main K-means clustering compute method. |
Methods inherited from class Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ComputeFunction: postSuperstep, preSuperstep
public static final String CENTER_AGGR_PREFIX
public static final String ASSIGNED_POINTS_PREFIX
public static final String INITIAL_CENTERS
public static final String MAX_ITERATIONS
public static final int ITERATIONS_DEFAULT
public static final String CLUSTER_CENTERS_COUNT
public static final int CLUSTER_CENTERS_COUNT_DEFAULT
public static final String DIMENSIONS
public static final String POINTS_COUNT
public static final String PRINT_FINAL_CENTERS
public static final boolean PRINT_FINAL_CENTERS_DEFAULT
public static final String TEST_INITIAL_CENTERS
public void superstepCompute(int superstep, VertexWithValue<Long,KMeansVertexValue> vertex, Iterable<Message> messages, Iterable<EdgeWithValue<Long,EV>> edges, ComputeFunction.Callback<Long,KMeansVertexValue,EV,Message> cb)
Main K-means clustering compute method.
Parameters:
superstep - the count of the current superstep
vertex - the current vertex with its value
messages - the messages sent from the previous superstep
edges - the adjacent edges with their values
cb - a callback for setting a new vertex value or sending messages to the next superstep
public final void init(Map<String,?> configs, ComputeFunction.InitCallback cb)
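Description copied from interface: ComputeFunction
Initializes the ComputeFunction; this is the place to register aggregators.
Specified by: init in interface ComputeFunction<Long,KMeansVertexValue,EV,Message>
Parameters:
configs - configuration parameters
cb - a callback for registering aggregators
public final void masterCompute(int superstep, ComputeFunction.MasterCallback cb)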
Description copied from interface: ComputeFunction
A function for performing sequential computations between supersteps.
Specified by: masterCompute in interface ComputeFunction<Long,KMeansVertexValue,EV,Message>
Parameters:
superstep - the count of the current superstep
cb - a callback for writing to aggregators or halting the computation
public void compute(int superstep, VertexWithValue<Long,KMeansVertexValue> vertex, Iterable<Message> messages, Iterable<EdgeWithValue<Long,EV>> edges, ComputeFunction.Callback<Long,KMeansVertexValue,EV,Message> cb)
Description copied from interface: ComputeFunction
The function for computing a new vertex value or sending messages to the next superstep.
Specified by: compute in interface ComputeFunction<Long,KMeansVertexValue,EV,Message>
Parameters:
superstep - the count of the current superstep
vertex - the current vertex with its value
messages - the messages sent from the previous superstep
edges - the adjacent edges with their values
cb - a callback for setting a new vertex value or sending messages to the next superstep
Copyright © 2020. All rights reserved.