weka.clusterers
Class FarthestFirst

java.lang.Object
  extended by weka.clusterers.AbstractClusterer
      extended by weka.clusterers.RandomizableClusterer
          extended by weka.clusterers.FarthestFirst
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, Clusterer, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class FarthestFirst
extends RandomizableClusterer
implements TechnicalInformationHandler

Cluster data using the FarthestFirst algorithm.

For more information see:

Hochbaum, Shmoys (1985). A best possible heuristic for the k-center problem. Mathematics of Operations Research. 10(2):180-184.

Sanjoy Dasgupta: Performance Guarantees for Hierarchical Clustering. In: 15th Annual Conference on Computational Learning Theory, 351-363, 2002.

Notes:
- works as a fast simple approximate clusterer
- modelled after SimpleKMeans, might be a useful initializer for it
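The farthest-first traversal behind the cited k-center heuristic can be sketched in plain Java. This is a minimal illustration, not the Weka implementation: the class name FarthestFirstSketch, the method chooseCenters, and the use of plain Euclidean distance over double[] points are all assumptions made for the example. The first center is drawn at random from the seed, and each further center is the point farthest from all centers chosen so far.

```java
import java.util.Arrays;
import java.util.Random;

/** Illustrative farthest-first traversal; hypothetical helper, not the Weka class. */
final class FarthestFirstSketch {

    /** Picks up to k center indices: a seeded random start, then repeatedly the farthest point. */
    static int[] chooseCenters(double[][] points, int k, long seed) {
        int n = points.length;
        int count = Math.max(0, Math.min(k, n));   // cannot pick more centers than points
        int[] centers = new int[count];
        if (count == 0) {
            return centers;
        }
        double[] minDist = new double[n];          // distance from each point to its nearest chosen center
        Arrays.fill(minDist, Double.POSITIVE_INFINITY);
        boolean[] selected = new boolean[n];

        int next = new Random(seed).nextInt(n);    // first center is random, controlled by the seed
        for (int c = 0; c < count; c++) {
            centers[c] = next;
            selected[next] = true;
            int farthest = -1;
            double farthestDist = -1.0;
            for (int i = 0; i < n; i++) {
                if (selected[i]) {
                    continue;
                }
                // Fold the newly added center into the nearest-center distance of point i.
                minDist[i] = Math.min(minDist[i], distance(points[i], points[next]));
                if (minDist[i] > farthestDist) {
                    farthestDist = minDist[i];
                    farthest = i;
                }
            }
            next = farthest;                       // -1 only when every point is already a center
        }
        return centers;
    }

    /** Plain Euclidean distance between two equal-length vectors (an assumption of this sketch). */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[][] points = { {0, 0}, {1, 0}, {10, 0}, {10, 1} };
        // Two centers with seed 1, matching the documented defaults for -N and -S.
        System.out.println(Arrays.toString(chooseCenters(points, 2, 1L)));
    }
}
```

Because every step only updates one distance per point, the traversal needs a single pass over the data for each center, which is what makes it cheap enough to serve as an initializer for an iterative method.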

BibTeX:

 @article{Hochbaum1985,
    author = {Hochbaum and Shmoys},
    journal = {Mathematics of Operations Research},
    number = {2},
    pages = {180-184},
    title = {A best possible heuristic for the k-center problem},
    volume = {10},
    year = {1985}
 }
 
 @inproceedings{Dasgupta2002,
    author = {Sanjoy Dasgupta},
    booktitle = {15th Annual Conference on Computational Learning Theory},
    pages = {351-363},
    publisher = {Springer},
    title = {Performance Guarantees for Hierarchical Clustering},
    year = {2002}
 }
 

Valid options are:

 -N <num>
  Number of clusters.
  (default 2)
 -S <num>
  Random number seed.
  (default 1)

Version:
$Revision: 5987 $
Author:
Bernhard Pfahringer
See Also:
RandomizableClusterer, Serialized Form

Constructor Summary
FarthestFirst()
           
 
Method Summary
 void buildClusterer(Instances data)
          Generates a clusterer.
 int clusterInstance(Instance instance)
          Assigns a given instance to a cluster.
 Capabilities getCapabilities()
          Returns default capabilities of the clusterer.
 int getNumClusters()
          Gets the number of clusters to generate.
 java.lang.String[] getOptions()
          Gets the current settings of FarthestFirst
 java.lang.String getRevision()
          Returns the revision string.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 java.lang.String globalInfo()
          Returns a string describing this clusterer
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 int numberOfClusters()
          Returns the number of clusters.
 java.lang.String numClustersTipText()
          Returns the tip text for this property
 void setNumClusters(int n)
          Sets the number of clusters to generate.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 java.lang.String toString()
          Returns a string describing this clusterer.
 
Methods inherited from class weka.clusterers.RandomizableClusterer
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.clusterers.AbstractClusterer
distributionForInstance, forName, makeCopies, makeCopy, runClusterer
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

FarthestFirst

public FarthestFirst()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this clusterer

Returns:
a description of the clusterer suitable for displaying in the explorer/experimenter GUI

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

getCapabilities

public Capabilities getCapabilities()
Returns default capabilities of the clusterer.

Specified by:
getCapabilities in interface Clusterer
Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class AbstractClusterer
Returns:
the capabilities of this clusterer
See Also:
Capabilities

buildClusterer

public void buildClusterer(Instances data)
                    throws java.lang.Exception
Generates a clusterer. Has to initialize all fields of the clusterer that are not being set via options.

Specified by:
buildClusterer in interface Clusterer
Specified by:
buildClusterer in class AbstractClusterer
Parameters:
data - set of instances serving as training data
Throws:
java.lang.Exception - if the clusterer has not been generated successfully

clusterInstance

public int clusterInstance(Instance instance)
                    throws java.lang.Exception
Assigns a given instance to a cluster.

Specified by:
clusterInstance in interface Clusterer
Overrides:
clusterInstance in class AbstractClusterer
Parameters:
instance - the instance to be assigned to a cluster
Returns:
the number of the assigned cluster as an integer
Throws:
java.lang.Exception - if the instance could not be assigned to a cluster successfully
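
For a center-based clusterer, assigning an instance usually comes down to finding the nearest cluster center. The sketch below illustrates that idea only: the class NearestCenterSketch and its assign method are hypothetical, cluster numbers are assumed to be 0-based, and plain squared Euclidean distance is assumed, which may differ from the distance Weka actually applies.

```java
/** Illustrative nearest-center assignment; hypothetical helper, not the Weka class. */
final class NearestCenterSketch {

    /** Returns the 0-based index of the center closest to the point, or -1 if there are no centers. */
    static int assign(double[][] centers, double[] point) {
        int best = -1;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int c = 0; c < centers.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < point.length; d++) {
                double diff = centers[c][d] - point[d];
                dist += diff * diff;               // squared distance preserves the ordering
            }
            if (dist < bestDist) {                 // strict comparison: ties go to the lower index
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] centers = { {0, 0}, {10, 0} };
        System.out.println(assign(centers, new double[] {1, 1}));   // prints 0
        System.out.println(assign(centers, new double[] {9, -2}));  // prints 1
    }
}
```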

numberOfClusters

public int numberOfClusters()
                     throws java.lang.Exception
Returns the number of clusters.

Specified by:
numberOfClusters in interface Clusterer
Specified by:
numberOfClusters in class AbstractClusterer
Returns:
the number of clusters generated for a training dataset.
Throws:
java.lang.Exception - if number of clusters could not be returned successfully

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class RandomizableClusterer
Returns:
an enumeration of all the available options.

numClustersTipText

public java.lang.String numClustersTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the explorer/experimenter gui

setNumClusters

public void setNumClusters(int n)
                    throws java.lang.Exception
Sets the number of clusters to generate.

Parameters:
n - the number of clusters to generate
Throws:
java.lang.Exception - if the number of clusters is negative

getNumClusters

public int getNumClusters()
Gets the number of clusters to generate.

Returns:
the number of clusters to generate

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are:

 -N <num>
  Number of clusters.
  (default 2)
 -S <num>
  Random number seed.
  (default 1)

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class RandomizableClusterer
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
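
The option format above (flag followed by value, with defaults of 2 clusters and seed 1) can be illustrated with a small stand-alone parser. This is a sketch under stated assumptions, not Weka's option handling: the class OptionSketch and its parse and toOptions methods are hypothetical names, and rejecting unknown flags or a negative cluster count simply mirrors the exceptions documented for setOptions and setNumClusters.

```java
import java.util.Arrays;

/** Illustrative parser for the documented -N and -S flags; hypothetical, not Weka's own code. */
final class OptionSketch {
    static final int DEFAULT_NUM_CLUSTERS = 2;   // documented default for -N
    static final int DEFAULT_SEED = 1;           // documented default for -S

    /** Returns {numClusters, seed}, falling back to the documented defaults. */
    static int[] parse(String[] options) {
        int numClusters = DEFAULT_NUM_CLUSTERS;
        int seed = DEFAULT_SEED;
        for (int i = 0; i < options.length; i += 2) {
            String flag = options[i];
            if (i + 1 >= options.length) {
                throw new IllegalArgumentException("Missing value for " + flag);
            }
            switch (flag) {
                case "-N":
                    numClusters = Integer.parseInt(options[i + 1]);
                    if (numClusters < 0) {
                        throw new IllegalArgumentException("Number of clusters must not be negative");
                    }
                    break;
                case "-S":
                    seed = Integer.parseInt(options[i + 1]);
                    break;
                default:
                    throw new IllegalArgumentException("Unsupported option: " + flag);
            }
        }
        return new int[] { numClusters, seed };
    }

    /** Builds an option array that parse() accepts, so settings can round-trip. */
    static String[] toOptions(int numClusters, int seed) {
        return new String[] { "-N", String.valueOf(numClusters), "-S", String.valueOf(seed) };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parse(new String[] { "-N", "5" })));  // prints [5, 1]
        System.out.println(Arrays.toString(parse(toOptions(3, 42))));            // prints [3, 42]
    }
}
```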

getOptions

public java.lang.String[] getOptions()
Gets the current settings of FarthestFirst

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class RandomizableClusterer
Returns:
an array of strings suitable for passing to setOptions()

toString

public java.lang.String toString()
Returns a string describing this clusterer.

Overrides:
toString in class java.lang.Object
Returns:
a description of the clusterer as a string

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class AbstractClusterer
Returns:
the revision

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - should contain the following arguments:

-t training file [-N number of clusters]