weka.classifiers.meta
Class MetaCost

java.lang.Object
  extended by weka.classifiers.Classifier
      extended by weka.classifiers.SingleClassifierEnhancer
          extended by weka.classifiers.RandomizableSingleClassifierEnhancer
              extended by weka.classifiers.meta.MetaCost
All Implemented Interfaces:
java.io.Serializable, java.lang.Cloneable, CapabilitiesHandler, OptionHandler, Randomizable, RevisionHandler, TechnicalInformationHandler

public class MetaCost
extends RandomizableSingleClassifierEnhancer
implements TechnicalInformationHandler

This metaclassifier makes its base classifier cost-sensitive using the method specified in

Pedro Domingos: MetaCost: A general method for making classifiers cost-sensitive. In: Fifth International Conference on Knowledge Discovery and Data Mining, 155-164, 1999.

This classifier should produce similar results to one created by passing the base learner to Bagging, which is in turn passed to a CostSensitiveClassifier operating on minimum expected cost. The difference is that MetaCost produces a single cost-sensitive classifier of the base learner, giving the benefits of fast classification and interpretable output (if the base learner itself is interpretable). This implementation uses all bagging iterations when reclassifying training data (the MetaCost paper reports a marginal improvement when only those iterations containing each training instance are used in reclassifying that instance).
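The reclassification step can be written as follows. This is a sketch under two assumptions: the bagged models' class-probability estimates are averaged, and the cost matrix follows the Weka convention of rows for the actual class and columns for the predicted class. Here $M_1, \dots, M_m$ are the models trained on the $m$ bags (option -I) and $C(i, j)$ is the cost of predicting class $j$ when the true class is $i$.

```latex
\hat{P}(i \mid x) = \frac{1}{m} \sum_{k=1}^{m} P_{M_k}(i \mid x),
\qquad
y^{*}(x) = \operatorname*{arg\,min}_{j} \sum_{i} \hat{P}(i \mid x)\, C(i, j)
```

Every training instance $x$ is relabeled with $y^{*}(x)$, and the base learner is then trained once on the relabeled data. Because this implementation uses all bagging iterations, the sum runs over all $m$ models rather than only those whose bag contains $x$.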

BibTeX:

 @inproceedings{Domingos1999,
    author = {Pedro Domingos},
    booktitle = {Fifth International Conference on Knowledge Discovery and Data Mining},
    pages = {155-164},
    title = {MetaCost: A general method for making classifiers cost-sensitive},
    year = {1999}
 }
 

Valid options are:

 -I <num>
  Number of bagging iterations.
  (default 10)
 -C <cost file name>
  File name of a cost matrix to use. If this is not supplied,
  a cost matrix will be loaded on demand. The name of the
  on-demand file is the relation name of the training data
  plus ".cost", and the path to the on-demand file is
  specified with the -N option.
 -N <directory>
  Name of a directory to search for cost files when loading
  costs on demand (default current directory).
 -cost-matrix <matrix>
  The cost matrix in Matlab single line format.
 -P <num>
  Size of each bag, as a percentage of the
  training set size. (default 100)
 -S <num>
  Random number seed.
  (default 1)
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console.
 -W <classifier name>
  Full name of base classifier.
  (default: weka.classifiers.rules.ZeroR)
 
 Options specific to classifier weka.classifiers.rules.ZeroR:
 
 -D
  If set, classifier is run in debug mode and
  may output additional info to the console.
Options after -- are passed to the designated classifier.
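The -cost-matrix option takes the matrix in Matlab single-line format, assumed here to mean rows separated by semicolons and entries separated by spaces, for example [0 1; 10 0] for a two-class problem. The following hypothetical parser (parseMatlab in this sketch is not Weka's own code) reads that format into a square array, assuming the Weka convention that rows index the actual class and columns the predicted class.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not Weka's API): parses a cost matrix written in
 * Matlab single-line format, e.g. "[0 1; 10 0]". Rows are separated by ';'
 * and entries within a row by whitespace.
 */
final class MatlabCostMatrixSketch {

    static double[][] parseMatlab(String text) {
        String body = text.trim();
        // The whole matrix must be enclosed in square brackets.
        if (!body.startsWith("[") || !body.endsWith("]")) {
            throw new IllegalArgumentException("expected [ ... ]: " + text);
        }
        body = body.substring(1, body.length() - 1).trim();

        List<double[]> rows = new ArrayList<>();
        for (String rowText : body.split(";")) {
            String[] cells = rowText.trim().split("\\s+");
            double[] row = new double[cells.length];
            for (int j = 0; j < cells.length; j++) {
                row[j] = Double.parseDouble(cells[j]);
            }
            rows.add(row);
        }

        // A cost matrix has one row and one column per class, so it must be square.
        int n = rows.size();
        for (double[] row : rows) {
            if (row.length != n) {
                throw new IllegalArgumentException("cost matrix must be square");
            }
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        double[][] cost = parseMatlab("[0 1; 10 0]");
        // Assumed convention: cost[actual][predicted].
        System.out.println(cost[1][0]); // 10.0
    }
}
```

With this matrix, predicting class 0 for an instance whose true class is 1 costs ten times as much as the opposite mistake.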

Version:
$Revision: 1.24 $
Author:
Len Trigg
See Also:
Serialized Form

Field Summary
static int MATRIX_ON_DEMAND
          load cost matrix on demand
static int MATRIX_SUPPLIED
          use explicit matrix
static Tag[] TAGS_MATRIX_SOURCE
          Specify possible sources of the cost matrix
 
Constructor Summary
MetaCost()
           
 
Method Summary
 java.lang.String bagSizePercentTipText()
          Returns the tip text for this property
 void buildClassifier(Instances data)
          Builds the cost-sensitive model from the training data.
 java.lang.String costMatrixSourceTipText()
          Returns the tip text for this property
 java.lang.String costMatrixTipText()
          Returns the tip text for this property
 double[] distributionForInstance(Instance instance)
          Returns the class distribution predicted by the cost-sensitive model for the given instance.
 int getBagSizePercent()
          Gets the size of each bag, as a percentage of the training set size.
 Capabilities getCapabilities()
          Returns default capabilities of the classifier.
 CostMatrix getCostMatrix()
          Gets the misclassification cost matrix.
 SelectedTag getCostMatrixSource()
          Gets the source location method of the cost matrix.
 int getNumIterations()
          Gets the number of bagging iterations
 java.io.File getOnDemandDirectory()
          Returns the directory that will be searched for cost files when loading on demand.
 java.lang.String[] getOptions()
          Gets the current settings of the Classifier.
 java.lang.String getRevision()
          Returns the revision string.
 TechnicalInformation getTechnicalInformation()
          Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.
 java.lang.String globalInfo()
          Returns a string describing this classifier.
 java.util.Enumeration listOptions()
          Returns an enumeration describing the available options.
static void main(java.lang.String[] argv)
          Main method for testing this class.
 java.lang.String numIterationsTipText()
          Returns the tip text for this property
 java.lang.String onDemandDirectoryTipText()
          Returns the tip text for this property
 void setBagSizePercent(int newBagSizePercent)
          Sets the size of each bag, as a percentage of the training set size.
 void setCostMatrix(CostMatrix newCostMatrix)
          Sets the misclassification cost matrix.
 void setCostMatrixSource(SelectedTag newMethod)
          Sets the source location of the cost matrix.
 void setNumIterations(int numIterations)
          Sets the number of bagging iterations
 void setOnDemandDirectory(java.io.File newDir)
          Sets the directory that will be searched for cost files when loading on demand.
 void setOptions(java.lang.String[] options)
          Parses a given list of options.
 java.lang.String toString()
          Outputs a representation of this classifier.
 
Methods inherited from class weka.classifiers.RandomizableSingleClassifierEnhancer
getSeed, seedTipText, setSeed
 
Methods inherited from class weka.classifiers.SingleClassifierEnhancer
classifierTipText, getClassifier, setClassifier
 
Methods inherited from class weka.classifiers.Classifier
classifyInstance, debugTipText, forName, getDebug, makeCopies, makeCopy, setDebug
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Field Detail

MATRIX_ON_DEMAND

public static final int MATRIX_ON_DEMAND
load cost matrix on demand

See Also:
Constant Field Values

MATRIX_SUPPLIED

public static final int MATRIX_SUPPLIED
use explicit matrix

See Also:
Constant Field Values

TAGS_MATRIX_SOURCE

public static final Tag[] TAGS_MATRIX_SOURCE
Specify possible sources of the cost matrix

Constructor Detail

MetaCost

public MetaCost()
Method Detail

globalInfo

public java.lang.String globalInfo()
Returns a string describing this classifier.

Returns:
a description suitable for displaying in the Explorer/Experimenter GUI

getTechnicalInformation

public TechnicalInformation getTechnicalInformation()
Returns an instance of a TechnicalInformation object, containing detailed information about the technical background of this class, e.g., paper reference or book this class is based on.

Specified by:
getTechnicalInformation in interface TechnicalInformationHandler
Returns:
the technical information about this class

listOptions

public java.util.Enumeration listOptions()
Returns an enumeration describing the available options.

Specified by:
listOptions in interface OptionHandler
Overrides:
listOptions in class RandomizableSingleClassifierEnhancer
Returns:
an enumeration of all the available options.

setOptions

public void setOptions(java.lang.String[] options)
                throws java.lang.Exception
Parses a given list of options.

Valid options are the same as those listed in the class description above.

Specified by:
setOptions in interface OptionHandler
Overrides:
setOptions in class RandomizableSingleClassifierEnhancer
Parameters:
options - the list of options as an array of strings
Throws:
java.lang.Exception - if an option is not supported
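The numeric options fall back to their documented defaults (-I 10, -P 100, -S 1) when they are omitted. The following sketch shows that behaviour with a hypothetical takeOption helper operating on a plain list; it is not Weka's option parser, and -C, -N, -cost-matrix, -D and -W are left out for brevity.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch: reads the numeric MetaCost options with their
 * documented defaults. Not Weka's implementation.
 */
final class MetaCostOptionsSketch {
    int numIterations = 10;   // -I, default 10
    int bagSizePercent = 100; // -P, default 100
    int seed = 1;             // -S, default 1

    /** Removes "flag value" from opts and returns the value, or null if the flag is absent. */
    static String takeOption(String flag, List<String> opts) {
        int i = opts.indexOf(flag);
        if (i < 0) {
            return null;
        }
        if (i + 1 >= opts.size()) {
            throw new IllegalArgumentException("missing value for " + flag);
        }
        String value = opts.get(i + 1);
        opts.remove(i + 1); // remove the value first so index i stays valid
        opts.remove(i);
        return value;
    }

    static MetaCostOptionsSketch parse(String[] argv) {
        List<String> opts = new ArrayList<>(Arrays.asList(argv));
        MetaCostOptionsSketch o = new MetaCostOptionsSketch();
        String v;
        if ((v = takeOption("-I", opts)) != null) o.numIterations = Integer.parseInt(v);
        if ((v = takeOption("-P", opts)) != null) o.bagSizePercent = Integer.parseInt(v);
        if ((v = takeOption("-S", opts)) != null) o.seed = Integer.parseInt(v);
        return o;
    }

    public static void main(String[] args) {
        MetaCostOptionsSketch o = parse(new String[] {"-I", "25", "-S", "42"});
        // -P was not given, so the default of 100 is kept.
        System.out.println(o.numIterations + " " + o.bagSizePercent + " " + o.seed); // 25 100 42
    }
}
```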

getOptions

public java.lang.String[] getOptions()
Gets the current settings of the Classifier.

Specified by:
getOptions in interface OptionHandler
Overrides:
getOptions in class RandomizableSingleClassifierEnhancer
Returns:
an array of strings suitable for passing to setOptions

costMatrixSourceTipText

public java.lang.String costMatrixSourceTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI

getCostMatrixSource

public SelectedTag getCostMatrixSource()
Gets the source location method of the cost matrix. Will be one of MATRIX_ON_DEMAND or MATRIX_SUPPLIED.

Returns:
the cost matrix source.

setCostMatrixSource

public void setCostMatrixSource(SelectedTag newMethod)
Sets the source location of the cost matrix. Values other than MATRIX_ON_DEMAND or MATRIX_SUPPLIED will be ignored.

Parameters:
newMethod - the cost matrix location method.
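The rule that values other than MATRIX_ON_DEMAND or MATRIX_SUPPLIED are ignored can be sketched as follows. This is a hypothetical, dependency-free model: the real setter takes a SelectedTag built from TAGS_MATRIX_SOURCE, whereas this sketch uses a plain int, the constant values are placeholders, and the starting value of on-demand loading is an assumption based on the -C option description.

```java
/**
 * Hypothetical sketch of the documented setCostMatrixSource behaviour:
 * only the two known sources are accepted; anything else is ignored and
 * the previous setting is kept. Constant values are placeholders.
 */
final class CostMatrixSourceSketch {
    static final int MATRIX_ON_DEMAND = 1; // placeholder value
    static final int MATRIX_SUPPLIED = 2;  // placeholder value

    // Assumed starting point: load the matrix on demand unless one is supplied.
    private int source = MATRIX_ON_DEMAND;

    void setCostMatrixSource(int newMethod) {
        // Values other than the two known sources are silently ignored.
        if (newMethod == MATRIX_ON_DEMAND || newMethod == MATRIX_SUPPLIED) {
            source = newMethod;
        }
    }

    int getCostMatrixSource() {
        return source;
    }

    public static void main(String[] args) {
        CostMatrixSourceSketch s = new CostMatrixSourceSketch();
        s.setCostMatrixSource(MATRIX_SUPPLIED);
        s.setCostMatrixSource(99); // ignored
        System.out.println(s.getCostMatrixSource() == MATRIX_SUPPLIED); // true
    }
}
```

Against the real class, the equivalent call would pass new SelectedTag(MetaCost.MATRIX_SUPPLIED, MetaCost.TAGS_MATRIX_SOURCE).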

onDemandDirectoryTipText

public java.lang.String onDemandDirectoryTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI

getOnDemandDirectory

public java.io.File getOnDemandDirectory()
Returns the directory that will be searched for cost files when loading on demand.

Returns:
The cost file search directory.

setOnDemandDirectory

public void setOnDemandDirectory(java.io.File newDir)
Sets the directory that will be searched for cost files when loading on demand.

Parameters:
newDir - The cost file search directory.
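Following the -C and -N option descriptions, the on-demand cost file is named after the training data's relation name plus ".cost" and is looked up in this directory. The following sketch resolves that path with a hypothetical onDemandCostFile helper; the relation name "mydata" is a placeholder.

```java
import java.io.File;

/**
 * Hypothetical sketch: locates the on-demand cost file as
 * <onDemandDirectory>/<relation name>.cost, per the -C/-N option text.
 */
final class OnDemandCostFileSketch {

    static File onDemandCostFile(File onDemandDirectory, String relationName) {
        return new File(onDemandDirectory, relationName + ".cost");
    }

    public static void main(String[] args) {
        // The documented default search directory is the current directory.
        File dir = new File(System.getProperty("user.dir"));
        File costFile = onDemandCostFile(dir, "mydata");
        System.out.println(costFile.getName()); // mydata.cost
        System.out.println(costFile.exists() ? "found" : "not found: " + costFile);
    }
}
```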

bagSizePercentTipText

public java.lang.String bagSizePercentTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI

getBagSizePercent

public int getBagSizePercent()
Gets the size of each bag, as a percentage of the training set size.

Returns:
the bag size, as a percentage.

setBagSizePercent

public void setBagSizePercent(int newBagSizePercent)
Sets the size of each bag, as a percentage of the training set size.

Parameters:
newBagSizePercent - the bag size, as a percentage.
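Each bag is a sample drawn with replacement whose size is this percentage of the training set. The following sketch draws the instance indices for one bag; drawBag is a hypothetical helper, and rounding the bag size down is an assumption made here.

```java
import java.util.Random;

/**
 * Hypothetical sketch: draws one bootstrap bag of instance indices whose
 * size is bagSizePercent percent of the training set, sampling with replacement.
 */
final class BagSketch {

    static int[] drawBag(int trainingSetSize, int bagSizePercent, Random random) {
        // Assumed rounding: integer division rounds the bag size down.
        int bagSize = trainingSetSize * bagSizePercent / 100;
        int[] indices = new int[bagSize];
        for (int k = 0; k < bagSize; k++) {
            indices[k] = random.nextInt(trainingSetSize); // with replacement
        }
        return indices;
    }

    public static void main(String[] args) {
        Random random = new Random(1); // seeded, as with -S (default 1)
        int[] bag = drawBag(200, 50, random);
        System.out.println(bag.length); // 100
    }
}
```

Calling drawBag once per bagging iteration with the same Random object yields the bags for the whole ensemble; with the default of 100 percent each bag has as many draws as the training set, so some instances repeat and others are left out.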

numIterationsTipText

public java.lang.String numIterationsTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI

setNumIterations

public void setNumIterations(int numIterations)
Sets the number of bagging iterations.

Parameters:
numIterations - the number of iterations to use

getNumIterations

public int getNumIterations()
Gets the number of bagging iterations.

Returns:
the number of bagging iterations

costMatrixTipText

public java.lang.String costMatrixTipText()
Returns the tip text for this property

Returns:
tip text for this property suitable for displaying in the Explorer/Experimenter GUI

getCostMatrix

public CostMatrix getCostMatrix()
Gets the misclassification cost matrix.

Returns:
the cost matrix

setCostMatrix

public void setCostMatrix(CostMatrix newCostMatrix)
Sets the misclassification cost matrix.

Parameters:
newCostMatrix - the cost matrix
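To make the role of the matrix concrete, the following sketch totals the misclassification cost of a set of predictions. It assumes the Weka convention that rows index the actual class and columns the predicted class; totalCost is a hypothetical helper and not part of the CostMatrix API.

```java
/**
 * Hypothetical sketch: sums cost[actual][predicted] over a set of predictions.
 */
final class TotalCostSketch {

    static double totalCost(double[][] cost, int[] actual, int[] predicted) {
        if (actual.length != predicted.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        double total = 0.0;
        for (int k = 0; k < actual.length; k++) {
            total += cost[actual[k]][predicted[k]];
        }
        return total;
    }

    public static void main(String[] args) {
        // Predicting 0 when the truth is 1 costs 10; the reverse mistake costs 1.
        double[][] cost = {{0, 1}, {10, 0}};
        int[] actual = {0, 0, 1, 1};
        int[] predicted = {0, 1, 0, 1};
        System.out.println(totalCost(cost, actual, predicted)); // 11.0
    }
}
```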

getCapabilities

public Capabilities getCapabilities()
Returns default capabilities of the classifier.

Specified by:
getCapabilities in interface CapabilitiesHandler
Overrides:
getCapabilities in class SingleClassifierEnhancer
Returns:
the capabilities of this classifier
See Also:
Capabilities

buildClassifier

public void buildClassifier(Instances data)
                     throws java.lang.Exception
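Builds the cost-sensitive model. The base learner is bagged, each training instance is relabeled with the class of minimum expected cost under the bagged ensemble's class probabilities, and the base learner is then retrained on the relabeled data.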

Specified by:
buildClassifier in class Classifier
Parameters:
data - the training data
Throws:
java.lang.Exception - if the classifier could not be built successfully
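The training procedure can be sketched without any Weka dependency as follows. The Model and Learner interfaces, the build method, and the ZeroR-like PRIOR learner (which, like the default base classifier, ignores the attributes and predicts the training class frequencies) are all hypothetical illustrations; averaging the bagged distributions and the rows-are-actual cost convention are assumptions.

```java
import java.util.Random;

/**
 * Hypothetical sketch of MetaCost training: bag the base learner, relabel
 * every training instance with its minimum-expected-cost class using all
 * bagged models, then train the base learner once on the relabeled data.
 */
final class MetaCostSketch {

    interface Model { double[] distribution(double[] x); }

    interface Learner { Model train(double[][] x, int[] y, int numClasses); }

    /** ZeroR-like learner: always predicts the training-set class frequencies. */
    static final Learner PRIOR = (x, y, numClasses) -> {
        double[] freq = new double[numClasses];
        for (int label : y) freq[label] += 1.0;
        for (int c = 0; c < numClasses; c++) freq[c] /= y.length;
        return instance -> freq.clone();
    };

    static Model build(double[][] x, int[] y, int numClasses, double[][] cost,
                       Learner learner, int numIterations, int bagSizePercent, long seed) {
        Random random = new Random(seed);
        int n = x.length;
        int bagSize = n * bagSizePercent / 100;

        // 1. Bagging: train one model per bootstrap sample.
        Model[] bagged = new Model[numIterations];
        for (int it = 0; it < numIterations; it++) {
            double[][] bx = new double[bagSize][];
            int[] by = new int[bagSize];
            for (int k = 0; k < bagSize; k++) {
                int idx = random.nextInt(n);
                bx[k] = x[idx];
                by[k] = y[idx];
            }
            bagged[it] = learner.train(bx, by, numClasses);
        }

        // 2. Relabel each instance with its minimum-expected-cost class.
        int[] relabeled = new int[n];
        for (int i = 0; i < n; i++) {
            double[] p = new double[numClasses]; // averaged over all bags
            for (Model m : bagged) {
                double[] d = m.distribution(x[i]);
                for (int c = 0; c < numClasses; c++) p[c] += d[c] / numIterations;
            }
            int best = 0;
            double bestCost = Double.POSITIVE_INFINITY;
            for (int j = 0; j < numClasses; j++) {
                double expected = 0.0; // sum over actual classes c of P(c) * cost[c][j]
                for (int c = 0; c < numClasses; c++) expected += p[c] * cost[c][j];
                if (expected < bestCost) { bestCost = expected; best = j; }
            }
            relabeled[i] = best;
        }

        // 3. Train a single base model on the relabeled data.
        return learner.train(x, relabeled, numClasses);
    }

    public static void main(String[] args) {
        double[][] x = {{1.0}, {2.0}, {3.0}, {4.0}};
        int[] y = {0, 0, 0, 1};
        double[][] cost = {{0, 1}, {10, 0}}; // rows: actual, columns: predicted
        Model model = build(x, y, 2, cost, PRIOR, 10, 100, 1L);
        double[] dist = model.distribution(new double[] {2.5});
        System.out.println(dist[0] + " " + dist[1]);
    }
}
```

With a learner that ignores the attributes, every instance receives the same averaged distribution, so all instances get the same new label and the final model puts all its probability on one class. Under this cost matrix, predicting class 0 has expected cost 10·P(1) and predicting class 1 has expected cost P(0), so class 1 wins whenever the averaged P(1) exceeds 1/11, even though it is the minority class.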

distributionForInstance

public double[] distributionForInstance(Instance instance)
                                 throws java.lang.Exception
Returns the class distribution predicted by the cost-sensitive model for the given instance.

Overrides:
distributionForInstance in class Classifier
Parameters:
instance - the instance to be classified
Returns:
the class distribution for the given instance
Throws:
java.lang.Exception - if instance could not be classified successfully
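Because the costs are already built into the model trained on the relabeled data, classifying a new instance needs no further cost computation. One simple way to turn the returned distribution into a class index is to take its most probable entry, as in this sketch; mostProbableClass is a hypothetical helper, and breaking ties toward the lowest index is a choice made here.

```java
/**
 * Hypothetical sketch: picks the most probable class index from a
 * distribution such as the one returned by distributionForInstance.
 */
final class PredictSketch {

    static int mostProbableClass(double[] distribution) {
        int best = 0;
        for (int c = 1; c < distribution.length; c++) {
            if (distribution[c] > distribution[best]) { // strict: ties keep the lower index
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(mostProbableClass(new double[] {0.2, 0.7, 0.1})); // 1
    }
}
```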

toString

public java.lang.String toString()
Outputs a representation of this classifier.

Overrides:
toString in class java.lang.Object
Returns:
a string representation of the classifier

getRevision

public java.lang.String getRevision()
Returns the revision string.

Specified by:
getRevision in interface RevisionHandler
Overrides:
getRevision in class Classifier
Returns:
the revision

main

public static void main(java.lang.String[] argv)
Main method for testing this class.

Parameters:
argv - should contain the following arguments: -t training file [-T test file] [-c class index]
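The argument array for main mixes MetaCost's own options, the evaluation options such as -t, and, after --, options for the base classifier. The following sketch splits such an array at the first -- with a hypothetical splitAtDoubleDash helper (not Weka's parser); train.arff is a placeholder file name, and J48's -C 0.25 serves only as an example of a base-classifier option.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: everything before the first "--" is for MetaCost and
 * the evaluation options; everything after it goes to the base classifier.
 */
final class DoubleDashSketch {

    static String[][] splitAtDoubleDash(String[] argv) {
        int i = Arrays.asList(argv).indexOf("--");
        if (i < 0) {
            return new String[][] {argv.clone(), new String[0]};
        }
        return new String[][] {
            Arrays.copyOfRange(argv, 0, i),
            Arrays.copyOfRange(argv, i + 1, argv.length)
        };
    }

    public static void main(String[] args) {
        String[] argv = {
            "-t", "train.arff", "-cost-matrix", "[0 1; 10 0]",
            "-W", "weka.classifiers.trees.J48", "--", "-C", "0.25"
        };
        String[][] parts = splitAtDoubleDash(argv);
        System.out.println(Arrays.toString(parts[0]));
        System.out.println(Arrays.toString(parts[1])); // [-C, 0.25]
    }
}
```

The separator matters here because MetaCost and J48 both define a -C option: without --, the -C 0.25 intended for J48 would be read as MetaCost's cost file name.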