Evaluation

weka.classifiers
Class Evaluation

java.lang.Object
  weka.classifiers.Evaluation

All Implemented Interfaces:: java.io.Serializable, RevisionHandler, Summarizable

Direct Known Subclasses:: AggregateableEvaluation

public class Evaluation
extends java.lang.Object
implements Summarizable, RevisionHandler, java.io.Serializable

Class for evaluating machine learning models.

-------------------------------------------------------------------

General options when evaluating a learning scheme from the command-line:

-t filename
Name of the file with the training data. (required)

-T filename
Name of the file with the test data. If missing, a cross-validation is performed.

-c index
Index of the class attribute (1, 2, ...; default: last).

-x number
The number of folds for the cross-validation (default: 10).

-no-cv
No cross validation. If no test file is provided, no evaluation is done.

-split-percentage percentage
Sets the percentage for the train/test set split, e.g., 66.

-preserve-order
Preserves the order in the percentage split instead of randomizing the data first with the seed value ('-s').

-s seed
Random number seed for the cross-validation and percentage split (default: 1).

-m filename
The name of a file containing a cost matrix.

-l filename
Loads classifier from the given file. In case the filename ends with ".xml", a PMML file is loaded or, if that fails, options are loaded from XML.

-d filename
Saves the classifier built from the training data into the given file. If the filename ends with ".xml", the options are saved as XML, not the model.

-v
Outputs no statistics for the training data.

-o
Outputs statistics only, not the classifier.

-i
Outputs information-retrieval statistics per class.

-k
Outputs information-theoretic statistics.

-classifications "weka.classifiers.evaluation.output.prediction.AbstractOutput + options"
Uses the specified class for generating the classification output, e.g., weka.classifiers.evaluation.output.prediction.PlainText or weka.classifiers.evaluation.output.prediction.CSV.

-p range
Outputs predictions for test instances (or the train instances if no test instances are provided and -no-cv is used), along with the attributes in the specified range (and nothing else). Use '-p 0' if no attributes are desired.

Deprecated: use "-classifications ..." instead.

-distribution
Outputs the distribution instead of only the prediction in conjunction with the '-p' option (only nominal classes).

Deprecated: use "-classifications ..." instead.

-no-predictions
Turns off the collection of predictions in order to conserve memory.

-r
Outputs cumulative margin distribution (and nothing else).

-g
Only for classifiers that implement "Graphable." Outputs the graph representation of the classifier (and nothing else).

-xml filename | xml-string
Retrieves the options from the XML-data instead of the command line.

-threshold-file file
The file to save the threshold data to. The format is determined by the extensions, e.g., '.arff' for ARFF format or '.csv' for CSV.

-threshold-label label
The class label to determine the threshold data for (default is the first label).
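
The interaction between -split-percentage, -preserve-order and -s can be illustrated with a small self-contained sketch. This is not Weka's code: the class name PercentageSplitSketch and its split method are hypothetical, plain integers stand in for instances, and rounding the training-set size to the nearest integer is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class PercentageSplitSketch {

    /**
     * Hypothetical helper (not part of Weka): returns a two-element list
     * holding the training part followed by the test part.
     */
    public static List<List<Integer>> split(List<Integer> rows, double percentage,
                                            boolean preserveOrder, long seed) {
        List<Integer> copy = new ArrayList<>(rows);
        if (!preserveOrder) {
            // Randomize first, using the seed that '-s' would supply.
            Collections.shuffle(copy, new Random(seed));
        }
        // Assumption: the training size is rounded to the nearest integer.
        int trainSize = (int) Math.round(copy.size() * percentage / 100.0);
        List<List<Integer>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(copy.subList(0, trainSize)));
        parts.add(new ArrayList<>(copy.subList(trainSize, copy.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        List<List<Integer>> parts = split(rows, 66, true, 1L);
        // With the order preserved, the first 7 rows train and the last 3 test.
        System.out.println("train=" + parts.get(0) + " test=" + parts.get(1));
    }
}
```

With preserveOrder set to false, the same seed reproduces the same shuffle and therefore the same split.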

-------------------------------------------------------------------

Example usage as the main of a classifier (called FunkyClassifier):

 public static void main(String [] args) {
   runClassifier(new FunkyClassifier(), args);
 }

------------------------------------------------------------------

Example usage from within an application:

 Instances trainInstances = ... instances got from somewhere
 Instances testInstances = ... instances got from somewhere
 Classifier scheme = ... scheme got from somewhere

 Evaluation evaluation = new Evaluation(trainInstances);
 evaluation.evaluateModel(scheme, testInstances);
 System.out.println(evaluation.toSummaryString());

Version:: $Revision: 7579 $
Author:: Eibe Frank, Len Trigg
See Also:: Serialized Form

Constructor Summary
`Evaluation(Instances data)` Initializes all the counters for the evaluation.
`Evaluation(Instances data, CostMatrix costMatrix)` Initializes all the counters for the evaluation and also takes a cost matrix as parameter.

Method Summary
`double`	`areaUnderROC(int classIndex)` Returns the area under ROC for those predictions that have been collected in the evaluateModel(Classifier, Instances, Object...) method.
`double`	`avgCost()` Gets the average cost, that is, total cost of misclassifications (incorrect plus unclassified) over the total number of instances.
`double[][]`	`confusionMatrix()` Returns a copy of the confusion matrix.
`double`	`correct()` Gets the number of instances correctly classified (that is, for which a correct prediction was made).
`double`	`correlationCoefficient()` Returns the correlation coefficient if the class is numeric.
`double`	`coverageOfTestCasesByPredictedRegions()` Gets the coverage of the test cases by the predicted regions at the confidence level specified when evaluation was performed.
`void`	`crossValidateModel(Classifier classifier, Instances data, int numFolds, java.util.Random random, java.lang.Object... forPredictionsPrinting)` Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.
`void`	`crossValidateModel(java.lang.String classifierString, Instances data, int numFolds, java.lang.String[] options, java.util.Random random)` Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.
`boolean`	`equals(java.lang.Object obj)` Tests whether the current evaluation object is equal to another evaluation object.
`double`	`errorRate()` Returns the estimated error rate or the root mean squared error (if the class is numeric).
`double[]`	`evaluateModel(Classifier classifier, Instances data, java.lang.Object... forPredictionsPrinting)` Evaluates the classifier on a given set of instances.
`static java.lang.String`	`evaluateModel(Classifier classifier, java.lang.String[] options)` Evaluates a classifier with the options given in an array of strings.
`static java.lang.String`	`evaluateModel(java.lang.String classifierString, java.lang.String[] options)` Evaluates a classifier with the options given in an array of strings.
`double`	`evaluateModelOnce(Classifier classifier, Instance instance)` Evaluates the classifier on a single instance.
`double`	`evaluateModelOnce(double[] dist, Instance instance)` Evaluates the supplied distribution on a single instance.
`void`	`evaluateModelOnce(double prediction, Instance instance)` Evaluates the supplied prediction on a single instance.
`double`	`evaluateModelOnceAndRecordPrediction(Classifier classifier, Instance instance)` Evaluates the classifier on a single instance and records the prediction.
`double`	`evaluateModelOnceAndRecordPrediction(double[] dist, Instance instance)` Evaluates the supplied distribution on a single instance.
`double`	`evaluationForSingleInstance(double[] dist, Instance instance, boolean storePredictions)` Evaluates the supplied distribution on a single instance.
`double`	`falseNegativeRate(int classIndex)` Calculate the false negative rate with respect to a particular class.
`double`	`falsePositiveRate(int classIndex)` Calculate the false positive rate with respect to a particular class.
`double`	`fMeasure(int classIndex)` Calculate the F-Measure with respect to a particular class.
`double[]`	`getClassPriors()` Get the current weighted class counts.
`boolean`	`getDiscardPredictions()` Returns whether predictions are not recorded at all, in order to conserve memory.
`Instances`	`getHeader()` Returns the header of the underlying dataset.
`java.lang.String`	`getRevision()` Returns the revision string.
`double`	`incorrect()` Gets the number of instances incorrectly classified (that is, for which an incorrect prediction was made).
`double`	`kappa()` Returns value of kappa statistic if class is nominal.
`double`	`KBInformation()` Return the total Kononenko & Bratko Information score in bits.
`double`	`KBMeanInformation()` Return the Kononenko & Bratko Information score in bits per instance.
`double`	`KBRelativeInformation()` Return the Kononenko & Bratko Relative Information score.
`static void`	`main(java.lang.String[] args)` A test method for this class.
`double`	`meanAbsoluteError()` Returns the mean absolute error.
`double`	`meanPriorAbsoluteError()` Returns the mean absolute error of the prior.
`double`	`numFalseNegatives(int classIndex)` Calculate number of false negatives with respect to a particular class.
`double`	`numFalsePositives(int classIndex)` Calculate number of false positives with respect to a particular class.
`double`	`numInstances()` Gets the number of test instances that had a known class value (actually the sum of the weights of test instances with known class value).
`double`	`numTrueNegatives(int classIndex)` Calculate the number of true negatives with respect to a particular class.
`double`	`numTruePositives(int classIndex)` Calculate the number of true positives with respect to a particular class.
`double`	`pctCorrect()` Gets the percentage of instances correctly classified (that is, for which a correct prediction was made).
`double`	`pctIncorrect()` Gets the percentage of instances incorrectly classified (that is, for which an incorrect prediction was made).
`double`	`pctUnclassified()` Gets the percentage of instances not classified (that is, for which no prediction was made by the classifier).
`double`	`precision(int classIndex)` Calculate the precision with respect to a particular class.
`FastVector`	`predictions()` Returns the predictions that have been collected.
`double`	`priorEntropy()` Calculate the entropy of the prior distribution.
`double`	`recall(int classIndex)` Calculate the recall with respect to a particular class.
`double`	`relativeAbsoluteError()` Returns the relative absolute error.
`double`	`rootMeanPriorSquaredError()` Returns the root mean prior squared error.
`double`	`rootMeanSquaredError()` Returns the root mean squared error.
`double`	`rootRelativeSquaredError()` Returns the root relative squared error if the class is numeric.
`void`	`setDiscardPredictions(boolean value)` Sets whether to discard predictions, i.e., not store them for future reference via the predictions() method, in order to conserve memory.
`void`	`setPriors(Instances train)` Sets the class prior probabilities.
`double`	`SFEntropyGain()` Returns the total SF, which is the null model entropy minus the scheme entropy.
`double`	`SFMeanEntropyGain()` Returns the SF per instance, which is the null model entropy minus the scheme entropy, per instance.
`double`	`SFMeanPriorEntropy()` Returns the entropy per instance for the null model.
`double`	`SFMeanSchemeEntropy()` Returns the entropy per instance for the scheme.
`double`	`SFPriorEntropy()` Returns the total entropy for the null model.
`double`	`SFSchemeEntropy()` Returns the total entropy for the scheme.
`double`	`sizeOfPredictedRegions()` Gets the average size of the predicted regions, relative to the range of the target in the training data, at the confidence level specified when evaluation was performed.
`java.lang.String`	`toClassDetailsString()` Generates a breakdown of the accuracy for each class (with default title), incorporating various information-retrieval statistics, such as true/false positive rate, precision/recall/F-Measure.
`java.lang.String`	`toClassDetailsString(java.lang.String title)` Generates a breakdown of the accuracy for each class, incorporating various information-retrieval statistics, such as true/false positive rate, precision/recall/F-Measure.
`java.lang.String`	`toCumulativeMarginDistributionString()` Output the cumulative margin distribution as a string suitable for input for gnuplot or similar package.
`java.lang.String`	`toMatrixString()` Calls toMatrixString() with a default title.
`java.lang.String`	`toMatrixString(java.lang.String title)` Outputs the performance statistics as a classification confusion matrix.
`java.lang.String`	`toSummaryString()` Calls toSummaryString() with no title and no complexity stats.
`java.lang.String`	`toSummaryString(boolean printComplexityStatistics)` Calls toSummaryString() with a default title.
`java.lang.String`	`toSummaryString(java.lang.String title, boolean printComplexityStatistics)` Outputs the performance statistics in summary form.
`double`	`totalCost()` Gets the total cost, that is, the cost of each prediction times the weight of the instance, summed over all instances.
`double`	`trueNegativeRate(int classIndex)` Calculate the true negative rate with respect to a particular class.
`double`	`truePositiveRate(int classIndex)` Calculate the true positive rate with respect to a particular class.
`double`	`unclassified()` Gets the number of instances not classified (that is, for which no prediction was made by the classifier).
`double`	`unweightedMacroFmeasure()` Unweighted macro-averaged F-measure.
`double`	`unweightedMicroFmeasure()` Unweighted micro-averaged F-measure.
`void`	`updatePriors(Instance instance)` Updates the class prior probabilities or the mean respectively (when incrementally training).
`void`	`useNoPriors()` Disables the use of priors, e.g., in case of de-serialized schemes that have no access to the original training set but are evaluated on a test set.
`double`	`weightedAreaUnderROC()` Calculates the weighted (by class size) AUC.
`double`	`weightedFalseNegativeRate()` Calculates the weighted (by class size) false negative rate.
`double`	`weightedFalsePositiveRate()` Calculates the weighted (by class size) false positive rate.
`double`	`weightedFMeasure()` Calculates the macro weighted (by class size) average F-Measure.
`double`	`weightedPrecision()` Calculates the weighted (by class size) precision.
`double`	`weightedRecall()` Calculates the weighted (by class size) recall.
`double`	`weightedTrueNegativeRate()` Calculates the weighted (by class size) true negative rate.
`double`	`weightedTruePositiveRate()` Calculates the weighted (by class size) true positive rate.
`static java.lang.String`	`wekaStaticWrapper(Sourcable classifier, java.lang.String className)` Wraps a static classifier in enough source to test using the weka class libraries.

Methods inherited from class java.lang.Object
`getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
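
As an illustration of what kappa() in the summary above reports, the following sketch computes Cohen's kappa from a confusion matrix using the textbook definition: observed agreement minus chance agreement, divided by one minus chance agreement. The class name KappaSketch is hypothetical, the rows-are-actual and columns-are-predicted layout is an assumption, and returning 1 when chance agreement reaches 1 is a choice made for this sketch rather than documented behaviour.

```java
public class KappaSketch {

    /** Cohen's kappa for a square confusion matrix (assumed rows = actual, columns = predicted). */
    public static double kappa(double[][] m) {
        int k = m.length;
        double[] rowSum = new double[k];
        double[] colSum = new double[k];
        double total = 0.0;
        double diagonal = 0.0;
        for (int i = 0; i < k; i++) {
            for (int j = 0; j < k; j++) {
                rowSum[i] += m[i][j];
                colSum[j] += m[i][j];
                total += m[i][j];
            }
            diagonal += m[i][i];
        }
        double observed = diagonal / total;
        double expected = 0.0;
        for (int i = 0; i < k; i++) {
            expected += rowSum[i] * colSum[i];
        }
        expected /= total * total;
        // Sketch-level choice: treat complete chance agreement as kappa = 1.
        if (expected >= 1.0) {
            return 1.0;
        }
        return (observed - expected) / (1.0 - expected);
    }

    public static void main(String[] args) {
        double[][] m = {{20, 5}, {10, 15}};
        // Observed agreement is 0.7 and chance agreement is 0.5, so kappa is about 0.4.
        System.out.println(kappa(m));
    }
}
```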

Constructor Detail

Evaluation

public Evaluation(Instances data)
           throws java.lang.Exception

Initializes all the counters for the evaluation. Use useNoPriors() if the dataset is the test set and you can't initialize with the priors from the training set via setPriors(Instances).

Parameters:: data - set of training instances, to get some header information and prior class distribution information
Throws:: java.lang.Exception - if the class is not defined
See Also:: useNoPriors(), setPriors(Instances)

Evaluation

public Evaluation(Instances data,
                  CostMatrix costMatrix)
           throws java.lang.Exception

Initializes all the counters for the evaluation and also takes a cost matrix as parameter. Use useNoPriors() if the dataset is the test set and you can't initialize with the priors from the training set via setPriors(Instances).

Parameters:: data - set of training instances, to get some header information and prior class distribution information; costMatrix - the cost matrix---if null, default costs will be used
Throws:: java.lang.Exception - if cost matrix is not compatible with data, the class is not defined or the class is numeric
See Also:: useNoPriors(), setPriors(Instances)
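
To show how a cost matrix turns predictions into the totalCost() and avgCost() figures, here is a minimal sketch. It is not Weka's implementation: CostSketch is a hypothetical class, plain arrays stand in for CostMatrix, the layout (rows = actual class, columns = predicted class) is an assumption, and unclassified instances are ignored.

```java
public class CostSketch {

    /** Sum over all cells of (weight of instances in the cell) * (cost of that cell). */
    public static double totalCost(double[][] confusion, double[][] cost) {
        double total = 0.0;
        for (int i = 0; i < confusion.length; i++) {
            for (int j = 0; j < confusion[i].length; j++) {
                total += confusion[i][j] * cost[i][j];
            }
        }
        return total;
    }

    /** Total cost divided by the total weight of the evaluated instances. */
    public static double avgCost(double[][] confusion, double[][] cost) {
        double n = 0.0;
        for (double[] row : confusion) {
            for (double cell : row) {
                n += cell;
            }
        }
        return totalCost(confusion, cost) / n;
    }

    public static void main(String[] args) {
        double[][] confusion = {{20, 5}, {10, 15}};
        double[][] cost = {{0, 1}, {5, 0}};
        // 5 errors at cost 1 plus 10 errors at cost 5 gives a total of 55 over 50 instances.
        System.out.println(totalCost(confusion, cost) + " " + avgCost(confusion, cost));
    }
}
```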

Method Detail

getHeader

public Instances getHeader()

Returns the header of the underlying dataset.

Returns:: the header information

setDiscardPredictions

public void setDiscardPredictions(boolean value)

Sets whether to discard predictions, i.e., not store them for future reference via the predictions() method, in order to conserve memory.

Parameters:: value - true to discard the predictions
See Also:: predictions()

getDiscardPredictions

public boolean getDiscardPredictions()

Returns whether predictions are not recorded at all, in order to conserve memory.

Returns:: true if predictions are not recorded
See Also:: predictions()

areaUnderROC

public double areaUnderROC(int classIndex)

Returns the area under ROC for those predictions that have been collected in the evaluateModel(Classifier, Instances, Object...) method. Returns Utils.missingValue() if the area is not available.

Parameters:: classIndex - the index of the class to consider as "positive"
Returns:: the area under the ROC curve or not a number
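
For intuition about the value returned here, the area under the ROC curve equals the probability that a randomly chosen positive instance is scored above a randomly chosen negative one, with ties counting as one half. The sketch below computes that quantity directly from scores and labels. AucSketch is a hypothetical class and does not reproduce how Weka derives the value from its collected predictions. Returning NaN when one of the two groups is empty is a choice made for this sketch.

```java
public class AucSketch {

    /**
     * Pairwise (Mann-Whitney style) AUC: fraction of positive/negative pairs
     * in which the positive instance receives the higher score.
     */
    public static double auc(double[] scores, boolean[] isPositive) {
        long positives = 0;
        long negatives = 0;
        for (boolean p : isPositive) {
            if (p) {
                positives++;
            } else {
                negatives++;
            }
        }
        if (positives == 0 || negatives == 0) {
            return Double.NaN; // area is not available
        }
        double wins = 0.0;
        for (int i = 0; i < scores.length; i++) {
            if (!isPositive[i]) {
                continue;
            }
            for (int j = 0; j < scores.length; j++) {
                if (isPositive[j]) {
                    continue;
                }
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        return wins / ((double) positives * negatives);
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.7, 0.4};
        boolean[] labels = {true, false, true, false};
        // Three of the four positive/negative pairs are ordered correctly.
        System.out.println(auc(scores, labels));
    }
}
```

The quadratic loop keeps the sketch short. A rank-based computation gives the same result in O(n log n) for larger prediction sets.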

weightedAreaUnderROC

public double weightedAreaUnderROC()

Calculates the weighted (by class size) AUC.

Returns:: the weighted AUC.
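
"Weighted by class size" can be read as a weighted mean of the per-class values, with each class weighted by the number of instances that actually belong to it. The sketch below assumes the class sizes are the row sums of a confusion matrix whose rows are the actual classes, and that every per-class value is defined. WeightedAverageSketch is a hypothetical class, not part of Weka.

```java
public class WeightedAverageSketch {

    /** Weighted mean of perClass values, using confusion-matrix row sums as weights. */
    public static double weightedByClassSize(double[][] confusion, double[] perClass) {
        double weightedSum = 0.0;
        double totalSize = 0.0;
        for (int i = 0; i < confusion.length; i++) {
            double classSize = 0.0;
            for (double cell : confusion[i]) {
                classSize += cell;
            }
            weightedSum += perClass[i] * classSize;
            totalSize += classSize;
        }
        return weightedSum / totalSize;
    }

    public static void main(String[] args) {
        double[][] confusion = {{30, 10}, {5, 5}}; // class sizes 40 and 10
        double[] perClassAuc = {0.9, 0.5};         // made-up per-class values
        // (0.9 * 40 + 0.5 * 10) / 50 is about 0.82
        System.out.println(weightedByClassSize(confusion, perClassAuc));
    }
}
```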

confusionMatrix

public double[][] confusionMatrix()

Returns a copy of the confusion matrix.

Returns:: a copy of the confusion matrix as a two-dimensional array
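
The per-class statistics (precision, recall, F-Measure) can all be derived from such a two-dimensional array. The following sketch uses the standard definitions and assumes that rows index the actual class and columns the predicted class. ConfusionMatrixMetricsSketch is a hypothetical class, and returning 0 for an empty denominator is a choice made here, not a statement about Weka's behaviour.

```java
public class ConfusionMatrixMetricsSketch {

    /** TP / (TP + FP): share of instances predicted as class c that really are class c. */
    public static double precision(double[][] m, int c) {
        double predictedAsC = 0.0;
        for (int i = 0; i < m.length; i++) {
            predictedAsC += m[i][c];
        }
        return predictedAsC == 0.0 ? 0.0 : m[c][c] / predictedAsC;
    }

    /** TP / (TP + FN): share of actual class-c instances that were predicted as class c. */
    public static double recall(double[][] m, int c) {
        double actuallyC = 0.0;
        for (int j = 0; j < m[c].length; j++) {
            actuallyC += m[c][j];
        }
        return actuallyC == 0.0 ? 0.0 : m[c][c] / actuallyC;
    }

    /** Harmonic mean of precision and recall. */
    public static double fMeasure(double[][] m, int c) {
        double p = precision(m, c);
        double r = recall(m, c);
        return (p + r) == 0.0 ? 0.0 : 2.0 * p * r / (p + r);
    }

    public static void main(String[] args) {
        double[][] m = {{20, 5}, {10, 15}};
        // For class 0: TP = 20, FP = 10, FN = 5.
        System.out.println(precision(m, 0) + " " + recall(m, 0) + " " + fMeasure(m, 0));
    }
}
```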

crossValidateModel

public void crossValidateModel(Classifier classifier,
                               Instances data,
                               int numFolds,
                               java.util.Random random,
                               java.lang.Object... forPredictionsPrinting)
                        throws java.lang.Exception

Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances. Now performs a deep copy of the classifier before each call to buildClassifier() (just in case the classifier is not initialized properly).

Parameters:: classifier - the classifier with any options set.; data - the data on which the cross-validation is to be performed; numFolds - the number of folds for the cross-validation; random - random number generator for randomization; forPredictionsPrinting - varargs parameter that, if supplied, is expected to hold a weka.classifiers.evaluation.output.prediction.AbstractOutput object
Throws:: java.lang.Exception - if a classifier could not be generated successfully or the class is not defined
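
Stratification means that each fold receives roughly the same class proportions as the full dataset. The sketch below shows one simple way to obtain such folds: group instance indices by class, shuffle each group, and deal the indices out round-robin. It is an illustration only. StratifiedFoldsSketch and assignFolds are hypothetical names, integer labels stand in for Instances, and the exact randomization and stratification procedure used by this method may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

public class StratifiedFoldsSketch {

    /** Returns, for each instance index, the fold (0 .. numFolds-1) it is assigned to. */
    public static int[] assignFolds(int[] classLabels, int numFolds, Random random) {
        // Group instance indices by class label.
        Map<Integer, List<Integer>> byClass = new TreeMap<>();
        for (int i = 0; i < classLabels.length; i++) {
            byClass.computeIfAbsent(classLabels[i], k -> new ArrayList<>()).add(i);
        }
        int[] fold = new int[classLabels.length];
        int next = 0;
        for (List<Integer> members : byClass.values()) {
            // Shuffle within the class, then deal indices out one fold at a time.
            Collections.shuffle(members, random);
            for (int index : members) {
                fold[index] = next % numFolds;
                next++;
            }
        }
        return fold;
    }

    public static void main(String[] args) {
        int[] labels = {0, 0, 0, 0, 0, 0, 1, 1, 1, 1};
        int[] folds = assignFolds(labels, 2, new Random(1));
        for (int i = 0; i < labels.length; i++) {
            System.out.println("instance " + i + " (class " + labels[i] + ") -> fold " + folds[i]);
        }
    }
}
```

Each fold then serves once as the test set while the remaining folds form the training set, and the statistics are accumulated across all folds.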

crossValidateModel

public void crossValidateModel(java.lang.String classifierString,
                               Instances data,
                               int numFolds,
                               java.lang.String[] options,
                               java.util.Random random)
                        throws java.lang.Exception

Performs a (stratified if class is nominal) cross-validation for a classifier on a set of instances.

Parameters:: classifierString - a string naming the class of the classifier; data - the data on which the cross-validation is to be performed; numFolds - the number of folds for the cross-validation; options - the options to the classifier. Any options accepted by the classifier will be removed from this array; random - the random number generator for randomizing the data
Throws:: java.lang.Exception - if a classifier could not be generated successfully or the class is not defined

evaluateModel

public static java.lang.String evaluateModel(java.lang.String classifierString,
                                             java.lang.String[] options)
                                      throws java.lang.Exception

Evaluates a classifier with the options given in an array of strings.

Valid options are: