package org.allenai.nlpstack.parse.poly.decisiontree

Implements C4.5 decision trees for integral labels and attributes.

The main class to use is org.allenai.nlpstack.parse.poly.decisiontree.DecisionTree. Use the companion object to build the tree, then use its prediction methods (such as outcomeDistribution) to classify new feature vectors.

The tree takes data in the form of an org.allenai.nlpstack.parse.poly.decisiontree.FeatureVectorSource, a container for a collection of org.allenai.nlpstack.parse.poly.decisiontree.FeatureVector objects.

FeatureVector implementations include org.allenai.nlpstack.parse.poly.decisiontree.SparseVector and org.allenai.nlpstack.parse.poly.decisiontree.DenseVector.


Type Members

  1. case class DTDecision(feature: Int, featureValue: Int) extends Product with Serializable

    A decision made at a decision tree node.

    feature: the splitting feature at the node
    featureValue: the value assigned to the splitting feature

  2. case class DTDecisionPath(decisions: Seq[DTDecision]) extends Product with Serializable

    The sequence of decisions made in order to arrive at a decision tree node.

    decisions: the sequence of decisions

  3. case class DecisionTree(outcomes: Iterable[Int], child: IndexedSeq[Map[Int, Int]], splittingFeature: IndexedSeq[Option[Int]], outcomeHistograms: IndexedSeq[Map[Int, Int]]) extends ProbabilisticClassifier with Product with Serializable

    Immutable decision tree for integer-valued features and outcomes.

    Each data structure is an indexed sequence of properties: the ith element of each sequence is the property of node i of the decision tree.

    outcomes: all possible outcomes for the decision tree
    child: stores the children of each node (as a map from feature values to node ids)
    splittingFeature: stores the feature that each node splits on; can be None for leaf nodes
    outcomeHistograms: for each node, a map from outcomes to their frequency of appearance at that node (i.e. how many times a training vector with that outcome reaches this node during classification)
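    The parallel-array layout above can be illustrated with a minimal, self-contained sketch. The field names mirror the case class, but MiniDecisionTree and its traversal logic are illustrative assumptions, not the library's code.

```scala
// Minimal sketch of classification over the parallel-array layout described
// above. MiniDecisionTree is a hypothetical mirror of the real case class.
case class MiniDecisionTree(
    child: IndexedSeq[Map[Int, Int]],          // node id -> (feature value -> child node id)
    splittingFeature: IndexedSeq[Option[Int]], // None marks a leaf
    outcomeHistograms: IndexedSeq[Map[Int, Int]]
) {

  // Walk from the root (node 0), following the splitting feature's value
  // in the feature vector, until reaching a leaf (or an unseen value).
  def findLeaf(features: IndexedSeq[Int]): Int = {
    var node = 0
    var done = false
    while (!done) {
      splittingFeature(node).flatMap(f => child(node).get(features(f))) match {
        case Some(next) => node = next
        case None       => done = true
      }
    }
    node
  }

  // Predict the most frequent outcome at the reached node.
  def classify(features: IndexedSeq[Int]): Int =
    outcomeHistograms(findLeaf(features)).maxBy(_._2)._1
}

// A two-node tree: the root (node 0) splits on feature 0; value 1 leads to node 1.
val tree = MiniDecisionTree(
  child = IndexedSeq(Map(1 -> 1), Map.empty),
  splittingFeature = IndexedSeq(Some(0), None),
  outcomeHistograms = IndexedSeq(Map(0 -> 6, 1 -> 5), Map(1 -> 4))
)
```

    The outcome histogram at the reached node also gives a natural confidence estimate, which is presumably what outcomeHistograms supports in the real class.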

  4. case class DecisionTreeJustification(decisionTree: DecisionTree, node: Int) extends Justification with Product with Serializable

    Structure to represent a decision tree's justification for a certain classification outcome. Contains the index of the chosen node and the breadcrumb that led to it: a (feature index, feature value) tuple at each decision point.

  5. class DecisionTreeTrainer extends ProbabilisticClassifierTrainer

    A DecisionTreeTrainer trains decision trees from data.

  6. case class DenseVector(outcome: Option[Int], features: IndexedSeq[Int]) extends FeatureVector with Product with Serializable

    A DenseVector is a feature vector with arbitrary integral features.

    outcome: the outcome of the feature vector
    features: the value of each feature

  7. case class EntropyGainMetric(minimumGain: Float) extends InformationGainMetric with Product with Serializable
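    EntropyGainMetric has no description here, but its name and the package's C4.5 framing suggest entropy-based information gain as the split criterion. A self-contained sketch of that computation (illustrative only, not the library's implementation) is:

```scala
// Sketch of entropy-based information gain, the split criterion associated
// with C4.5. Only the idea comes from the class name; the code is illustrative.
def entropy(histogram: Map[Int, Int]): Double = {
  val total = histogram.values.sum.toDouble
  histogram.values.map { count =>
    val p = count / total
    -p * math.log(p) / math.log(2) // entropy in bits
  }.sum
}

// Gain of a split: parent entropy minus the size-weighted average entropy of
// the children. A split would be kept only if the gain exceeds minimumGain.
def informationGain(parent: Map[Int, Int], children: Seq[Map[Int, Int]]): Double = {
  val total = parent.values.sum.toDouble
  entropy(parent) - children.map(c => (c.values.sum / total) * entropy(c)).sum
}

val parent = Map(0 -> 8, 1 -> 8)                 // maximally mixed: entropy = 1 bit
val perfectSplit = Seq(Map(0 -> 8), Map(1 -> 8)) // pure children: entropy = 0
```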

  8. sealed trait FeatureVector extends AnyRef

    A feature vector with integral features and outcome.

  9. trait FeatureVectorSource extends AnyRef

  10. case class InMemoryFeatureVectorSource(featureVecs: IndexedSeq[FeatureVector], classificationTask: ClassificationTask) extends FeatureVectorSource with Product with Serializable

    An InMemoryFeatureVectorSource is a convenience container for a collection of feature vectors held in memory. The number of features must be the same for all feature vectors in the container.

    featureVecs: the collection of FeatureVector objects

  11. sealed trait InformationGainMetric extends AnyRef

  12. trait Justification extends AnyRef

  13. case class MultinomialGainMetric(minimumGain: Float) extends InformationGainMetric with Product with Serializable

  14. class OmnibusTrainer extends ProbabilisticClassifierTrainer

  15. case class OneVersusAll(binaryClassifiers: Seq[(Int, ProbabilisticClassifier)]) extends ProbabilisticClassifier with Product with Serializable

    The OneVersusAll implements multi-outcome classification as a set of binary classifiers.

    A ProbabilisticClassifier is associated with each outcome. Suppose there are three outcomes: 0, 1, 2. Then the constructor would take a sequence of three classifiers as its argument: [(0,A), (1,B), (2,C)]. To compute the outcome distribution for a new feature vector v, the OneVersusAll would normalize:

    [ A.outcomeDistribution(v)(1), B.outcomeDistribution(v)(1), C.outcomeDistribution(v)(1) ]

    i.e. the probability of 1 (true) according to binary classifiers A, B, and C.

    QUESTION(MH): is this the best way to normalize these, or would it be better to normalize by summing the logs and then re-applying the exponential operation?

    binaryClassifiers: the binary classifier associated with each outcome
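    The normalization described above can be sketched directly. The probabilities below are hard-coded stand-ins for calls like A.outcomeDistribution(v)(1); the function is an illustration of the scheme, not the library's implementation.

```scala
// Each binary classifier reports P(true) for its own outcome; the multi-class
// distribution is those scores renormalized to sum to 1.
def oneVersusAllDistribution(probTrue: Seq[(Int, Double)]): Map[Int, Double] = {
  val z = probTrue.map(_._2).sum // normalizing constant
  probTrue.map { case (outcome, p) => outcome -> p / z }.toMap
}

// Outcomes 0, 1, 2, whose binary classifiers report P(true) = 0.4, 0.8, 0.8.
val ovaDist = oneVersusAllDistribution(Seq(0 -> 0.4, 1 -> 0.8, 2 -> 0.8))
```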

  16. class OneVersusAllTrainer extends ProbabilisticClassifierTrainer

    A OneVersusAllTrainer trains a OneVersusAll using a base ProbabilisticClassifierTrainer to train one binary classifier per outcome.

  17. case class OutcomeDistribution(dist: Map[Int, Float]) extends Product with Serializable

  18. trait ProbabilisticClassifier extends AnyRef

  19. trait ProbabilisticClassifierTrainer extends (FeatureVectorSource) ⇒ ProbabilisticClassifier

  20. case class RandomForest(allOutcomes: Seq[Int], decisionTrees: Seq[DecisionTree]) extends ProbabilisticClassifier with Product with Serializable

    A RandomForest is a collection of decision trees. Each decision tree gets a single vote about the outcome. The outcome distribution is the normalized histogram of the votes.

    allOutcomes: the collection of possible outcomes
    decisionTrees: the collection of decision trees
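    The voting scheme described above can be sketched as follows. The votes are hard-coded stand-ins for each tree's classification of some feature vector; this is an illustration, not the library's implementation.

```scala
// Each tree casts one vote; the outcome distribution is the normalized vote
// histogram over all possible outcomes (unvoted outcomes get probability 0).
def forestDistribution(allOutcomes: Seq[Int], votes: Seq[Int]): Map[Int, Double] = {
  val counts = votes.groupBy(identity).map { case (o, vs) => o -> vs.size }
  allOutcomes.map(o => o -> counts.getOrElse(o, 0).toDouble / votes.size).toMap
}

// Five trees: three vote for outcome 1, two for outcome 0, none for outcome 2.
val forestDist = forestDistribution(Seq(0, 1, 2), Seq(1, 0, 1, 1, 0))
```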

  21. case class RandomForestJustification(randomForest: RandomForest, tree: Int, treeNode: Int) extends Justification with Product with Serializable

  22. class RandomForestTrainer extends ProbabilisticClassifierTrainer

    A RandomForestTrainer trains a RandomForest from a set of feature vectors.

  23. case class RemappedFeatureVectorSource(fvSource: FeatureVectorSource, outcomeRemapping: (Int) ⇒ Int) extends FeatureVectorSource with Product with Serializable

  24. case class SparseVector(outcome: Option[Int], numFeatures: Int, trueFeatures: Set[Int]) extends FeatureVector with Product with Serializable

    A SparseVector is a feature vector with sparse binary features.

    outcome: the outcome of the feature vector
    numFeatures: the number of features
    trueFeatures: the set of features with value 1
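    The relationship between the two vector types can be sketched with hypothetical mirrors of the case classes above (not the library's own definitions), showing the sparse-to-dense expansion their fields imply.

```scala
// Illustrative mirror of DenseVector: one integral value per feature.
case class MiniDenseVector(outcome: Option[Int], features: IndexedSeq[Int])

// Illustrative mirror of SparseVector: binary features, storing only the
// indices of features with value 1.
case class MiniSparseVector(outcome: Option[Int], numFeatures: Int, trueFeatures: Set[Int]) {
  // Expand to a dense 0/1 vector: feature i is 1 iff it is in trueFeatures.
  def toDense: MiniDenseVector =
    MiniDenseVector(outcome, IndexedSeq.tabulate(numFeatures)(i => if (trueFeatures(i)) 1 else 0))
}

// A 5-feature binary vector with features 0 and 3 set, labeled with outcome 1.
val sparse = MiniSparseVector(Some(1), numFeatures = 5, trueFeatures = Set(0, 3))
```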

Value Members

  1. object DecisionTree extends Serializable

  2. object OutcomeDistribution extends Serializable

  3. object ProbabilisticClassifier

  4. object RandomForest extends Serializable
