package org.allenai.nlpstack.parse.poly.decisiontree

Implements C4.5 decision trees for integral labels and attributes.

The main class to use is org.allenai.nlpstack.parse.poly.decisiontree.DecisionTree. Use the companion object to build the tree, then use its prediction methods (such as outcomeDistribution) to classify new feature vectors.

The tree takes data in the form of an org.allenai.nlpstack.parse.poly.decisiontree.FeatureVectorSource, a container for a collection of org.allenai.nlpstack.parse.poly.decisiontree.FeatureVector objects.

FeatureVector implementations include org.allenai.nlpstack.parse.poly.decisiontree.SparseVector and org.allenai.nlpstack.parse.poly.decisiontree.DenseVector.


Type Members

  1. case class DTDecision(feature: Int, featureValue: Int) extends Product with Serializable

    A decision made at a decision tree node.

    feature: the splitting feature at the node
    featureValue: the value assigned to the splitting feature

  2. case class DTDecisionPath(decisions: Seq[DTDecision]) extends Product with Serializable

    The sequence of decisions made in order to arrive at a decision tree node.

    decisions: the sequence of decisions

  3. case class DecisionTree(outcomes: Iterable[Int], child: IndexedSeq[Map[Int, Int]], splittingFeature: IndexedSeq[Option[Int]], outcomeHistograms: IndexedSeq[Map[Int, Int]]) extends ProbabilisticClassifier with Product with Serializable

    Immutable decision tree for integer-valued features and outcomes.

    Each data structure is an indexed sequence of properties: the ith element of each sequence is the property of node i of the decision tree.

    outcomes: all possible outcomes for the decision tree
    child: stores the children of each node (as a map from feature values to node ids)
    splittingFeature: stores the feature that each node splits on; can be None for leaf nodes
    outcomeHistograms: for each node, a map from outcomes to their frequency of appearance at that node (i.e. how many times a training vector with that outcome reaches this node during classification)
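    The parallel-array layout above can be illustrated with a minimal, self-contained sketch. The field names mirror the case class, but MiniDecisionTree and its traversal logic are illustrative assumptions, not the library's code.

```scala
// Minimal sketch of classification over the parallel-array layout described
// above. MiniDecisionTree is a hypothetical mirror of the real case class.
case class MiniDecisionTree(
    child: IndexedSeq[Map[Int, Int]],          // node id -> (feature value -> child node id)
    splittingFeature: IndexedSeq[Option[Int]], // None marks a leaf
    outcomeHistograms: IndexedSeq[Map[Int, Int]]
) {

  // Walk from the root (node 0), following the splitting feature's value
  // in the feature vector, until reaching a leaf (or an unseen value).
  def findLeaf(features: IndexedSeq[Int]): Int = {
    var node = 0
    var done = false
    while (!done) {
      splittingFeature(node).flatMap(f => child(node).get(features(f))) match {
        case Some(next) => node = next
        case None       => done = true
      }
    }
    node
  }

  // Predict the most frequent outcome at the reached node.
  def classify(features: IndexedSeq[Int]): Int =
    outcomeHistograms(findLeaf(features)).maxBy(_._2)._1
}

// A two-node tree: the root (node 0) splits on feature 0; value 1 leads to node 1.
val tree = MiniDecisionTree(
  child = IndexedSeq(Map(1 -> 1), Map.empty),
  splittingFeature = IndexedSeq(Some(0), None),
  outcomeHistograms = IndexedSeq(Map(0 -> 6, 1 -> 5), Map(1 -> 4))
)
```

    The outcome histogram at the reached node also gives a natural confidence estimate, which is presumably what outcomeHistograms supports in the real class.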

  4. case class DecisionTreeJustification(decisionTree: DecisionTree, node: Int) extends Justification with Product with Serializable

    Structure to represent a decision tree's justification for a certain classification outcome. Contains the index of the chosen node and the breadcrumb that led to it: a (feature index, feature value) tuple at each decision point.

  5. class DecisionTreeTrainer extends ProbabilisticClassifierTrainer

    A DecisionTreeTrainer trains decision trees from data.

  6. case class DenseVector(outcome: Option[Int], features: IndexedSeq[Int]) extends FeatureVector with Product with Serializable

    A DenseVector is a feature vector with arbitrary integral features.

    outcome: the outcome of the feature vector
    features: the value of each feature

  7. case class EntropyGainMetric(minimumGain: Float) extends InformationGainMetric with Product with Serializable
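    EntropyGainMetric has no description here, but its name and the package's C4.5 framing suggest entropy-based information gain as the split criterion. A self-contained sketch of that computation (illustrative only, not the library's implementation) is:

```scala
// Sketch of entropy-based information gain, the split criterion associated
// with C4.5. Only the idea comes from the class name; the code is illustrative.
def entropy(histogram: Map[Int, Int]): Double = {
  val total = histogram.values.sum.toDouble
  histogram.values.map { count =>
    val p = count / total
    -p * math.log(p) / math.log(2) // entropy in bits
  }.sum
}

// Gain of a split: parent entropy minus the size-weighted average entropy of
// the children. A split would be kept only if the gain exceeds minimumGain.
def informationGain(parent: Map[Int, Int], children: Seq[Map[Int, Int]]): Double = {
  val total = parent.values.sum.toDouble
  entropy(parent) - children.map(c => (c.values.sum / total) * entropy(c)).sum
}

val parent = Map(0 -> 8, 1 -> 8)                 // maximally mixed: entropy = 1 bit
val perfectSplit = Seq(Map(0 -> 8), Map(1 -> 8)) // pure children: entropy = 0
```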

  8. sealed trait FeatureVector extends AnyRef

    A feature vector with integral features and outcome.

  9. trait FeatureVectorSource extends AnyRef

  10. case class InMemoryFeatureVectorSource(featureVecs: IndexedSeq[FeatureVector], classificationTask: ClassificationTask) extends FeatureVectorSource with Product with Serializable

    An InMemoryFeatureVectorSource is a convenience container for a collection of feature vectors held in memory. The number of features must be the same for all feature vectors in the container.

    featureVecs: the collection of FeatureVector objects

  11. sealed trait InformationGainMetric extends AnyRef

  12. trait Justification extends AnyRef

  13. case class MultinomialGainMetric(minimumGain: Float) extends InformationGainMetric with Product with Serializable

  14. class OmnibusTrainer extends ProbabilisticClassifierTrainer

  15. case class OneVersusAll(binaryClassifiers: Seq[(Int, ProbabilisticClassifier)]) extends ProbabilisticClassifier with Product with Serializable

    The OneVersusAll implements multi-outcome classification as a set of binary classifiers.

    A ProbabilisticClassifier is associated with each outcome. Suppose there are three outcomes: 0, 1, 2. Then the constructor would take a sequence of three classifiers as its argument: [(0,A), (1,B), (2,C)]. To compute the outcome distribution for a new feature vector v, the OneVersusAll would normalize:

    [ A.outcomeDistribution(v)(1), B.outcomeDistribution(v)(1), C.outcomeDistribution(v)(1) ]

    i.e. the probability of 1 (true) according to binary classifiers A, B, and C.

    QUESTION(MH): is this the best way to normalize these, or would it be better to normalize by summing the logs and then re-applying the exponential operation?

    binaryClassifiers: the binary classifier associated with each outcome
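    The normalization described above can be sketched directly. The probabilities below are hard-coded stand-ins for calls like A.outcomeDistribution(v)(1); the function is an illustration of the scheme, not the library's implementation.

```scala
// Each binary classifier reports P(true) for its own outcome; the multi-class
// distribution is those scores renormalized to sum to 1.
def oneVersusAllDistribution(probTrue: Seq[(Int, Double)]): Map[Int, Double] = {
  val z = probTrue.map(_._2).sum // normalizing constant
  probTrue.map { case (outcome, p) => outcome -> p / z }.toMap
}

// Outcomes 0, 1, 2, whose binary classifiers report P(true) = 0.4, 0.8, 0.8.
val ovaDist = oneVersusAllDistribution(Seq(0 -> 0.4, 1 -> 0.8, 2 -> 0.8))
```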

  16. class OneVersusAllTrainer extends ProbabilisticClassifierTrainer

    A OneVersusAllTrainer trains a OneVersusAll using a base ProbabilisticClassifierTrainer to train one binary classifier per outcome.

  17. case class OutcomeDistribution(dist: Map[Int, Float]) extends Product with Serializable

  18. trait ProbabilisticClassifier extends AnyRef

  19. trait ProbabilisticClassifierTrainer extends (FeatureVectorSource) ⇒ ProbabilisticClassifier

  20. case class RandomForest(allOutcomes: Seq[Int], decisionTrees: Seq[DecisionTree]) extends ProbabilisticClassifier with Product with Serializable

    A RandomForest is a collection of decision trees. Each decision tree gets a single vote about the outcome. The outcome distribution is the normalized histogram of the votes.

    allOutcomes: the collection of possible outcomes
    decisionTrees: the collection of decision trees
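    The voting scheme described above can be sketched as follows. The votes are hard-coded stand-ins for each tree's classification of some feature vector; this is an illustration, not the library's implementation.

```scala
// Each tree casts one vote; the outcome distribution is the normalized vote
// histogram over all possible outcomes (unvoted outcomes get probability 0).
def forestDistribution(allOutcomes: Seq[Int], votes: Seq[Int]): Map[Int, Double] = {
  val counts = votes.groupBy(identity).map { case (o, vs) => o -> vs.size }
  allOutcomes.map(o => o -> counts.getOrElse(o, 0).toDouble / votes.size).toMap
}

// Five trees: three vote for outcome 1, two for outcome 0, none for outcome 2.
val forestDist = forestDistribution(Seq(0, 1, 2), Seq(1, 0, 1, 1, 0))
```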

  21. case class RandomForestJustification(randomForest: RandomForest, tree: Int, treeNode: Int) extends Justification with Product with Serializable

  22. class RandomForestTrainer extends ProbabilisticClassifierTrainer

    A RandomForestTrainer trains a RandomForest from a set of feature vectors.

  23. case class RemappedFeatureVectorSource(fvSource: FeatureVectorSource, outcomeRemapping: (Int) ⇒ Int) extends FeatureVectorSource with Product with Serializable

  24. case class SparseVector(outcome: Option[Int], numFeatures: Int, trueFeatures: Set[Int]) extends FeatureVector with Product with Serializable

    A SparseVector is a feature vector with sparse binary features.

    outcome: the outcome of the feature vector
    numFeatures: the number of features
    trueFeatures: the set of features with value 1
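    The relationship between the two vector types can be sketched with hypothetical mirrors of the case classes above (not the library's own definitions), showing the sparse-to-dense expansion their fields imply.

```scala
// Illustrative mirror of DenseVector: one integral value per feature.
case class MiniDenseVector(outcome: Option[Int], features: IndexedSeq[Int])

// Illustrative mirror of SparseVector: binary features, storing only the
// indices of features with value 1.
case class MiniSparseVector(outcome: Option[Int], numFeatures: Int, trueFeatures: Set[Int]) {
  // Expand to a dense 0/1 vector: feature i is 1 iff it is in trueFeatures.
  def toDense: MiniDenseVector =
    MiniDenseVector(outcome, IndexedSeq.tabulate(numFeatures)(i => if (trueFeatures(i)) 1 else 0))
}

// A 5-feature binary vector with features 0 and 3 set, labeled with outcome 1.
val sparse = MiniSparseVector(Some(1), numFeatures = 5, trueFeatures = Set(0, 3))
```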

Value Members

  1. object DecisionTree extends Serializable

  2. object OutcomeDistribution extends Serializable

  3. object ProbabilisticClassifier

  4. object RandomForest extends Serializable
