Constructs a single decision tree from the given dataset sample
Returns the argmax for this datum
Returns the argmax for this datum
Computes the contingency tables for all given features and dataset partition. For each feature and possible threshold (hence the double array), we store a distribution of datum labels that are <= the threshold (_1 in the tuple) or larger than the threshold (_2 in the tuple). This method does not consider 0 values! See updateContingencyTables for that.
Computes the value thresholds for all features in this dataset
Computes the value thresholds for all features in this dataset
The dataset
An array of thresholds (Double) for each feature in the dataset; feature indices are used for indexing
Computes the utility of the given feature
Computes the utility of the given feature using information gain
Computes IG for a given feature and threshold
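The IG computation for one feature/threshold split can be sketched as follows. This is an illustrative sketch, not the actual implementation: `entropy`, `infoGain`, and the label-count maps are assumed names, standing in for the real contingency-table types.

```scala
// Illustrative sketch of information gain for one feature/threshold split.
// Names (IGSketch, entropy, infoGain) are assumptions, not the actual API.
object IGSketch {
  // Entropy (in bits) of a label distribution given as counts
  def entropy(counts: Iterable[Double]): Double = {
    val total = counts.sum
    counts.filter(_ > 0).map { c =>
      val p = c / total
      -p * math.log(p) / math.log(2)
    }.sum
  }

  // IG = H(parent) minus the weighted average of the child entropies,
  // where the children are the <= threshold and > threshold partitions
  def infoGain(leCounts: Map[String, Double], gtCounts: Map[String, Double]): Double = {
    val parent = (leCounts.keySet ++ gtCounts.keySet).toSeq.map { label =>
      leCounts.getOrElse(label, 0.0) + gtCounts.getOrElse(label, 0.0)
    }
    val nLe = leCounts.values.sum
    val nGt = gtCounts.values.sum
    val n = nLe + nGt
    entropy(parent) -
      (nLe / n) * entropy(leCounts.values) -
      (nGt / n) * entropy(gtCounts.values)
  }
}
```

A split that perfectly separates two equally frequent labels yields the maximum gain of 1 bit.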
Constructs a job from the datums containing values of this feature smaller than or equal to the threshold
Constructs a job from the datums containing values of this feature larger than the threshold
Computes binCount-1 quantile values, such that the sequence of values is split into binCount bins
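One way to realize the quantile split described above is sketched below; this is a hypothetical helper, not the actual method signature, assuming boundaries are taken at evenly spaced rank positions of the sorted values.

```scala
// Illustrative sketch (hypothetical helper): compute binCount - 1 boundary
// values so the sorted values fall into binCount roughly equal-sized bins.
object QuantileSketch {
  def quantileThresholds(values: Seq[Double], binCount: Int): Seq[Double] = {
    val sorted = values.sorted
    (1 until binCount).map { i =>
      // rank position of the boundary between bin i-1 and bin i
      val idx = (i.toLong * sorted.length / binCount).toInt
      sorted(math.min(idx, sorted.length - 1))
    }.distinct // collapse repeated boundaries when values have ties
  }
}
```

For 8 values split into 4 bins this produces 3 thresholds, one at every second rank.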
Randomly picks selectedFeats features between 0 .. numFeats
Randomly picks selectedFeats features between 0 .. numFeats
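This per-node feature subsampling is the "random" in random forests. A minimal sketch, assuming a shuffle-and-take strategy and hypothetical names (`FeatureSampler`, `pickFeatures`):

```scala
import scala.util.Random

// Illustrative sketch (hypothetical helper): pick selectedFeats distinct
// feature indices uniformly at random from 0 until numFeats.
object FeatureSampler {
  def pickFeatures(numFeats: Int, selectedFeats: Int, rng: Random): Set[Int] = {
    require(selectedFeats <= numFeats)
    // shuffle all indices, then keep the first selectedFeats of them
    rng.shuffle((0 until numFeats).toList).take(selectedFeats).toSet
  }
}
```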
Saves to writer.
Saves to writer. Does NOT close the writer
Saves the current model to a file
Saves the current model to a file
Returns the scores of all possible labels for this datum. Convention: if the classifier can return probabilities, these must be probabilities
Returns the scores of all possible labels for this datum. Convention: if the classifier can return probabilities, these must be probabilities
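The probability convention amounts to normalizing the per-label scores so they sum to 1. A sketch under that assumption (`ScoreSketch` and `scoresOf` are hypothetical names; in a forest the raw scores would come from the trees' votes):

```scala
// Illustrative sketch of the scoring convention above: raw per-label scores
// are normalized into a probability distribution.
object ScoreSketch {
  def scoresOf(votes: Map[String, Double]): Map[String, Double] = {
    val total = votes.values.sum
    votes.map { case (label, v) => label -> v / total } // sums to 1
  }
}
```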
Trains a classifier using a CounterDataset (better for computing feature utility)
Trains a classifier, using only the datums specified in indices; indices is useful for bagging
Trains a classifier, using only the datums specified in indices; indices is useful for bagging
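Bagging builds each tree from a bootstrap sample of the dataset, which is exactly what an indices array expresses. A minimal sketch, assuming sampling with replacement (`BaggingSketch` and `bootstrapIndices` are hypothetical names; the real trainer receives such an indices array):

```scala
import scala.util.Random

// Illustrative sketch of bagging: a bootstrap sample is an array of datum
// indices drawn with replacement, so some datums repeat and others are
// left out for that tree.
object BaggingSketch {
  def bootstrapIndices(datasetSize: Int, rng: Random): Array[Int] =
    Array.fill(datasetSize)(rng.nextInt(datasetSize))
}
```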
Trains the classifier on the given dataset; spans is useful during cross validation
Trains the classifier on the given dataset; spans is useful during cross validation
An in-house implementation of random forests.
User: mihais
Date: 11/23/15
Last Modified: Update for Scala 2.12: fork join changes.