Creates training data for multilabel models in Vowpal Wabbit's CSOAA LDF and WAP LDF format for the JNI.
the input type
the label or class type
all labels in the training set. This is a sequence because order matters. Order here can be chosen arbitrarily, but it must be consistent in the training and test formulation.
features to extract from the data of type A.
list of feature indices in the default VW namespace.
a mapping from VW namespace name to feature indices in that namespace.
can modify VW output (currently unused)
A method that can extract positive class labels.
the namespace name for class information.
the namespace name for dummy class information. 2 dummy classes are added to make the predicted probabilities work.
include zero values in VW input?
9/13/2017
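The role of the ordered label sequence and the positive-label extractor described above can be illustrated with a minimal sketch. The names `LabelIndexing`, `partitionIndices`, `labelsInTrainingSet` and `positiveLabelsFn` are hypothetical and chosen for illustration only; this is not the row creator's actual implementation. The sketch only shows why label order must be stable: each label is identified by its position in the sequence, so a different order at test time would map the same index to a different label.

```scala
object LabelIndexing {

  /** Splits the indices of the ordered label universe into
    * (indices of positive labels, indices of negative labels) for one input.
    *
    * Hypothetical helper: the parameter names are assumptions, not library API.
    */
  def partitionIndices[A, K](
      labelsInTrainingSet: Seq[K],
      positiveLabelsFn: A => Seq[K]
  )(a: A): (Seq[Int], Seq[Int]) = {
    // Labels considered positive for this particular input.
    val positives = positiveLabelsFn(a).toSet

    // A label's index is its position in the (fixed) label sequence.
    val (pos, neg) = labelsInTrainingSet.zipWithIndex.partition {
      case (label, _) => positives.contains(label)
    }
    (pos.map(_._2), neg.map(_._2))
  }
}
```

For example, with labels `Seq("cat", "dog", "fish")` and an input whose only positive label is `"dog"`, the positive indices are `Seq(1)` and the negative indices are `Seq(0, 2)`.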
Creates training data for multilabel models in Vowpal Wabbit's CSOAA LDF and WAP LDF format for the JNI. In this row creator, negative labels are downsampled and costs for the downsampled labels are adjusted to produce an unbiased estimator. It is assumed that negative labels are in the majority. Downsampling negatives can reduce training time and possibly improve model performance. See the following resources for intuition:
This row creator, since it is stateful, requires the caller to maintain state. If however, it is only called via an iterator or sequence, then this row creator can maintain the state during iteration over the iterator or sequence. In the case of iterators, the mapping is non-strict and in the case of sequences (Seq), it is strict.
the input type
the label or class type
all labels in the training set. This is a sequence because order matters. Order here can be chosen arbitrarily, but it must be consistent in the training and test formulation.
features to extract from the data of type A.
list of feature indices in the default VW namespace.
a mapping from VW namespace name to feature indices in that namespace.
can modify VW output (currently unused)
A method that can extract positive class labels.
the namespace name for class information.
the namespace name for dummy class information. 2 dummy classes are added to make the predicted probabilities work.
a positive value representing the number of negative labels to include in each row. If this is greater than or equal to the number of negative labels for a given row, then no downsampling of negatives will take place for that row.
a "function" that creates a seed that will be used for randomness. The implementation of this function is important. It should create a unique value for each unit of parallelism. If for example, row creation is parallelized across multiple threads on one machine, the unit of parallelism is threads and seedCreator should produce unique values for each thread. If row creation is parallelized across multiple machines, the seedCreator should produce a unique value for each machine. If row creation is parallelized across machines and threads on each machine, the seedCreator should create unique values for each thread on each machine. Otherwise, randomness will be striped which is bad.
include zero values in VW input?
11/6/2017
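The interplay between the seed function and negative downsampling can be sketched as follows. This is a minimal illustration under stated assumptions, not the row creator's actual code: the object name `NegativeDownsamplingSketch`, the counter-based `seedCreator`, and the `downsample` helper are all hypothetical. The weight returned here uses plain inverse-probability weighting (number of negatives divided by number kept), which is one standard way to keep a sampled sum unbiased; the exact cost adjustment used by the library is not specified in this text.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.util.Random

object NegativeDownsamplingSketch {

  // Shared counter so that every call hands out a different seed.
  private val counter = new AtomicLong(0L)

  /** Hypothetical seed creator: unique per call within one JVM.
    * Across machines, a machine-specific component would also have to be
    * mixed in (an assumption; not shown here).
    */
  val seedCreator: () => Long = () => counter.getAndIncrement()

  /** Keeps at most `n` negative label indices, sampled uniformly without
    * replacement, and returns them with the weight to apply to each kept
    * negative so that the weighted total matches the full set in expectation.
    */
  def downsample(negatives: IndexedSeq[Int], n: Int, rng: Random): (Seq[Int], Double) = {
    require(n > 0, "n must be positive")
    if (negatives.size <= n) {
      // Nothing to drop: keep every negative with unit weight.
      (negatives, 1.0)
    } else {
      val kept = rng.shuffle(negatives).take(n)
      (kept, negatives.size.toDouble / n)
    }
  }
}
```

In this sketch, each thread would call `seedCreator()` once and build its own `new Random(seed)`, so that no two threads draw the same pseudo-random sequence.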