Package

com.eharmony.aloha.dataset.vw

multilabel

package multilabel

Type Members

  1. final case class VwDownsampledMultilabelRowCreator[-A, K](allLabelsInTrainingSet: IndexedSeq[K], featuresFunction: FeatureExtractorFunction[A, Sparse], defaultNamespace: List[Int], namespaces: List[(String, List[Int])], normalizer: Option[(CharSequence) ⇒ CharSequence], positiveLabelsFunction: GenAggFunc[A, IndexedSeq[K]], classNs: Char, dummyClassNs: Char, numDownsampledNegLabels: Int, seedCreator: () ⇒ Long, includeZeroValues: Boolean = false) extends StatefulRowCreator[A, Array[String], Long] with Logging with Product with Serializable

    Creates training data for multilabel models in Vowpal Wabbit's CSOAA LDF and WAP LDF format for the JNI. In this row creator, negative labels are downsampled and the costs of the downsampled labels are adjusted to produce an unbiased estimator. Negative labels are assumed to be in the majority. Downsampling negatives can reduce training time and may also improve model performance. See the following resources for intuition:

    Because this row creator is stateful, the caller must normally maintain its state. If, however, it is only applied to an iterator or a sequence, the row creator can maintain the state itself during iteration. For iterators the mapping is non-strict (lazy); for sequences (Seq) it is strict.
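The downsampling and state handling described above can be sketched as follows. This is a minimal illustration, not Aloha's implementation: `DownsampleSketch`, `Weighted`, `downsample`, and `statefulMap` are hypothetical names, and the `N / n` weight is an assumed adjustment in which each kept negative stands in for `N / n` negatives, so the total weight still equals the original number of negatives. The seed plays the role of the state.

```scala
import scala.util.Random

object DownsampleSketch {
  // A kept negative label together with its weight (hypothetical type).
  final case class Weighted[K](label: K, weight: Double)

  // Keeps at most n of the negatives. Each kept label is weighted by N / n,
  // where N is the original number of negatives, so the weights sum to N.
  // Returns the sample and the next seed, which the caller passes back in.
  def downsample[K](negatives: IndexedSeq[K], n: Int, seed: Long): (IndexedSeq[Weighted[K]], Long) = {
    require(n > 0, "n must be positive")
    val rng = new Random(seed)
    val nextSeed = rng.nextLong()
    if (negatives.size <= n)
      (negatives.map(k => Weighted(k, 1.0)), nextSeed) // nothing to downsample
    else {
      val w = negatives.size.toDouble / n
      (rng.shuffle(negatives).take(n).map(k => Weighted(k, w)), nextSeed)
    }
  }

  // Threads the state through an iterator. Iterator.map is lazy, so rows are
  // produced (and the state advanced) only as the result is consumed.
  def statefulMap[A, B, S](rows: Iterator[A], init: S)(f: (A, S) => (B, S)): Iterator[B] = {
    var state = init
    rows.map { a =>
      val (b, next) = f(a, state)
      state = next
      b
    }
  }
}
```

Applied to a `Seq` instead of an iterator, the same threading of state would run eagerly, which corresponds to the strict behavior mentioned above.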

    A

    the input type

    K

    the label or class type

    allLabelsInTrainingSet

    all labels in the training set. This is a sequence because order matters. Order here can be chosen arbitrarily, but it must be consistent in the training and test formulation.

    featuresFunction

    features to extract from the data of type A.

    defaultNamespace

    list of feature indices in the default VW namespace.

    namespaces

    a mapping from VW namespace name to feature indices in that namespace.

    normalizer

    can modify VW output (currently unused)

    positiveLabelsFunction

    A method that can extract positive class labels.

    classNs

    the namespace name for class information.

    dummyClassNs

    the namespace name for dummy class information. Two dummy classes are added so that the predicted probabilities work.

    numDownsampledNegLabels

    a positive value representing the number of negative labels to include in each row. If this is greater than or equal to the number of negative labels for a given row, then no downsampling of negatives will take place for that row.

    seedCreator

    a "function" that creates a seed that will be used for randomness. The implementation of this function is important: it should create a unique value for each unit of parallelism. If, for example, row creation is parallelized across multiple threads on one machine, the unit of parallelism is the thread and seedCreator should produce a unique value for each thread. If row creation is parallelized across multiple machines, seedCreator should produce a unique value for each machine. If row creation is parallelized across machines and across threads on each machine, seedCreator should create a unique value for each thread on each machine. Otherwise, randomness will be striped: units that share a seed draw identical random sequences, which correlates the downsampling across them.

    includeZeroValues

    whether to include zero values in the VW input.

    Since

    11/6/2017
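The seedCreator guidance above can be illustrated with a small sketch. It assumes a hypothetical `machineId` supplied by the caller (for example a host or partition index) and uses an `AtomicLong` so that every call within one JVM yields a different value. `SeedCreators` and `uniquePerUnit` are illustrative names and are not part of the Aloha API.

```scala
import java.util.concurrent.atomic.AtomicLong

object SeedCreators {
  // Returns a () => Long whose values differ on every call within this JVM
  // (for up to 2^32 calls) and, provided machineId is unique per machine,
  // differ across machines too. machineId fills the high 32 bits and a
  // per-JVM call counter fills the low 32 bits.
  def uniquePerUnit(machineId: Int): () => Long = {
    val counter = new AtomicLong(0L)
    () => (machineId.toLong << 32) | (counter.getAndIncrement() & 0xFFFFFFFFL)
  }
}
```

By contrast, a constant such as `() => 0L` would give every thread and machine the same seed, which is the situation the parameter description warns against.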

  2. final case class VwMultilabelRowCreator[-A, K](allLabelsInTrainingSet: IndexedSeq[K], featuresFunction: FeatureExtractorFunction[A, Sparse], defaultNamespace: List[Int], namespaces: List[(String, List[Int])], normalizer: Option[(CharSequence) ⇒ CharSequence], positiveLabelsFunction: GenAggFunc[A, IndexedSeq[K]], classNs: Char, dummyClassNs: Char, includeZeroValues: Boolean = false) extends RowCreator[A, Array[String]] with Product with Serializable

    Creates training data for multilabel models in Vowpal Wabbit's CSOAA LDF and WAP LDF format for the JNI.

    A

    the input type

    K

    the label or class type

    allLabelsInTrainingSet

    all labels in the training set. This is a sequence because order matters. Order here can be chosen arbitrarily, but it must be consistent in the training and test formulation.

    featuresFunction

    features to extract from the data of type A.

    defaultNamespace

    list of feature indices in the default VW namespace.

    namespaces

    a mapping from VW namespace name to feature indices in that namespace.

    normalizer

    can modify VW output (currently unused)

    positiveLabelsFunction

    A method that can extract positive class labels.

    classNs

    the namespace name for class information.

    dummyClassNs

    the namespace name for dummy class information. Two dummy classes are added so that the predicted probabilities work.

    includeZeroValues

    whether to include zero values in the VW input.

    Since

    9/13/2017
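To make the `defaultNamespace`, `namespaces`, and `includeZeroValues` parameters more concrete, here is a rough sketch of how feature indices grouped by namespace might be rendered in VW's `|namespace feature:value` input syntax. It assumes each extracted feature is a sparse collection of `(name, value)` pairs. `VwNamespaceSketch` and `render` are hypothetical, and the layout is not claimed to match the row creator's exact output, which also carries class and dummy-class namespaces.

```scala
object VwNamespaceSketch {
  // Assumed shape of one extracted feature: sparse (name, value) pairs.
  type Sparse = Iterable[(String, Double)]

  def render(
      features: IndexedSeq[Sparse],
      defaultNamespace: List[Int],
      namespaces: List[(String, List[Int])],
      includeZeroValues: Boolean = false
  ): String = {
    // Renders one namespace section, or None if no feature values remain.
    def section(name: String, indices: List[Int]): Option[String] = {
      val kvs = for {
        i <- indices
        (k, v) <- features(i)
        if includeZeroValues || v != 0.0
      } yield s"$k:$v"
      if (kvs.isEmpty) None else Some(kvs.mkString(s"|$name ", " ", ""))
    }

    // The default namespace has an empty name, so it renders as "| ...".
    (section("", defaultNamespace) :: namespaces.map { case (n, is) => section(n, is) })
      .flatten
      .mkString(" ")
  }
}
```

With features `a=1.0`, `b=0.0`, `c=2.5` at indices 0, 1, 2, a default namespace of `List(0)`, and a namespace `"x" -> List(1, 2)`, this sketch yields `| a:1.0 |x c:2.5`; passing `includeZeroValues = true` keeps `b:0.0` in the `x` section.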

Value Members

  1. object VwDownsampledMultilabelRowCreator extends Rand with Serializable
  2. object VwMultilabelRowCreator extends Rand with Serializable
  3. package json
