Package com.salesforce.op.stages.impl.preparators

Type Members

  1. case class CategoricalGroupStats(group: String, categoricalFeatures: Array[String], contingencyMatrix: Type, pointwiseMutualInfo: Type, cramersV: Double, mutualInfo: Double, maxRuleConfidences: Array[Double], supports: Array[Double]) extends MetadataLike with Product with Serializable

    Container for categorical stats coming from a single group (and therefore a single contingency matrix)

    group

    Indicator group for this contingency matrix

    categoricalFeatures

    Array of categorical features belonging to this group

    contingencyMatrix

    Contingency matrix for this feature group

    pointwiseMutualInfo

    Matrix of PMI values in Map form (label -> PMI values)

    cramersV

    Cramér's V value for this feature group (how strongly it is correlated with the label)

    mutualInfo

    Mutual info value for this feature group

    maxRuleConfidences

    Array (one value per contingency matrix row) containing the largest association rule confidence for that row (over all the labels)

    supports

    Array (one value per contingency matrix row) containing the supports for each categorical choice (fraction of data in which it is chosen)
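The per-row quantities described above can be illustrated with a minimal, self-contained sketch. It assumes a contingency matrix whose rows are categorical choices and whose columns are label values, and it uses the textbook definitions of support, rule confidence, and Cramér's V (without bias correction). The object and method names are hypothetical; this is not the library's implementation, which may differ in detail.

```scala
// Illustrative sketch only: not TransmogrifAI's implementation.
object ContingencySketch {
  /** Rows are categorical choices, columns are label values; cells are co-occurrence counts. */
  type Matrix = Array[Array[Double]]

  /** Support of each row: the fraction of all records in which that choice appears. */
  def supports(m: Matrix): Array[Double] = {
    val total = m.map(_.sum).sum
    m.map(row => if (total == 0.0) 0.0 else row.sum / total)
  }

  /** Largest rule confidence P(label | choice) per row, taken over all labels. */
  def maxRuleConfidences(m: Matrix): Array[Double] =
    m.map { row =>
      val rowTotal = row.sum
      if (rowTotal == 0.0) 0.0 else row.max / rowTotal
    }

  /** Cramér's V computed from the chi-squared statistic of the table. */
  def cramersV(m: Matrix): Double = {
    val rowSums = m.map(_.sum)
    val colSums = m.transpose.map(_.sum)
    val total = rowSums.sum
    val chi2 = (for {
      i <- m.indices
      j <- m(i).indices
      expected = rowSums(i) * colSums(j) / total
      if expected > 0.0
    } yield math.pow(m(i)(j) - expected, 2) / expected).sum
    val minDim = math.min(m.length, colSums.length) - 1
    if (total == 0.0 || minDim <= 0) 0.0 else math.sqrt(chi2 / (total * minDim))
  }
}
```

A choice that always co-occurs with one label has a maximum rule confidence of 1.0, which is the kind of signal that can point to label leakage.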

  2. sealed trait CorrelationExclusion extends EnumEntry with Serializable

    Categories of feature vector columns to exclude from the feature-label correlation matrix (or just the array of feature-label correlations) calculated in SanityChecker.

  3. sealed trait CorrelationLevel extends EnumEntry with Serializable

    Settings for feature - feature correlations

  4. sealed abstract class CorrelationType extends EnumEntry with Serializable

    Represents a kind of correlation coefficient.

  5. case class Correlations(featuresIn: Seq[String], valuesWithLabel: Seq[Double], valuesWithFeatures: Seq[Seq[Double]], corrType: CorrelationType) extends MetadataLike with Product with Serializable

    Correlations between features and the label from SanityChecker

    featuresIn

    names of features

    valuesWithLabel

    correlation of feature with label

    valuesWithFeatures

    correlations between features

    corrType

    type of correlation that was computed
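As a hedged illustration of how the parallel `featuresIn` and `valuesWithLabel` sequences might be consumed, the sketch below pairs each feature name with its label correlation and orders the pairs by absolute strength. The object and method names are hypothetical, and dropping NaN entries is an assumption made for the example rather than documented library behavior.

```scala
// Illustrative sketch only: works on plain sequences, not on the Correlations class itself.
object CorrelationSketch {
  /** Pair each feature with its label correlation, strongest (by absolute value) first.
    * NaN correlations are dropped before sorting. */
  def rankByLabelCorrelation(
    featuresIn: Seq[String],
    valuesWithLabel: Seq[Double]
  ): Seq[(String, Double)] = {
    require(featuresIn.length == valuesWithLabel.length, "one correlation per feature expected")
    featuresIn.zip(valuesWithLabel)
      .filterNot { case (_, c) => c.isNaN }
      .sortBy { case (_, c) => -math.abs(c) }
  }
}
```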

  6. trait DerivedFeatureFilterNames extends AnyRef

  7. trait DerivedFeatureFilterParams extends Params

  8. trait DerivedFeatureFilterSummary extends DerivedFeatureFilterNames

  9. class MinVarianceFilter extends UnaryEstimator[OPVector, OPVector] with DerivedFeatureFilterParams

    The MinVarianceFilter checks that computed features have a minimum variance.

    Like SanityChecker, the Estimator step outputs statistics on the incoming data, as well as the names of features which should be dropped from the feature vector. The transformer step then removes the low-variance features from the feature vector.

    Two distinctions from SanityChecker: (1) there is no label column as input; and (2) features are filtered by variance only.
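The two-step flow described above can be sketched in plain Scala. This is a simplified illustration under stated assumptions, not the library's code: the names `fit`, `transform`, and `minVariance` are hypothetical here, the sample variance uses an n - 1 denominator, and a column is dropped when its variance is strictly below the threshold.

```scala
// Illustrative sketch only: a plain-collections analogue of a variance-based filter.
object MinVarianceSketch {
  /** Sample variance of one column (n - 1 denominator); 0.0 when fewer than two values. */
  def variance(xs: Seq[Double]): Double =
    if (xs.size < 2) 0.0
    else {
      val mean = xs.sum / xs.size
      xs.map(x => (x - mean) * (x - mean)).sum / (xs.size - 1)
    }

  /** "Estimator" step: names of the columns whose variance falls below the threshold. */
  def fit(names: Seq[String], rows: Seq[Seq[Double]], minVariance: Double): Seq[String] = {
    val columns = rows.transpose
    names.zip(columns).collect { case (name, col) if variance(col) < minVariance => name }
  }

  /** "Transformer" step: remove the flagged columns from a single row. */
  def transform(names: Seq[String], dropped: Set[String], row: Seq[Double]): Seq[Double] =
    names.zip(row).collect { case (name, value) if !dropped(name) => value }
}
```

Note that no label column appears anywhere in the sketch, which mirrors the first distinction from SanityChecker.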

  10. final class MinVarianceFilterModel extends UnaryModel[OPVector, OPVector]

  11. case class MinVarianceSummary(dropped: Seq[String], featuresStatistics: SummaryStatistics, names: Seq[String]) extends MetadataLike with Product with Serializable

    Case class to store metadata from MinVarianceFilter

    dropped

    features dropped by minimum variance filter

    featuresStatistics

    stats on features

    names

    names of features passed in

  12. class PredictionDeIndexer extends BinaryEstimator[RealNN, RealNN, Text] with SaveOthersParams

    Estimator which takes a response feature and a prediction feature as inputs. It de-indexes the prediction using the response's metadata.

    Input 1: response feature
    Input 2: prediction feature
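The de-indexing idea can be sketched as follows, assuming the response's metadata provides an ordered array of label strings in which position i is the label for index i. The object name, the `unseen` parameter, and its default value are hypothetical and only stand in for whatever the estimator uses for indices it cannot map.

```scala
// Illustrative sketch only: maps numeric prediction indices back to label strings.
object DeIndexSketch {
  /** Map each predicted index to its label, or to `unseen` when the index is out of range. */
  def deIndex(
    labels: Array[String],
    predictions: Seq[Double],
    unseen: String = "UnseenLabel"
  ): Seq[String] =
    predictions.map { p =>
      val i = p.toInt
      if (i >= 0 && i < labels.length) labels(i) else unseen
    }
}
```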

  13. final class PredictionDeIndexerModel extends BinaryModel[RealNN, RealNN, Text]

  14. class SanityChecker extends BinaryEstimator[RealNN, OPVector, OPVector] with SanityCheckerParams with AllowLabelAsInput[OPVector]

    The SanityChecker checks for potential problems with computed features in a supervised learning setting.

    There is an Estimator step, which outputs statistics on the incoming data, as well as the names of features which should be dropped from the feature vector. The transformer step applies the action of actually removing the offending features from the feature vector.

  15. final class SanityCheckerModel extends BinaryModel[RealNN, OPVector, OPVector] with AllowLabelAsInput[OPVector]

  16. trait SanityCheckerParams extends DerivedFeatureFilterParams

  17. case class SanityCheckerSummary(correlations: Correlations, dropped: Seq[String], featuresStatistics: SummaryStatistics, names: Seq[String], categoricalStats: Array[CategoricalGroupStats]) extends MetadataLike with Product with Serializable

    Case class to convert to and from SanityChecker summary metadata

    correlations

    feature correlations with label

    dropped

    features dropped for label leakage

    featuresStatistics

    stats on features

    names

    names of features passed in

  18. case class SummaryStatistics(count: Double, sampleFraction: Double, max: Seq[Double], min: Seq[Double], mean: Seq[Double], variance: Seq[Double]) extends MetadataLike with Product with Serializable

    Statistics on features (zip these arrays with names in SanityCheckerSummary to associate each value with its feature)

    count

    count of data in sample used to calculate stats

    sampleFraction

    fraction of total data used in calculation

    max

    max value seen

    min

    min value

    mean

    mean value

    variance

    variance of value
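The zipping mentioned in the description can be illustrated with a small sketch. The `Stats` and `FeatureStats` case classes are hypothetical stand-ins that mirror only the per-feature fields listed above; the sketch assumes every sequence has one entry per name, in the same order.

```scala
// Illustrative sketch only: Stats is a local stand-in, not the library's SummaryStatistics.
object StatsZipSketch {
  final case class Stats(max: Seq[Double], min: Seq[Double], mean: Seq[Double], variance: Seq[Double])
  final case class FeatureStats(max: Double, min: Double, mean: Double, variance: Double)

  /** Associate each feature name with its own max, min, mean, and variance. */
  def byFeature(names: Seq[String], s: Stats): Map[String, FeatureStats] = {
    require(
      Seq(s.max, s.min, s.mean, s.variance).forall(_.length == names.length),
      "every statistic needs one value per feature name"
    )
    names.indices.map { i =>
      names(i) -> FeatureStats(s.max(i), s.min(i), s.mean(i), s.variance(i))
    }.toMap
  }
}
```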

  19. case class CategoricalStats(categoricalFeatures: Array[String] = Array.empty, cramersVs: Array[Double] = Array.empty, pointwiseMutualInfos: Type = LabelWiseValues.empty, mutualInfos: Array[Double] = Array.empty, counts: Type = LabelWiseValues.empty) extends MetadataLike with Product with Serializable

    Container class for statistics calculated from contingency tables constructed from categorical variables

    categoricalFeatures

    Names of features that we performed categorical tests on

    cramersVs

    Values of cramersV for each feature (should be the same for everything coming from the same contingency matrix)

    pointwiseMutualInfos

    Map from label value (as a string) to an Array (over features) of PMI values

    mutualInfos

    Values of MI for each feature (should be the same for everything coming from the same contingency matrix)

    counts

    Counts of occurrence for categoricals (n x m array of arrays, where n = number of labels and m = number of features + 1, with the last element being the occurrence count of labels)

    Annotations
    @deprecated
    Deprecated

    (Since version 3.3.0) Functionality replaced by Array[CategoricalGroupStats]

Value Members

  1. object CorrelationExclusion extends Enum[CorrelationExclusion] with Serializable

  2. object CorrelationLevel extends Enum[CorrelationLevel] with Serializable

  3. object CorrelationType extends Enum[CorrelationType] with Serializable

  4. object DerivedFeatureFilter

  5. object DerivedFeatureFilterUtils

  6. object MinVarianceFilter extends Serializable

  7. object MinVarianceNames extends DerivedFeatureFilterNames with Product with Serializable

  8. object MinVarianceSummary extends DerivedFeatureFilterSummary with Product with Serializable

  9. object SanityChecker extends Serializable

  10. object SanityCheckerNames extends DerivedFeatureFilterNames with Product with Serializable

    Contains all names for sanity checker metadata

  11. object SanityCheckerSummary extends Product with Serializable
