preparators

Type Members

case class CategoricalGroupStats(group: String, categoricalFeatures: Array[String], contingencyMatrix: Type, pointwiseMutualInfo: Type, cramersV: Double, mutualInfo: Double, maxRuleConfidences: Array[Double], supports: Array[Double]) extends MetadataLike with Product with Serializable

Container for categorical stats coming from a single group (and therefore a single contingency matrix)
group
Indicator group for this contingency matrix
categoricalFeatures
Array of categorical features belonging to this group
contingencyMatrix
Contingency matrix for this feature group
pointwiseMutualInfo
Matrix of PMI values in Map form (label -> PMI values)
cramersV
Cramér's V value for this feature group (how strongly it is associated with the label)
mutualInfo
Mutual info value for this feature group
maxRuleConfidences
Array (one value per contingency matrix row) containing the largest association rule confidence for that row (over all the labels)
supports
Array (one value per contingency matrix row) containing the supports for each categorical choice (fraction of data in which it is chosen)
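The per-row and per-group quantities described above can be illustrated with a small, self-contained sketch. It is not the library's implementation: the object name `ContingencySketch`, the plain `Array[Array[Double]]` matrix type, and the layout (rows are categorical choices, columns are label values) are assumptions made for illustration. Cramér's V uses the standard chi-squared formula.

```scala
object ContingencySketch {
  // Assumed layout: one row per categorical choice, one column per label value.
  type Matrix = Array[Array[Double]]

  def rowSums(m: Matrix): Array[Double] = m.map(_.sum)
  def colSums(m: Matrix): Array[Double] = m.transpose.map(_.sum)

  /** Fraction of all observations in which each categorical choice occurs. */
  def supports(m: Matrix): Array[Double] = {
    val total = rowSums(m).sum
    rowSums(m).map(r => if (total == 0.0) 0.0 else r / total)
  }

  /** Largest confidence of the rule "choice => label" over all labels, per row. */
  def maxRuleConfidences(m: Matrix): Array[Double] =
    m.map { row =>
      val s = row.sum
      if (s == 0.0) 0.0 else row.max / s
    }

  /** Cramér's V = sqrt(chi2 / (n * min(rows - 1, cols - 1))). */
  def cramersV(m: Matrix): Double = {
    val rs = rowSums(m)
    val cs = colSums(m)
    val n = rs.sum
    val dof = math.min(m.length - 1, cs.length - 1)
    if (n == 0.0 || dof <= 0) 0.0
    else {
      val chi2 = (for {
        i <- m.indices
        j <- cs.indices
        expected = rs(i) * cs(j) / n
        if expected > 0.0
      } yield math.pow(m(i)(j) - expected, 2) / expected).sum
      math.sqrt(chi2 / (n * dof))
    }
  }
}
```

For a matrix in which every choice maps to exactly one label, `cramersV` returns 1.0 and every entry of `maxRuleConfidences` is 1.0. For a matrix whose rows are proportional to each other, `cramersV` returns 0.0.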
sealed trait CorrelationExclusion extends EnumEntry with Serializable

Categories of feature vector columns to exclude from the feature-label correlation matrix (or just the array of feature-label correlations) calculated in SanityChecker.
sealed trait CorrelationLevel extends EnumEntry with Serializable

Settings for feature-feature correlations
sealed abstract class CorrelationType extends EnumEntry with Serializable

Represents a kind of correlation coefficient.
case class Correlations(featuresIn: Seq[String], valuesWithLabel: Seq[Double], valuesWithFeatures: Seq[Seq[Double]], corrType: CorrelationType) extends MetadataLike with Product with Serializable

Correlations between features and the label from SanityChecker
featuresIn
names of features
valuesWithLabel
correlation of feature with label
valuesWithFeatures
correlations between features
corrType
type of correlation computed
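To make the shape of these fields concrete, here is a minimal sketch that fills a container with the same field layout using Pearson correlation. The names `CorrelationSketch`, `CorrelationsSketch`, and `compute` are hypothetical, `corrType` is omitted, and treating `valuesWithFeatures` as a full square matrix is an assumption rather than something the documentation states.

```scala
object CorrelationSketch {
  /** Pearson correlation; yields NaN when either input has zero variance. */
  def pearson(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.length == y.length && x.nonEmpty, "inputs must be non-empty and of equal length")
    val mx = x.sum / x.length
    val my = y.sum / y.length
    val cov = x.zip(y).map { case (a, b) => (a - mx) * (b - my) }.sum
    val vx = x.map(a => (a - mx) * (a - mx)).sum
    val vy = y.map(b => (b - my) * (b - my)).sum
    cov / math.sqrt(vx * vy)
  }

  // Hypothetical stand-in mirroring the field layout of Correlations (without corrType).
  final case class CorrelationsSketch(
    featuresIn: Seq[String],
    valuesWithLabel: Seq[Double],
    valuesWithFeatures: Seq[Seq[Double]]
  )

  /** columns: (feature name, values per row); label: label value per row. */
  def compute(columns: Seq[(String, Seq[Double])], label: Seq[Double]): CorrelationsSketch = {
    val (names, values) = columns.unzip
    CorrelationsSketch(
      featuresIn = names,
      valuesWithLabel = values.map(v => pearson(v, label)),
      valuesWithFeatures = values.map(a => values.map(b => pearson(a, b)))
    )
  }
}
```

Entry `i` of `valuesWithLabel` and row `i` of `valuesWithFeatures` both refer to the feature named at position `i` of `featuresIn`.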
trait DerivedFeatureFilterNames extends AnyRef
trait DerivedFeatureFilterParams extends Params
trait DerivedFeatureFilterSummary extends DerivedFeatureFilterNames
class MinVarianceFilter extends UnaryEstimator[OPVector, OPVector] with DerivedFeatureFilterParams

The MinVarianceFilter checks that computed features have a minimum variance
Like SanityChecker, the Estimator step outputs statistics on incoming data, as well as the names of features which should be dropped from the feature vector. The transformer step then removes the low-variance features from the feature vector.
Two distinctions from SanityChecker: (1) no label column as input; and (2) only filters features by variance
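The estimator/transformer split described above can be sketched in plain Scala, without Spark or the library's types. Everything here is illustrative: `MinVarianceSketch`, `fit`, `Model`, and the `minVariance` parameter name are hypothetical, and both the use of sample variance and the rule "keep a column when its variance is at least the threshold" are assumptions.

```scala
object MinVarianceSketch {
  /** Sample variance (n - 1 denominator); 0.0 for fewer than two values. */
  def variance(xs: Seq[Double]): Double =
    if (xs.length < 2) 0.0
    else {
      val mean = xs.sum / xs.length
      xs.map(x => (x - mean) * (x - mean)).sum / (xs.length - 1)
    }

  // Result of the "fit" step: indices to keep and names that were dropped.
  final case class Model(keepIndices: Seq[Int], dropped: Seq[String]) {
    /** The "transform" step: removes the dropped columns from one row. */
    def transform(row: Seq[Double]): Seq[Double] = keepIndices.map(row)
  }

  /** names: column names; rows: one feature vector per observation. */
  def fit(names: Seq[String], rows: Seq[Seq[Double]], minVariance: Double): Model = {
    require(rows.nonEmpty && rows.forall(_.length == names.length), "rows must match names")
    val variances = rows.transpose.map(variance)
    // Assumed rule: a column survives when its variance reaches the threshold.
    val (keep, drop) = names.indices.partition(i => variances(i) >= minVariance)
    Model(keep, drop.map(names))
  }
}
```

Fitting once and reusing `Model.transform` on every row mirrors the way the estimator computes the list of dropped names a single time and the model only applies it.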
final class MinVarianceFilterModel extends UnaryModel[OPVector, OPVector]
case class MinVarianceSummary(dropped: Seq[String], featuresStatistics: SummaryStatistics, names: Seq[String]) extends MetadataLike with Product with Serializable

Case class to store metadata from MinVarianceFilter
dropped
features dropped by minimum variance filter
featuresStatistics
stats on features
names
names of features passed in
class PredictionDeIndexer extends BinaryEstimator[RealNN, RealNN, Text] with SaveOthersParams

Estimator which takes a response feature and a prediction feature as inputs. It deindexes the prediction using the response's metadata.
Input 1: response feature. Input 2: prediction feature.
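De-indexing can be pictured as a lookup from a numeric prediction back into the ordered list of original label strings. The sketch below assumes that list is available as a plain array (standing in for the response's metadata). The names `DeIndexSketch`, `deIndex`, and the `UnseenLabel` fallback text are hypothetical, and how the real estimator treats out-of-range predictions is not stated in this documentation.

```scala
object DeIndexSketch {
  // Assumed fallback text for predictions that match no known label index.
  val UnseenLabel = "UnseenLabel"

  /** Maps a predicted index back to its original label text. */
  def deIndex(labels: Array[String])(prediction: Double): String = {
    val idx = prediction.toInt
    // Only whole-number predictions inside the label range are valid indices.
    val valid = idx >= 0 && idx < labels.length && idx.toDouble == prediction
    if (valid) labels(idx) else UnseenLabel
  }
}
```

With labels `Array("cat", "dog", "fish")`, a prediction of `1.0` maps to `"dog"`, while `5.0` or `0.5` fall back to the placeholder.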
final class PredictionDeIndexerModel extends BinaryModel[RealNN, RealNN, Text]
class SanityChecker extends BinaryEstimator[RealNN, OPVector, OPVector] with SanityCheckerParams with AllowLabelAsInput[OPVector]

The SanityChecker checks for potential problems with computed features in a supervised learning setting.
There is an Estimator step, which outputs statistics on the incoming data, as well as the names of features which should be dropped from the feature vector. The transformer step then removes the offending features from the feature vector.
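As a rough illustration of one such check, the sketch below flags features whose absolute correlation with the label exceeds a threshold (a common sign of label leakage) and then removes them from each row. The names `SanityCheckSketch`, `fit`, `Model`, and `maxCorrelation` are hypothetical, the correlations are taken as already computed, and the real estimator applies more checks than this single rule.

```scala
object SanityCheckSketch {
  // Result of the "fit" step: all input names plus the names flagged for removal.
  final case class Model(names: Seq[String], dropped: Seq[String]) {
    private val keep: Seq[Int] = names.indices.filterNot(i => dropped.contains(names(i)))

    /** The "transform" step: removes flagged columns from one feature vector. */
    def transform(row: Seq[Double]): Seq[Double] = keep.map(row)
  }

  /** labelCorrelations(i) is the correlation of feature names(i) with the label. */
  def fit(names: Seq[String], labelCorrelations: Seq[Double], maxCorrelation: Double): Model = {
    require(names.length == labelCorrelations.length, "one correlation per feature is required")
    val dropped = names.zip(labelCorrelations).collect {
      case (name, corr) if math.abs(corr) > maxCorrelation => name
    }
    Model(names, dropped)
  }
}
```

A NaN correlation compares false against the threshold, so in this sketch such a feature is kept. Whether the library treats it the same way is not covered here.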
final class SanityCheckerModel extends BinaryModel[RealNN, OPVector, OPVector] with AllowLabelAsInput[OPVector]
trait SanityCheckerParams extends DerivedFeatureFilterParams
case class SanityCheckerSummary(correlations: Correlations, dropped: Seq[String], featuresStatistics: SummaryStatistics, names: Seq[String], categoricalStats: Array[CategoricalGroupStats]) extends MetadataLike with Product with Serializable

Case class to convert to and from SanityChecker summary metadata
correlations
feature correlations with label
dropped
features dropped for label leakage
featuresStatistics
stats on features
names
names of features passed in
case class SummaryStatistics(count: Double, sampleFraction: Double, max: Seq[Double], min: Seq[Double], mean: Seq[Double], variance: Seq[Double]) extends MetadataLike with Product with Serializable

Statistics on features (zip these arrays with names in SanityCheckerSummary to associate each feature with its values)
count
count of data in sample used to calculate stats
sampleFraction
fraction of total data used in calculation
max
max value seen
min
min value
mean
mean value
variance
variance of value
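The zipping mentioned above might look like the following sketch. `Stats` is a hypothetical stand-in that carries only the per-feature sequences of `SummaryStatistics`, and the code assumes those sequences are aligned index-for-index with the `names` sequence.

```scala
object SummaryZipSketch {
  // Hypothetical stand-in holding the per-feature sequences of SummaryStatistics.
  final case class Stats(
    max: Seq[Double],
    min: Seq[Double],
    mean: Seq[Double],
    variance: Seq[Double]
  )

  // Statistics for a single feature after zipping.
  final case class FeatureStats(max: Double, min: Double, mean: Double, variance: Double)

  /** Pairs each feature name with the statistics stored at the same index. */
  def byFeature(names: Seq[String], s: Stats): Map[String, FeatureStats] = {
    val shortest = Seq(s.max, s.min, s.mean, s.variance).map(_.length).min
    require(shortest >= names.length, "every statistic needs one value per name")
    names.indices.map { i =>
      names(i) -> FeatureStats(s.max(i), s.min(i), s.mean(i), s.variance(i))
    }.toMap
  }
}
```

Looking up `byFeature(names, stats)("someFeature")` then returns all four statistics for that feature in one value.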
case class CategoricalStats(categoricalFeatures: Array[String] = Array.empty, cramersVs: Array[Double] = Array.empty, pointwiseMutualInfos: Type = LabelWiseValues.empty, mutualInfos: Array[Double] = Array.empty, counts: Type = LabelWiseValues.empty) extends MetadataLike with Product with Serializable

Container class for statistics calculated from contingency tables constructed from categorical variables
categoricalFeatures
Names of features that we performed categorical tests on
cramersVs
Values of cramersV for each feature (should be the same for everything coming from the same contingency matrix)
pointwiseMutualInfos
Map from label value (as a string) to an Array (over features) of PMI values
mutualInfos
Values of MI for each feature (should be the same for everything coming from the same contingency matrix)
counts
Counts of occurrence for categoricals (n x m array of arrays where n = number of labels and m = number of features + 1, with the last element being the occurrence count of labels)
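The label-keyed PMI map and the mutual information value can be sketched from a table of co-occurrence counts. This is an illustration under stated assumptions, not the library's code: `PmiSketch` and its method names are hypothetical, the input omits the trailing label-count column described for `counts`, the categorical choices are assumed mutually exclusive so the grand total is the sum of all cells, and the logarithm base (2) is an assumption.

```scala
object PmiSketch {
  private def log2(x: Double): Double = math.log(x) / math.log(2.0)

  /** counts: label -> co-occurrence count with each categorical feature column. */
  def pointwiseMutualInfo(counts: Map[String, Array[Double]]): Map[String, Array[Double]] = {
    val total = counts.values.map(_.sum).sum
    val numFeatures = counts.values.headOption.map(_.length).getOrElse(0)
    val featureTotals = Array.tabulate(numFeatures)(j => counts.values.map(_(j)).sum)
    counts.map { case (label, row) =>
      val labelTotal = row.sum
      // PMI = log2( p(label, feature) / (p(label) * p(feature)) )
      label -> row.zipWithIndex.map { case (c, j) =>
        log2(c * total / (labelTotal * featureTotals(j)))
      }
    }
  }

  /** Mutual information: expectation of PMI over cells with a non-zero count. */
  def mutualInfo(counts: Map[String, Array[Double]]): Double = {
    val total = counts.values.map(_.sum).sum
    val pmi = pointwiseMutualInfo(counts)
    val terms = for {
      (label, row) <- counts.toSeq
      (c, j) <- row.zipWithIndex.toSeq
      if c > 0.0
    } yield (c / total) * pmi(label)(j)
    terms.sum
  }
}
```

A cell with a zero count yields a PMI of negative infinity in this sketch. Such cells are skipped in `mutualInfo`, following the usual convention that they contribute nothing.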

Annotations
@deprecated
Deprecated
(Since version 3.3.0) Functionality replaced by Array[CategoricalGroupStats]

Value Members

object CorrelationExclusion extends Enum[CorrelationExclusion] with Serializable
object CorrelationLevel extends Enum[CorrelationLevel] with Serializable
object CorrelationType extends Enum[CorrelationType] with Serializable
object DerivedFeatureFilter
object DerivedFeatureFilterUtils
object MinVarianceFilter extends Serializable
object MinVarianceNames extends DerivedFeatureFilterNames with Product with Serializable
object MinVarianceSummary extends DerivedFeatureFilterSummary with Product with Serializable
object SanityChecker extends Serializable
object SanityCheckerNames extends DerivedFeatureFilterNames with Product with Serializable

Contains all names for sanity checker metadata
object SanityCheckerSummary extends Product with Serializable
