
com.salesforce.op.stages.impl.tuning

DataBalancer

class DataBalancer extends Splitter with DataBalancerParams

Splits the data into training and holdout sets, then balances the dataset before modeling binary classification problems.
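As a minimal usage sketch, a `DataBalancer` can be configured through the setters listed under Value Members. This assumes TransmogrifAI and Spark ML are on the classpath. The object name `DataBalancerConfigSketch` and the values passed to the setters are illustrative assumptions, not recommended settings.

```scala
import com.salesforce.op.stages.impl.tuning.DataBalancer

// Hypothetical wrapper object, used only to hold the example.
object DataBalancerConfigSketch {
  // Each setter returns this.type, so the calls can be chained.
  val balancer: DataBalancer = new DataBalancer()
    .setSampleFraction(0.2)       // target minority-class fraction, in ]0.0, 1.0[
    .setMaxTrainingSample(500000) // maximum training set size, must be > 0
    .setReserveTestFraction(0.1)  // fraction of data reserved for test
    .setSeed(42L)                 // seed for data splitting
}
```

The matching getters (`getSampleFraction`, `getMaxTrainingSample`, `getReserveTestFraction`, `getSeed`) read the configured values back.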

Linear Supertypes
DataBalancerParams, Splitter, SplitterParams, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DataBalancer(uid: String = UID[DataBalancer])


Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def checkPreconditions(): Unit

    Attributes
    protected
    Definition Classes
    Splitter
  7. final def clear(param: Param[_]): DataBalancer.this.type

    Definition Classes
    Params
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def copy(extra: ParamMap): DataBalancer

    Definition Classes
    DataBalancer → Params
  10. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  11. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  14. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  15. def explainParams(): String

    Definition Classes
    Params
  16. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  17. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  18. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  19. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  20. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  21. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  22. def getMaxTrainingSample: Int

    Definition Classes
    SplitterParams
  23. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  24. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  25. def getProportions(smallCount: Double, bigCount: Double, sampleF: Double, maxTrainingSample: Int): (Double, Double)

    Computes the upSample and downSample proportions.

    smallCount

    size of minority class data

    bigCount

    size of majority class data

    sampleF

    target fraction of minority class data

    maxTrainingSample

    maximum training size

    returns

    downSample and upSample proportions

  26. def getReserveTestFraction: Double

    Definition Classes
    SplitterParams
  27. def getSampleFraction: Double

    Definition Classes
    DataBalancerParams
  28. def getSeed: Long

    Definition Classes
    SplitterParams
  29. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  30. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  31. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  32. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  33. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  34. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  35. final val labelColumnName: Param[String]

    Definition Classes
    SplitterParams
  36. final val maxTrainingSample: IntParam

    Maximum size of the dataset to train on. Value should be > 0. Default is 1000000.

    Definition Classes
    SplitterParams
  37. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  38. final def notify(): Unit

    Definition Classes
    AnyRef
  39. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  40. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  41. def preValidationPrepare(data: Dataset[Row]): PrevalidationVal

    Sets parameters before the data is passed into the validation step, e.g. balancing the data or dropping data based on the labels.

    returns

    Parameters set in examining data

    Definition Classes
    DataBalancer → Splitter
  42. final val reserveTestFraction: DoubleParam

    Fraction of data to reserve for testing. Default is 0.1.

    Definition Classes
    SplitterParams
  43. final val sampleFraction: DoubleParam

    Target sample fraction for the minority class. Value should be in ]0.0, 1.0[. Default is 0.1.

    Definition Classes
    DataBalancerParams
  44. final val seed: LongParam

    Seed for data splitting

    Definition Classes
    SplitterParams
  45. final def set(paramPair: ParamPair[_]): DataBalancer.this.type

    Attributes
    protected
    Definition Classes
    Params
  46. final def set(param: String, value: Any): DataBalancer.this.type

    Attributes
    protected
    Definition Classes
    Params
  47. final def set[T](param: Param[T], value: T): DataBalancer.this.type

    Definition Classes
    Params
  48. final def setDefault(paramPairs: ParamPair[_]*): DataBalancer.this.type

    Attributes
    protected
    Definition Classes
    Params
  49. final def setDefault[T](param: Param[T], value: T): DataBalancer.this.type

    Attributes
    protected
    Definition Classes
    Params
  50. def setMaxTrainingSample(value: Int): DataBalancer.this.type

    Definition Classes
    SplitterParams
  51. def setReserveTestFraction(value: Double): DataBalancer.this.type

    Definition Classes
    SplitterParams
  52. def setSampleFraction(value: Double): DataBalancer.this.type

    Definition Classes
    DataBalancerParams
  53. def setSeed(value: Long): DataBalancer.this.type

    Definition Classes
    SplitterParams
  54. def split[T](data: Dataset[T]): (Dataset[T], Dataset[T])

    Creates the training set and the test set.

    returns

    (dataTrain, dataTest)

    Definition Classes
    Splitter
  55. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  56. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  57. val uid: String

    Definition Classes
    Splitter → Identifiable
  58. def validationPrepare(data: Dataset[Row]): Dataset[Row]

    Rebalance the training data within the validation step

    data

    data to prepare for model training; the first column must be the label, as a double

    returns

    balanced training set and a test set

    Definition Classes
    DataBalancer → Splitter
  59. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  60. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  61. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  62. def withLabelColumnName(label: String): Splitter

    Add a splitter parameter to name the label column

    Definition Classes
    Splitter
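The relationship between `sampleFraction` and the two proportions returned by `getProportions` can be illustrated with a small, self-contained sketch. This is not the library's implementation. It assumes the proportions act as plain multipliers on the class counts, and the names `ProportionsSketch`, `minorityFractionAfter` and `downSampleFor` are hypothetical helpers introduced only for this example.

```scala
// Illustrative arithmetic only; not the DataBalancer implementation.
object ProportionsSketch {

  /** Minority-class fraction after multiplying each class count by its proportion. */
  def minorityFractionAfter(
    smallCount: Double, bigCount: Double, downSample: Double, upSample: Double
  ): Double = {
    val small = smallCount * upSample // resampled minority size
    val big = bigCount * downSample   // resampled majority size
    small / (small + big)
  }

  /** Majority-class proportion that gives minority fraction sampleF for a chosen upSample. */
  def downSampleFor(
    smallCount: Double, bigCount: Double, sampleF: Double, upSample: Double
  ): Double =
    (upSample * smallCount * (1.0 - sampleF)) / (sampleF * bigCount)

  def main(args: Array[String]): Unit = {
    val (small, big, sampleF) = (100.0, 9900.0, 0.1) // illustrative counts
    val down = downSampleFor(small, big, sampleF, upSample = 1.0)
    println(f"downSample = $down%.4f")
    println(f"minority fraction = ${minorityFractionAfter(small, big, down, 1.0)}%.4f")
  }
}
```

With 100 minority rows, 9,900 majority rows and a target fraction of 0.1, leaving the minority class untouched means keeping 90/990 (roughly 9%) of the majority class, about 900 rows, so the minority class makes up one tenth of the balanced data.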
