com.salesforce.op.stages.impl.feature

OpOneHotVectorizer



abstract class OpOneHotVectorizer[T <: FeatureType] extends SequenceEstimator[T, OPVector] with PivotParams with CleanTextFun with SaveOthersParams with TrackNullsParam with MinSupportParam with OneHotFun with MaxPctCardinalityParams

Converts a sequence of features into a vector, keeping the top K most common values of each feature (i.e. K columns per input feature). Each input also gets an additional column for "other" values, which captures values that do not make the cut or were not seen in training, and an additional column for empty values unless null tracking is disabled.
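
As a rough illustration of the column layout described above, the following plain-Scala sketch models the pivot without Spark. It is not the library's implementation: `PivotModel`, `fit`, and `transform` are hypothetical names, and treating `minSupport` as a minimum occurrence count with alphabetical tie-breaking is an assumption made for the example.

```scala
object OneHotPivotSketch {
  // Hypothetical stand-in for a fitted model: the kept values for each input feature.
  final case class PivotModel(topValues: Seq[Seq[String]], trackNulls: Boolean)

  // "Fit": for each input position, keep the topK most frequent non-empty values
  // that occur at least minSupport times (assumed semantics; ties broken alphabetically).
  def fit(rows: Seq[Seq[Option[String]]], topK: Int, minSupport: Int, trackNulls: Boolean): PivotModel = {
    val numInputs = rows.headOption.map(_.size).getOrElse(0)
    val topValues = (0 until numInputs).map { i =>
      val counts = rows.flatMap(r => r(i).toList).groupBy(identity).map { case (v, vs) => v -> vs.size }
      counts.toSeq
        .filter { case (_, c) => c >= minSupport }
        .sortBy { case (v, c) => (-c, v) }
        .take(topK)
        .map(_._1)
    }
    PivotModel(topValues, trackNulls)
  }

  // "Transform": one block per input = kept values, then "other", then an optional null indicator.
  def transform(model: PivotModel, row: Seq[Option[String]]): Seq[Double] =
    row.zip(model.topValues).flatMap { case (value, kept) =>
      val hot = kept.map(k => if (value.contains(k)) 1.0 else 0.0)
      val other = if (value.exists(v => !kept.contains(v))) 1.0 else 0.0
      val nulls = if (model.trackNulls) Seq(if (value.isEmpty) 1.0 else 0.0) else Seq.empty
      (hot :+ other) ++ nulls
    }
}
```

With `topK = 2` and null tracking enabled, each input in this sketch contributes four columns: two kept values, one "other" column, and one null indicator.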

Linear Supertypes
MaxPctCardinalityParams, OneHotFun, UniqueCountFun, MinSupportParam, TrackNullsParam, SaveOthersParams, CleanTextFun, PivotParams, TextParams, SequenceEstimator[T, OPVector], OpPipelineStageN[T, OPVector], HasOut[OPVector], HasInN, OpPipelineStage[OPVector], OpPipelineStageBase, MLWritable, OpPipelineStageParams, InputParams, Estimator[SequenceModel[T, OPVector]], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new OpOneHotVectorizer(operationName: String, uid: String = UID[OpOneHotVectorizer[_]])(implicit tti: scala.reflect.api.JavaUniverse.TypeTag[T], ttiv: scala.reflect.api.JavaUniverse.TypeTag[T.Value])

    operationName: unique name of the operation this stage performs
    uid: uid for instance

Type Members

  1. final type InputFeatures = Array[FeatureLike[T]]

    Definition Classes
    OpPipelineStageN → OpPipelineStage → InputParams
  2. final type OutputFeatures = FeatureLike[OPVector]

    Definition Classes
    OpPipelineStage → OpPipelineStageBase

Abstract Value Members

  1. abstract def convertToSeqOfMaps(dataset: Dataset[Seq[T.Value]]): RDD[Seq[Map[String, Int]]]

    Attributes
    protected
  2. abstract def makeModel(topValues: Seq[Seq[String]], shouldCleanText: Boolean, shouldTrackNulls: Boolean, operationName: String, uid: String): SequenceModel[T, OPVector]

    Attributes
    protected
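
The return type of `convertToSeqOfMaps` suggests that every row is turned into one value-to-count map per input feature, which can then be summed across rows. The sketch below shows that idea on plain Scala collections instead of a `Dataset` and an `RDD`. The names `Row`, `toSeqOfMaps`, `mergeCounts`, and `countAll` are hypothetical, and the single-entry map per present value is an assumption about how a text-like subclass might implement the method.

```scala
object SeqOfMapsSketch {
  // One row holds one optional text value per input feature.
  type Row = Seq[Option[String]]

  // Per-row conversion: each present value becomes a single-entry count map,
  // each missing value an empty map.
  def toSeqOfMaps(row: Row): Seq[Map[String, Int]] =
    row.map(_.map(v => Map(v -> 1)).getOrElse(Map.empty[String, Int]))

  // Merge two count maps by summing the counts of shared keys.
  def mergeCounts(a: Map[String, Int], b: Map[String, Int]): Map[String, Int] =
    b.foldLeft(a) { case (acc, (k, c)) => acc.updated(k, acc.getOrElse(k, 0) + c) }

  // Position-wise reduction over all rows, similar in spirit to a distributed reduce.
  def countAll(rows: Seq[Row]): Seq[Map[String, Int]] =
    rows.map(toSeqOfMaps).reduceOption { (l, r) =>
      l.zip(r).map { case (x, y) => mergeCounts(x, y) }
    }.getOrElse(Seq.empty)
}
```

The aggregated maps are the kind of input from which top values per feature could be selected before being handed to `makeModel` as `topValues`.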

Concrete Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. final def checkInputLength(features: Array[_]): Boolean

    Definition Classes
    OpPipelineStageN → InputParams
  7. final def checkSerializable: Try[Unit]

    Definition Classes
    SequenceEstimator → OpPipelineStageBase
  8. final val cleanText: BooleanParam

    Definition Classes
    TextParams
  9. def cleanTextFn(s: String, shouldClean: Boolean): String

    Definition Classes
    CleanTextFun
  10. final def clear(param: Param[_]): OpOneHotVectorizer.this.type

    Definition Classes
    Params
  11. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  12. final def copy(extra: ParamMap): OpOneHotVectorizer.this.type

    Definition Classes
    OpPipelineStageBase → Params
  13. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  14. def countMapUniques[V](dataset: Dataset[Seq[Map[String, V]]], size: Int, bits: Int)(implicit kryo: KryoSerializer, ct: ClassTag[V]): (Seq[Map[String, HLL]], Long)

    Counts the unique values of each sequence and map-key component in the dataset using HyperLogLog (HLL).

    V: value type
    dataset: dataset whose unique values are counted
    size: size of each sequence component
    bits: number of bits for HyperLogLog (HLL)
    kryo: Kryo serializer used to serialize a V value into an array of bytes
    ct: class tag of V (needed by Kryo)
    returns: HyperLogLog (HLL) unique-value counts for each sequence component, and the total row count

    Definition Classes
    UniqueCountFun
  15. def countUniques[V](dataset: Dataset[Seq[V]], size: Int, bits: Int)(implicit kryo: KryoSerializer, ct: ClassTag[V]): (Seq[HLL], Long)

    Counts the unique values of each sequence component in the dataset using HyperLogLog (HLL).

    V: value type
    dataset: dataset whose unique values are counted
    size: size of each sequence component
    bits: number of bits for HyperLogLog (HLL)
    kryo: Kryo serializer used to serialize a V value into an array of bytes
    ct: class tag of V (needed by Kryo)
    returns: HyperLogLog (HLL) unique-value counts for each sequence component, and the total row count

    Definition Classes
    UniqueCountFun
  16. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  17. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  18. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  19. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  20. def explainParams(): String

    Definition Classes
    Params
  21. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  22. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  23. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  24. def fit(dataset: Dataset[_]): SequenceModel[T, OPVector]

    Definition Classes
    SequenceEstimator → Estimator
  25. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[SequenceModel[T, OPVector]]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  26. def fit(dataset: Dataset[_], paramMap: ParamMap): SequenceModel[T, OPVector]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  27. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): SequenceModel[T, OPVector]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  28. def fitFn(dataset: Dataset[Seq[T.Value]]): SequenceModel[T, OPVector]

    Definition Classes
    OpOneHotVectorizer → SequenceEstimator
  29. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  30. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  31. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  32. final def getHLLBits: Int

    Definition Classes
    MaxPctCardinalityParams
  33. final def getInputFeature[T <: FeatureType](i: Int): Option[FeatureLike[T]]

    Definition Classes
    InputParams
  34. final def getInputFeatures(): Array[OPFeature]

    Definition Classes
    InputParams
  35. final def getInputSchema(): StructType

    Definition Classes
    OpPipelineStageParams
  36. final def getMaxPctCardinality: Double

    Definition Classes
    MaxPctCardinalityParams
  37. final def getMetadata(): Metadata

    Definition Classes
    OpPipelineStageParams
  38. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  39. def getOutput(): FeatureLike[OPVector]

    Definition Classes
    HasOut → OpPipelineStageBase
  40. final def getOutputFeatureName: String

    Definition Classes
    OpPipelineStage
  41. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  42. final def getTransientFeature(i: Int): Option[TransientFeature]

    Definition Classes
    InputParams
  43. final def getTransientFeatures(): Array[TransientFeature]

    Definition Classes
    InputParams
  44. def getUnseenName: String

    Definition Classes
    SaveOthersParams
  45. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  46. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  47. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  48. final val hllBits: IntParam

    Definition Classes
    MaxPctCardinalityParams
  49. final def inN: Array[TransientFeature]

    Attributes
    protected
    Definition Classes
    HasInN
  50. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  51. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  52. final def inputAsArray(in: InputFeatures): Array[OPFeature]

    Definition Classes
    OpPipelineStageN → InputParams
  53. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  54. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  55. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  56. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  57. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  58. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  59. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  60. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  61. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  62. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  63. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  64. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  65. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  66. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  67. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  68. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  69. def makeVectorColumnMetadata(shouldTrackNulls: Boolean, unseen: Option[String], topValues: Seq[Seq[String]], features: Array[TransientFeature]): Array[OpVectorColumnMetadata]

    Attributes
    protected
    Definition Classes
    OneHotFun
  70. def makeVectorMetadata(shouldTrackNulls: Boolean, unseen: Option[String], topValues: Seq[Seq[String]], outputName: String, features: Array[TransientFeature], stageName: String): OpVectorMetadata

    Attributes
    protected
    Definition Classes
    OneHotFun
  71. final val maxPctCardinality: DoubleParam

    Definition Classes
    MaxPctCardinalityParams
  72. final val minSupport: IntParam

    Definition Classes
    MinSupportParam
  73. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  74. final def notify(): Unit

    Definition Classes
    AnyRef
  75. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  76. def onGetMetadata(): Unit

    Attributes
    protected
    Definition Classes
    OpPipelineStageParams
  77. def onSetInput(): Unit

    Attributes
    protected
    Definition Classes
    InputParams
  78. val operationName: String

    unique name of the operation this stage performs

    Definition Classes
    SequenceEstimator → OpPipelineStageBase
  79. final def outputAsArray(out: OutputFeatures): Array[OPFeature]

    Definition Classes
    OpPipelineStage → OpPipelineStageBase
  80. def outputFeatureUid: String

    Attributes
    protected[com.salesforce.op]
    Definition Classes
    OpPipelineStageN → OpPipelineStage
  81. def outputIsResponse: Boolean

    Definition Classes
    OpPipelineStage
  82. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  83. def save(path: String): Unit

    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  84. val seqIConvert: FeatureTypeSparkConverter[T]

    Definition Classes
    SequenceEstimator
  85. implicit val seqIEncoder: Encoder[Seq[T.Value]]

    Definition Classes
    SequenceEstimator
  86. final def set(paramPair: ParamPair[_]): OpOneHotVectorizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  87. final def set(param: String, value: Any): OpOneHotVectorizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  88. final def set[T](param: Param[T], value: T): OpOneHotVectorizer.this.type

    Definition Classes
    Params
  89. def setCleanText(clean: Boolean): OpOneHotVectorizer.this.type

    Definition Classes
    TextParams
  90. final def setDefault(paramPairs: ParamPair[_]*): OpOneHotVectorizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  91. final def setDefault[T](param: Param[T], value: T): OpOneHotVectorizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  92. final def setHLLBits(value: Int): OpOneHotVectorizer.this.type

    Definition Classes
    MaxPctCardinalityParams
  93. final def setInput(features: FeatureLike[T]*): OpOneHotVectorizer.this.type

    Definition Classes
    OpPipelineStageN
  94. final def setInput(features: InputFeatures): OpOneHotVectorizer.this.type

    Definition Classes
    OpPipelineStageBase
  95. final def setInputFeatures[S <: OPFeature](features: Array[S]): OpOneHotVectorizer.this.type

    Attributes
    protected
    Definition Classes
    InputParams
  96. final def setMaxPctCardinality(v: Double): OpOneHotVectorizer.this.type

    Definition Classes
    MaxPctCardinalityParams
  97. final def setMetadata(m: Metadata): OpOneHotVectorizer.this.type

    Definition Classes
    OpPipelineStageParams
  98. def setMinSupport(min: Int): OpOneHotVectorizer.this.type

    Definition Classes
    MinSupportParam
  99. def setOutputFeatureName(name: String): OpOneHotVectorizer.this.type

    Definition Classes
    OpPipelineStage
  100. def setTopK(numberToKeep: Int): OpOneHotVectorizer.this.type

    Definition Classes
    PivotParams
  101. def setTrackNulls(v: Boolean): OpOneHotVectorizer.this.type

    Option to keep track of values that were missing

    Definition Classes
    TrackNullsParam
  102. def setUnseenName(unseenNameIn: String): OpOneHotVectorizer.this.type

    Definition Classes
    SaveOthersParams
  103. final def stageName: String

    Definition Classes
    OpPipelineStageBase
  104. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  105. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  106. final val topK: IntParam

    Definition Classes
    PivotParams
  107. final val trackNulls: BooleanParam

    Definition Classes
    TrackNullsParam
  108. final def transformSchema(schema: StructType): StructType

    Definition Classes
    OpPipelineStageBase
  109. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  110. implicit val tti: scala.reflect.api.JavaUniverse.TypeTag[T]

    Definition Classes
    SequenceEstimator
  111. implicit val ttiv: scala.reflect.api.JavaUniverse.TypeTag[T.Value]

    Definition Classes
    SequenceEstimator
  112. implicit val tto: scala.reflect.api.JavaUniverse.TypeTag[OPVector]

    Definition Classes
    SequenceEstimator → HasOut
  113. implicit val ttov: scala.reflect.api.JavaUniverse.TypeTag[Value]

    Definition Classes
    SequenceEstimator → HasOut
  114. val uid: String

    uid for instance

    Definition Classes
    SequenceEstimator → Identifiable
  115. final val unseenName: Param[String]

    Definition Classes
    SaveOthersParams
  116. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  117. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  118. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  119. final def write: MLWriter

    Definition Classes
    OpPipelineStageBase → MLWritable
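
The `countUniques` member listed above returns one unique-value count per sequence component together with the total row count. One plausible use of that pair, assumed here rather than stated on this page, is to skip inputs whose share of distinct values exceeds `maxPctCardinality`. The sketch below swaps the HyperLogLog estimate for exact `Set`-based counting so that it runs on plain Scala collections; `countUniquesExact` and `withinMaxPct` are hypothetical names.

```scala
object CardinalitySketch {
  // Exact stand-in for countUniques: distinct values per sequence position, plus the row count.
  def countUniquesExact[V](rows: Seq[Seq[V]], size: Int): (Seq[Int], Long) = {
    val distinct = (0 until size).map(i => rows.map(_(i)).toSet.size)
    (distinct, rows.size.toLong)
  }

  // Indices of the positions whose distinct-to-rows ratio does not exceed maxPct.
  def withinMaxPct(uniques: Seq[Int], n: Long, maxPct: Double): Seq[Int] =
    uniques.zipWithIndex.collect { case (u, i) if n > 0 && u.toDouble / n <= maxPct => i }
}
```

An HLL-based version trades exactness for bounded memory, which is presumably why the trait exposes a `bits` setting.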
