object StatsGenerator
Module managing the FeatureStats schema, the aggregations to be used by type, and aggregator construction.
Stats aggregation has an offline (batch) component and an online component. The metrics defined for stats depend on the schema of the join: the data types and column names. For the online side, we obtain this information from the JoinCodec/valueSchema. For the offline side, we obtain this information directly from the outputTable. To keep the schemas consistent, we sort the metrics in the schema by name (one column can have multiple metrics).
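The name-based ordering can be illustrated with a minimal sketch. The `Metric` case class, the `metricsFor` helper, and the `_null` / `_total` suffixes below are hypothetical stand-ins, not this module's API. The sketch only shows that sorting by metric name gives the same order whether the columns arrive in the online or the offline order.

```scala
// Hypothetical stand-in for a metric derived from a column; not the real MetricTransform.
final case class Metric(name: String)

object SortedMetricsSketch {
  // Assumption: every column yields two metrics, with made-up suffixes.
  def metricsFor(columns: Seq[String]): Seq[Metric] =
    columns
      .flatMap(c => Seq(Metric(s"${c}_null"), Metric(s"${c}_total")))
      .sortBy(_.name) // sorting by name makes the result independent of input order

  def main(args: Array[String]): Unit = {
    val online  = metricsFor(Seq("price", "age")) // column order as seen online
    val offline = metricsFor(Seq("age", "price")) // column order as seen offline
    assert(online == offline)
    println(online.map(_.name).mkString(", "))
  }
}
```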
Type Members
-
case class
MetricTransform(name: String, expression: InputTransform, operation: Operation, suffix: String = "", argMap: Map[String, String] = null) extends Product with Serializable
MetricTransform represents a single statistic built on top of an input column.
Value Members
-
final
def
!=(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
final
def
##(): Int
- Definition Classes
- AnyRef → Any
-
final
def
==(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
SeriesFinalizer(key: String, value: AnyRef): AnyRef
Post-processing for IRs when generating a time series of stats. In the case of percentiles, for example, we reduce to 5 values in order to generate candlesticks.
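As a rough illustration of reducing a distribution to five values for a candlestick, the sketch below computes five quantiles from a plain sorted sample using the nearest-rank method. The object name, the helpers, and the quantile levels (0.05, 0.25, 0.5, 0.75, 0.95) are assumptions made for illustration. The real finalizer operates on IRs rather than raw samples, and its levels are not stated here.

```scala
object CandlestickSketch {
  // Assumed quantile levels for the five candlestick values; hypothetical.
  val levels: Seq[Double] = Seq(0.05, 0.25, 0.5, 0.75, 0.95)

  // Nearest-rank quantile of a non-empty, already sorted sample.
  def quantile(sorted: IndexedSeq[Double], q: Double): Double = {
    require(sorted.nonEmpty, "sample must be non-empty")
    val rank = math.ceil(q * sorted.length).toInt // 1-based rank
    sorted(math.min(math.max(rank, 1), sorted.length) - 1)
  }

  // Reduce a raw sample to five values, one per assumed level.
  def fiveValues(sample: Seq[Double]): Seq[Double] = {
    val sorted = sample.sorted.toIndexedSeq
    levels.map(quantile(sorted, _))
  }
}
```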
-
def
anyTransforms(column: String): Seq[MetricTransform]
Stats applied to any column
-
final
def
asInstanceOf[T0]: T0
- Definition Classes
- Any
- def buildAggPart(m: MetricTransform): AggregationPart
-
def
buildAggregator(metrics: Seq[MetricTransform], selectedSchema: StructType): RowAggregator
Build RowAggregator to use for computing stats on a dataframe based on metrics
-
def
buildMetrics(fields: Seq[(String, DataType)]): Seq[MetricTransform]
Defines the metrics to be aggregated for the given data schema.
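One way to picture this is a per-field dispatch: every column receives the stats applicable to any column, and numeric columns additionally receive numeric stats. The sketch below assumes exactly that. The `FieldType` hierarchy, the helper names, and the metric labels are all hypothetical simplifications, not the module's `DataType` or `MetricTransform`.

```scala
object BuildMetricsSketch {
  // Hypothetical, simplified field types; the real module has its own DataType hierarchy.
  sealed trait FieldType
  case object NumericField extends FieldType
  case object OtherField extends FieldType

  // Hypothetical metric labels of the form "<column>_<stat>".
  def anyStats(column: String): Seq[String] = Seq(s"${column}_null_count")
  def numericStats(column: String): Seq[String] = Seq(s"${column}_percentiles")

  // Assumption: numeric fields get both groups, all other fields only the generic group.
  def metrics(fields: Seq[(String, FieldType)]): Seq[String] =
    fields.flatMap {
      case (name, NumericField) => anyStats(name) ++ numericStats(name)
      case (name, OtherField)   => anyStats(name)
    }
}
```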
-
def
clone(): AnyRef
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
final
def
eq(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
def
equals(arg0: Any): Boolean
- Definition Classes
- AnyRef → Any
-
def
finalize(): Unit
- Attributes
- protected[lang]
- Definition Classes
- AnyRef
- Annotations
- @throws( classOf[java.lang.Throwable] )
- val finalizedPercentilesMerged: Array[Double]
- val finalizedPercentilesSeries: Array[Double]
-
final
def
getClass(): Class[_]
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
-
def
hashCode(): Int
- Definition Classes
- AnyRef → Any
- Annotations
- @native()
- val ignoreColumns: Seq[String]
-
final
def
isInstanceOf[T0]: Boolean
- Definition Classes
- Any
- def lInfKllSketch(sketch1: AnyRef, sketch2: AnyRef, bins: Int = 128): AnyRef
-
final
def
ne(arg0: AnyRef): Boolean
- Definition Classes
- AnyRef
-
final
def
notify(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
-
final
def
notifyAll(): Unit
- Definition Classes
- AnyRef
- Annotations
- @native()
- val nullRateSuffix: String
- val nullSuffix: String
-
def
numericTransforms(column: String): Seq[MetricTransform]
Stats applied to numeric columns
-
def
statsInputSchema(valueSchema: StructType): StructType
Input schema is the data required to update partial aggregations / stats.
Given a valueSchema and a metric transform list, defines the schema expected by the Stats aggregator (online and offline).
-
def
statsIrSchema(valueSchema: StructType): StructType
A valueSchema (for join) and a metric list uniquely define the IRSchema to be used for the statistics. In order to support custom storage for statistic percentiles, this method would need to be modified. IR schemas are used to decode streaming partial aggregations as well as KvStore partial stats.
-
final
def
synchronized[T0](arg0: ⇒ T0): T0
- Definition Classes
- AnyRef
-
def
toString(): String
- Definition Classes
- AnyRef → Any
- val totalColumn: String
-
final
def
wait(): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long, arg1: Int): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... )
-
final
def
wait(arg0: Long): Unit
- Definition Classes
- AnyRef
- Annotations
- @throws( ... ) @native()
-
object
InputTransform extends Enumeration
InputTransform acts as a signal of how to process the metric.
IsNull: Check if the input is null.
Raw: Operate on the input column.
One: lit(true) in Spark. Used for row counts, which are leveraged to obtain null rate values.
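The interplay of the three signals can be sketched as follows: summing the IsNull indicator gives a null count, summing One gives a row count, and their ratio is the null rate. The `Signal` trait, `applySignal`, and `nullRate` are hypothetical names for illustration, and the constant `1.0` stands in for `lit(true)` to keep the arithmetic simple.

```scala
object NullRateSketch {
  // Hypothetical mirror of the three documented signals; not the real enumeration.
  sealed trait Signal
  case object IsNull extends Signal
  case object Raw extends Signal
  case object One extends Signal

  // Map an optional input value to what an aggregation would see under each signal.
  def applySignal(signal: Signal, value: Option[Double]): Option[Double] = signal match {
    case IsNull => Some(if (value.isEmpty) 1.0 else 0.0) // null indicator
    case Raw    => value                                 // the column itself
    case One    => Some(1.0)                             // constant, counts every row
  }

  // Null rate = sum of null indicators / sum of ones (the row count).
  def nullRate(column: Seq[Option[Double]]): Double = {
    val nulls = column.flatMap(applySignal(IsNull, _)).sum
    val total = column.flatMap(applySignal(One, _)).sum
    if (total == 0.0) 0.0 else nulls / total
  }
}
```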