expressions

package expressions

Ordering

Alphabetic

Visibility

Public
All

Type Members

abstract class Aggregator[-IN, BUF, OUT] extends Serializable
A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.
A base class for user-defined aggregations, which can be used in Dataset operations to take all of the elements of a group and reduce them to a single value.
For example, the following aggregator extracts an int from a specific class and adds them up:
```
case class Data(i: Int)

val customSummer =  new Aggregator[Data, Int, Int] {
  def zero: Int = 0
  def reduce(b: Int, a: Data): Int = b + a.i
  def merge(b1: Int, b2: Int): Int = b1 + b2
  def finish(r: Int): Int = r
  def bufferEncoder: Encoder[Int] = Encoders.scalaInt
  def outputEncoder: Encoder[Int] = Encoders.scalaInt
}.toColumn()

val ds: Dataset[Data] = ...
val aggregated = ds.select(customSummer)
```
Based loosely on Aggregator from Algebird: https://github.com/twitter/algebird
IN
The input type for the aggregation.
BUF
The type of the intermediate value of the reduction.
OUT
The type of the final output result.

Since
1.6.0
abstract class MutableAggregationBuffer extends Row
A Row representing a mutable aggregation buffer.
A Row representing a mutable aggregation buffer.
This is not meant to be extended outside of Spark.

Annotations
@Stable()
Since
1.5.0

sealed abstract class UserDefinedFunction extends AnyRef

A user-defined function.

A user-defined function. To create one, use the udf functions in functions.

As an example:

// Define a UDF that returns true or false based on some numeric score.
val predict = udf((score: Double) => score > 0.5)

// Projects a column that adds a prediction column based on the score column.
df.select( predict(df("score")) )

Annotations: @Stable()
Since: 1.3.0

class Window extends AnyRef

Utility functions for defining window in DataFrames.

// PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Window.partitionBy("country").orderBy("date")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

// PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)

Annotations: @Stable()
Since: 1.4.0

class WindowSpec extends AnyRef
A window specification that defines the partitioning, ordering, and frame boundaries.
A window specification that defines the partitioning, ordering, and frame boundaries.
Use the static methods in Window to create a WindowSpec.

Annotations
@Stable()
Since
1.4.0
abstract class UserDefinedAggregateFunction extends Serializable
The base class for implementing user-defined aggregate functions (UDAF).
The base class for implementing user-defined aggregate functions (UDAF).

Annotations
@Stable() @deprecated
Deprecated
(Since version 3.0.0)
Since
1.5.0

Value Members

object Window
Utility functions for defining window in DataFrames.
Utility functions for defining window in DataFrames.
```
// PARTITION BY country ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
Window.partitionBy("country").orderBy("date")
  .rowsBetween(Window.unboundedPreceding, Window.currentRow)

// PARTITION BY country ORDER BY date ROWS BETWEEN 3 PRECEDING AND 3 FOLLOWING
Window.partitionBy("country").orderBy("date").rowsBetween(-3, 3)
```
Annotations
@Stable()
Since
1.4.0
Note
When ordering is not defined, an unbounded window frame (rowFrame, unboundedPreceding, unboundedFollowing) is used by default. When ordering is defined, a growing window frame (rangeFrame, unboundedPreceding, currentRow) is used by default.

Packages

expressions

package expressions

Type Members

Value Members

Ungrouped

Packages

expressions 

package expressions

Type Members

Value Members

Ungrouped

expressions