DataFrame

com.audienceproject.crossbow.DataFrame
See the DataFrame companion object
class DataFrame

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any

Members list

Type members

Classlikes

class GroupedView

Attributes

Supertypes
class Object
trait Matchable
class Any
class TypedView[T] extends Iterable[T]

Attributes

Supertypes
trait Iterable[T]
trait IterableFactoryDefaults[T, Iterable]
trait IterableOps[T, Iterable, Iterable[T]]
trait IterableOnceOps[T, Iterable, Iterable[T]]
trait IterableOnce[T]
class Object
trait Matchable
class Any

Value members

Concrete methods

def addColumn(expr: DataFrame ?=> Expr): DataFrame

Add a column to the DataFrame, evaluating to 'expr' at each individual row index. Use the 'as' method on Expr to give the column a name.

Value parameters

expr

the Expr to evaluate as the new column

Attributes

Returns

new DataFrame
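
The parameter type 'DataFrame ?=> Expr' is a Scala 3 context function type: the expression is evaluated with the DataFrame available as a context parameter. The sketch below illustrates that mechanism only. MiniFrame, col and the explicit column-name argument are hypothetical stand-ins built on the standard library, not crossbow's own classes or accessors.

```scala
// Hypothetical stand-in for a frame of Int columns; not crossbow's DataFrame.
final case class MiniFrame(columns: Map[String, Vector[Int]]):
  def rowCount: Int = columns.values.headOption.map(_.size).getOrElse(0)

  // 'expr' is a context function: it is evaluated with this frame as its context.
  def addColumn(name: String)(expr: MiniFrame ?=> Int => Int): MiniFrame =
    val rowFn: Int => Int = expr(using this)
    copy(columns = columns + (name -> Vector.tabulate(rowCount)(rowFn)))

// Hypothetical column accessor; it resolves against whichever MiniFrame is in context.
def col(name: String)(using df: MiniFrame): Int => Int =
  rowIndex => df.columns(name)(rowIndex)

val base = MiniFrame(Map("x" -> Vector(1, 2, 3)))

// The lambda never mentions 'base'; the frame is supplied by addColumn.
val withDouble = base.addColumn("double")(i => col("x")(i) * 2)
```

Because the frame is passed implicitly, the same expression could be evaluated against any frame that has an "x" column.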

def apply(index: Int): Any

Retrieve a single row by index.

Value parameters

index

row index

Attributes

Returns

row as a Tuple

def apply(range: Range): DataFrame

Retrieve a subset of rows from this DataFrame based on range of indices.

Value parameters

range

range of row indices to retrieve

Attributes

Returns

new DataFrame

def apply(columnNames: String*): DataFrame

Select a subset of columns from this DataFrame.

Value parameters

columnNames

names of columns to select

Attributes

Returns

new DataFrame

def as[T]: TypedView[T]

Typecast this DataFrame to a TypedView of the type parameter 'T'. All columns in this DataFrame will have to be accounted for in the given type. A DataFrame with multiple columns will have its rows represented as tuples of the individual types of these columns.

Type parameters

T

the type of a row in this DataFrame

Attributes

Returns

TypedView on the contents of this DataFrame

def explode(expr: DataFrame ?=> Expr): DataFrame

Explode this DataFrame on the given expression, flattening its contents and repeating all other cells on the row for every element in the sequence. The given Expr must evaluate to a list type. Use the 'as' method on Expr to name the flattened column.

Value parameters

expr

the Expr to explode on

Attributes

Returns

new DataFrame
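
The flattening described for explode can be pictured with plain Scala collections. This is a sketch of the semantics only: a Seq of tuples stands in for a DataFrame, and the column names in the comments are made up for illustration.

```scala
// Rows as (id, tags); a plain Seq stands in for a DataFrame here.
val rows: Seq[(String, List[Int])] = Seq(
  ("a", List(1, 2)),
  ("b", List(3)),
  ("c", Nil)
)

// One output row per list element; the "id" cell is repeated for each element.
val exploded: Seq[(String, Int)] =
  for
    (id, tags) <- rows
    tag <- tags
  yield (id, tag)
```

In this sketch the row with an empty list produces no output rows. Whether crossbow treats empty lists the same way is not stated in this reference.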

def filter(expr: DataFrame ?=> Expr): DataFrame

Retrieve a subset of rows from this DataFrame based on the boolean evaluation of the given expression.

Value parameters

expr

the Expr to evaluate; a row appears in the output if the Expr evaluates to 'true' for it

Attributes

Returns

new DataFrame

def groupBy(keyExprs: DataFrame ?=> Expr*): GroupedView

Partition this DataFrame into groups, defined by the given set of expressions. The evaluation of each of the 'keyExprs' will appear as a column in the output.

Value parameters

keyExprs

the list of com.audienceproject.crossbow.expr.Expr that will evaluate to the keys of the groups

Attributes

Returns

GroupedView on this DataFrame

def isEmpty: Boolean
def iterator: Iterator[Any]
def join(other: DataFrame, joinExpr: DataFrame ?=> Expr, joinType: JoinType): DataFrame

Join this DataFrame with another DataFrame on the key evaluated by 'joinExpr'. The resulting DataFrame contains all the columns of this DataFrame and of the other; the column names of the other are prefixed with "#".

Value parameters

joinExpr

Expr to evaluate as join key

joinType

JoinType as one of Inner, FullOuter, LeftOuter or RightOuter

other

DataFrame to join with this one

Attributes

Returns

new DataFrame

Note

'joinExpr' must evaluate to a type with a natural ordering
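
The "#" prefix on the other DataFrame's columns can be illustrated with a small inner join over maps. This is a sketch under stated assumptions: Row and innerJoin are hypothetical names, rows are modelled as Map[String, Any], and the hash-based grouping is chosen for brevity and is not a claim about how crossbow performs the join.

```scala
type Row = Map[String, Any]

// Inner join of two row sets on an Int key.
// Column names from the right-hand side are prefixed with "#".
def innerJoin(left: Seq[Row], right: Seq[Row], key: Row => Int): Seq[Row] =
  val rightByKey = right.groupBy(key)
  for
    l <- left
    r <- rightByKey.getOrElse(key(l), Seq.empty)
  yield l ++ r.map { case (name, value) => s"#$name" -> value }

val users: Seq[Row] = Seq(Map("id" -> 1, "name" -> "Ann"), Map("id" -> 2, "name" -> "Bo"))
val orders: Seq[Row] = Seq(Map("id" -> 1, "total" -> 30))

// Only id 1 appears on both sides, so one row results, carrying
// the columns "id", "name", "#id" and "#total".
val joined = innerJoin(users, orders, row => row("id").asInstanceOf[Int])
```

The prefix keeps both "id" columns in the result without a name clash.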

def printSchema(): Unit
def removeColumns(columnNames: String*): DataFrame

Remove one or more columns from the DataFrame.

Value parameters

columnNames

the names of the columns to remove

Attributes

Returns

new DataFrame

def renameColumns(newNames: String*): DataFrame

Rename the columns of this DataFrame.

Value parameters

newNames

list of new names for each column of this DataFrame

Attributes

Returns

new DataFrame

def renameColumns(toNewName: String => String): DataFrame

Rename the columns of this DataFrame by applying the given function.

Value parameters

toNewName

function to map over the names of the columns

Attributes

Returns

new DataFrame
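
As an illustration of a function suitable for 'toNewName', the sketch below normalises names to snake case. The column names are invented, and a plain Seq of names stands in for the schema.

```scala
// Invented column names standing in for a DataFrame's schema.
val columnNames = Seq("user id", "Total Amount")

// A String => String function of the shape renameColumns expects.
val toNewName: String => String = _.trim.toLowerCase.replace(' ', '_')

// Mapping the function over the names gives the renamed columns.
val renamed = columnNames.map(toNewName)
```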

def select(exprs: DataFrame ?=> Expr*): DataFrame

Map over this DataFrame, selecting a set of expressions which will become the columns of a new DataFrame. Use the 'as' method on Expr to give names to the new columns. An expression which is only a column accessor will inherit the accessed column's name (unless it is renamed).

Value parameters

exprs

the list of Expr to evaluate as a new DataFrame

Attributes

Returns

new DataFrame

def sortBy(expr: DataFrame ?=> Expr, stable: Boolean)(using order: Order): DataFrame

Sort this DataFrame by the evaluation of 'expr'. If a natural ordering exists on this value, it will be used. A user-defined ordering, either for a type without a natural ordering or to override the natural ordering, can be supplied through the 'order' argument in the using clause.

Value parameters

expr

the Expr to evaluate as a sort key

order

explicit Order to use on the sort key

stable

whether the sort should be stable; mergesort is used if true, quicksort otherwise

Attributes

Returns

new DataFrame
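
The interplay between a natural ordering and an explicit one passed in a using clause can be sketched with the standard library. Here scala.math.Ordering stands in for crossbow's Order type, and sortRows is a hypothetical helper rather than part of the library.

```scala
// Hypothetical helper; Ordering stands in for crossbow's Order type.
def sortRows[R, K](rows: Seq[R])(key: R => K)(using order: Ordering[K]): Seq[R] =
  rows.sortBy(key) // Seq.sortBy is a stable sort and picks up 'order'

val people = Seq(("Ann", 31), ("Bo", 25), ("Cy", 31))

// The natural ordering on Int is found automatically.
val ascending = sortRows(people)(_._2)

// An explicit ordering in the using clause overrides the natural one.
val descending = sortRows(people)(_._2)(using Ordering[Int].reverse)
```

Because the sort in this sketch is stable, "Ann" stays ahead of "Cy" in both results, since they share the same key.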

def union(other: DataFrame): DataFrame

Union this DataFrame with another DataFrame. Columns are matched by name, and matched columns must have the same type. A column that is present in only one of the two DataFrames will contain null values in the output for the rows that come from the DataFrame lacking that column.

Value parameters

other

DataFrame to union with this one

Attributes

Returns

new DataFrame
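
The name-based matching and null filling described above can be sketched over maps. Record and unionByName are hypothetical names, rows are modelled as Map[String, Any], and the sketch skips the check that matched columns share a type.

```scala
type Record = Map[String, Any]

// Union by column name: every output row carries all columns,
// and cells for columns a row did not have become null.
def unionByName(a: Seq[Record], b: Seq[Record]): Seq[Record] =
  val allColumns = (a ++ b).flatMap(_.keys).distinct
  (a ++ b).map(row => allColumns.map(c => c -> row.getOrElse(c, null)).toMap)

val first: Seq[Record] = Seq(Map("id" -> 1, "name" -> "Ann"))
val second: Seq[Record] = Seq(Map("id" -> 2, "age" -> 40))

// Rows from 'first' get a null "age"; rows from 'second' get a null "name".
val combined = unionByName(first, second)
```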

Concrete fields

val numColumns: Int
val rowCount: Int
val schema: Schema