sql

Type Members

class Column extends AnyRef

:: Experimental :: A column in a DataFrame.

Annotations
@Experimental()
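
As a rough illustration of how Column values behave as expressions, the sketch below builds two columns from a small in-memory DataFrame and uses them in filter and select. It assumes Spark 1.3 on the classpath and a local master; the Person case class, the object name, and the sample rows are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object ColumnSketch {
  // Hypothetical record type used only for this illustration.
  case class Person(name: String, age: Int)

  def agesPlusTen(): Seq[Int] = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[1]").setAppName("ColumnSketch"))
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._  // enables rdd.toDF()

      val people = sc.parallelize(Seq(Person("Ann", 25), Person("Bob", 41))).toDF()

      // Columns are expressions; nothing is computed until an action such as collect().
      val older = people("age") + 10
      val isOverForty = people("age") > 40

      // Keep rows where age > 40, then project age + 10.
      people.filter(isOverForty).select(older).collect().map(_.getInt(0)).toSeq
    } finally sc.stop()
  }
}
```

Only Bob passes the filter here, so the result holds a single value.
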
class ColumnName extends Column

:: Experimental :: A convenience class used for constructing schemas.

Annotations
@Experimental()
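
A minimal sketch of schema construction with ColumnName, assuming Spark 1.3 on the classpath. The typed helpers (string, int, and so on) each return a StructField; the object name and field names are hypothetical. With `import sqlContext.implicits._` in scope, `$"name"` is a shorthand for creating the same ColumnName.

```scala
import org.apache.spark.sql.ColumnName
import org.apache.spark.sql.types.StructType

object ColumnNameSketch {
  // Each typed helper turns a ColumnName into a StructField of that type.
  val schema: StructType = StructType(Seq(
    new ColumnName("name").string,
    new ColumnName("age").int
  ))
}
```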

class DataFrame extends RDDApi[Row] with Serializable

:: Experimental :: A distributed collection of data organized into named columns.

A DataFrame is equivalent to a relational table in Spark SQL. There are multiple ways to create a DataFrame:

// Create a DataFrame from Parquet files
val people = sqlContext.parquetFile("...")

// Create a DataFrame from data sources
val df = sqlContext.load("...", "json")

Once created, it can be manipulated using the various domain-specific-language (DSL) functions defined in: DataFrame (this class), Column, and functions.

To select a column from the DataFrame, use the apply method in Scala and the col method in Java.

val ageCol = people("age")  // in Scala
Column ageCol = people.col("age")  // in Java

Note that the Column type can also be manipulated through its various functions.

// The following creates a new column that increases everybody's age by 10.
people("age") + 10  // in Scala
people.col("age").plus(10);  // in Java

A more concrete example in Scala:

// Create DataFrames using SQLContext
import org.apache.spark.sql.functions._  // provides avg and max

val people = sqlContext.parquetFile("...")
val department = sqlContext.parquetFile("...")

people.filter(people("age") > 30)
  .join(department, people("deptId") === department("id"))
  .groupBy(department("name"), people("gender"))
  .agg(avg(people("salary")), max(people("age")))

and in Java:

// Create DataFrames using SQLContext
// Requires: import static org.apache.spark.sql.functions.*;  (for avg and max)
DataFrame people = sqlContext.parquetFile("...");
DataFrame department = sqlContext.parquetFile("...");

people.filter(people.col("age").gt(30))
  .join(department, people.col("deptId").equalTo(department.col("id")))
  .groupBy(department.col("name"), people.col("gender"))
  .agg(avg(people.col("salary")), max(people.col("age")));

Annotations
@Experimental()

class ExperimentalMethods extends AnyRef

:: Experimental :: Holder for experimental methods for the bravest. We make NO guarantee about the binary compatibility or source compatibility of the methods here.
```
sqlContext.experimental.extraStrategies += ...
```
Annotations
@Experimental()
class GroupedData extends AnyRef

:: Experimental :: A set of methods for aggregations on a DataFrame, created by DataFrame.groupBy.

Annotations
@Experimental()
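
The following sketch shows one way a GroupedData might be obtained and used, assuming Spark 1.3 on the classpath and a local master. The Employee case class, the object name, and the sample rows are hypothetical. It uses the name-based avg helper, whose result includes the grouping column.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object GroupedDataSketch {
  // Hypothetical record type used only for this illustration.
  case class Employee(dept: String, salary: Double)

  def avgSalaryByDept(): Map[String, Double] = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[1]").setAppName("GroupedDataSketch"))
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._  // enables rdd.toDF()

      val employees = sc.parallelize(Seq(
        Employee("eng", 100.0), Employee("eng", 200.0), Employee("ops", 50.0))).toDF()

      // groupBy returns a GroupedData; avg turns it back into a DataFrame
      // with one row per department: (dept, average salary).
      val grouped = employees.groupBy("dept")
      grouped.avg("salary")
        .collect()
        .map(row => row.getString(0) -> row.getDouble(1))
        .toMap
    } finally sc.stop()
  }
}
```
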
class SQLContext extends Logging with Serializable

The entry point for working with structured data (rows and columns) in Spark. Allows the creation of DataFrame objects as well as the execution of SQL queries.
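
As a sketch of both roles described above (creating DataFrames and executing SQL), the example below wraps a local SparkContext in a SQLContext, registers a DataFrame as a temporary table, and queries it. It assumes Spark 1.3 on the classpath; the Person case class, the table name people, and the sample rows are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SQLContextSketch {
  // Hypothetical record type used only for this illustration.
  case class Person(name: String, age: Int)

  def adultNames(): Seq[String] = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[1]").setAppName("SQLContextSketch"))
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._  // enables rdd.toDF()

      val people = sc.parallelize(Seq(
        Person("Ann", 25), Person("Bob", 12), Person("Cid", 40))).toDF()

      // Make the DataFrame visible to SQL under a temporary table name.
      people.registerTempTable("people")

      sqlContext.sql("SELECT name FROM people WHERE age >= 18 ORDER BY name")
        .collect()
        .map(_.getString(0))
        .toSeq
    } finally sc.stop()
  }
}
```
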
class SaveMode extends Enum[SaveMode]
type Strategy = GenericStrategy[SparkPlan]

Converts a logical plan into zero or more SparkPlans. This API is exposed for experimenting with the query planner and is not designed to be stable across Spark releases. Developers writing libraries should instead consider using the stable APIs provided in org.apache.spark.sql.sources.

Annotations
@DeveloperApi()
class UDFRegistration extends Logging

Functions for registering user-defined functions. Use SQLContext.udf to access this.
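
A minimal sketch of registering a function through SQLContext.udf and calling it from SQL, assuming Spark 1.3 on the classpath and a local master. The function name strLen, the table name words, the Word case class, and the sample rows are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object UdfRegistrationSketch {
  // Hypothetical record type used only for this illustration.
  case class Word(text: String)

  def lengths(): Seq[Int] = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[1]").setAppName("UdfRegistrationSketch"))
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._  // enables rdd.toDF()

      sc.parallelize(Seq(Word("Ann"), Word("Robert"))).toDF().registerTempTable("words")

      // Register a Scala function under a name that SQL queries can call.
      sqlContext.udf.register("strLen", (s: String) => s.length)

      sqlContext.sql("SELECT strLen(text) FROM words")
        .collect()
        .map(_.getInt(0))
        .toSeq
        .sorted
    } finally sc.stop()
  }
}
```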

case class UserDefinedFunction(f: AnyRef, dataType: DataType) extends Product with Serializable

A user-defined function. To create one, use the udf functions in functions. As an example:

// Define a UDF that returns true or false based on some numeric score.
val predict = udf((score: Double) => score > 0.5)

// Project a prediction column computed from the score column.
df.select(predict(df("score")))

type SchemaRDD = DataFrame

Type alias for DataFrame. Kept here for backward source compatibility for Scala.

Annotations
@deprecated
Deprecated
(Since version 1.3.0) use DataFrame

Value Members

package api

Contains API classes that are specific to a single language (i.e. Java).
package execution

:: DeveloperApi :: An execution engine for relational query plans that runs on top of Spark and returns RDDs.
Note that the operators in this package are created automatically by a query planner using a SQLContext and are not intended to be used directly by end users of Spark SQL. They are documented here in order to make it easier for others to understand the performance characteristics of query plans that are generated by Spark SQL.
object functions

:: Experimental :: Functions available for DataFrame.

Annotations
@Experimental()
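
As an illustration of the functions object, the sketch below imports its members and uses col, max, and avg to aggregate over a whole DataFrame. It assumes Spark 1.3 on the classpath and a local master; the Person case class, the object name, and the sample rows are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions._

object FunctionsSketch {
  // Hypothetical record type used only for this illustration.
  case class Person(name: String, age: Int)

  def maxAndAvgAge(): (Int, Double) = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[1]").setAppName("FunctionsSketch"))
    try {
      val sqlContext = new SQLContext(sc)
      import sqlContext.implicits._  // enables rdd.toDF()

      val people = sc.parallelize(Seq(Person("Ann", 25), Person("Bob", 41))).toDF()

      // col builds a Column by name; max and avg wrap it in aggregate expressions.
      val row = people.agg(max(col("age")), avg(col("age"))).collect().head
      (row.getInt(0), row.getDouble(1))
    } finally sc.stop()
  }
}
```
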
package sources

A set of APIs for adding data sources to Spark SQL.
package test
