:: DeveloperApi :: Groups input data by groupingExpressions and computes the aggregateExpressions for each group.
if true, aggregation is done partially on local data, without first shuffling to ensure that all rows with equal groupingExpressions values are present.
expressions that are evaluated to determine grouping.
expressions that are computed for each group.
the input data source.
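The difference between partial and final aggregation can be illustrated with plain Scala collections. This is only a sketch, not Spark's implementation: the object name, the `Row` alias, and the `partialSum` and `finalSum` helpers are hypothetical, and nested sequences stand in for the partitions of the input.

```scala
// Hypothetical sketch: partial vs. final aggregation over in-memory data.
// None of these names come from Spark; Seq[Seq[Row]] stands in for partitions.
object PartialAggregationSketch {
  type Row = (String, Int) // (grouping key, value to sum)

  // Partial step: aggregate one partition locally, with no data movement.
  // A key may still appear in the partial results of several partitions.
  def partialSum(partition: Seq[Row]): Map[String, Int] =
    partition.groupBy(_._1).map { case (key, rows) => key -> rows.map(_._2).sum }

  // Final step: merge the partial results so that each key appears exactly once.
  // In a distributed setting this is where a shuffle would be needed.
  def finalSum(partials: Seq[Map[String, Int]]): Map[String, Int] =
    partials.flatten.groupBy(_._1).map { case (key, kvs) => key -> kvs.map(_._2).sum }
}
```

In this sketch, the partial step alone does not give a correct global answer when the same key occurs in more than one partition; only the merged result does.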
:: DeveloperApi :: Uses PythonRDD to evaluate a PythonUDF, one partition of tuples at a time. The input data is cached and zipped with the result of the udf evaluation.
:: DeveloperApi :: Computes the set of distinct input rows using a HashSet.
when true the distinct operation is performed partially, per partition, without shuffling the data.
the input query plan.
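A per-partition distinct backed by a hash set might look roughly like the following. This is a minimal sketch under stated assumptions, not the operator's actual code: `DistinctSketch` and `distinctPartition` are hypothetical names, and an `Iterator` stands in for one partition of rows.

```scala
import scala.collection.mutable

// Hypothetical sketch of a partial (per-partition) distinct using a HashSet.
object DistinctSketch {
  // Collects the rows of a single partition into a hash set, dropping duplicates.
  // The iteration order of the result is unspecified.
  def distinctPartition[A](partition: Iterator[A]): Iterator[A] = {
    val seen = mutable.HashSet.empty[A]
    partition.foreach(row => seen += row)
    seen.iterator
  }
}
```

Because each partition is deduplicated independently, the same row can still survive in more than one partition; a global distinct would need the data to be shuffled before a second pass.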
:: DeveloperApi :: Evaluates a PythonUDF, appending the result to the end of the input tuple.
:: DeveloperApi :: Returns a table with the elements from left that are not in right, using the built-in Spark subtract function.
An explain command for users to see how a command will be executed.
Note that this command takes in a logical plan and runs the optimizer on it, but does NOT actually execute it.
:: DeveloperApi :: Performs a sort, spilling to disk as needed.
when true performs a global sort of all partitions by shuffling the data first if necessary.
:: DeveloperApi :: Applies a Generator to a stream of input rows, combining the output of each into a new stream of rows. This operation is similar to a flatMap in functional programming, with one important additional feature: the input rows can be joined with their output.
when true, each output row is implicitly joined with the input tuple that produced it.
when true, each input row will be output at least once, even if the output of the given generator is empty. outer has no effect when join is false.
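The interaction of the join and outer flags can be sketched with ordinary Scala collections. This is an illustration only, assuming string rows and a generator that returns zero or more strings per row; `GenerateSketch` and `generate` are hypothetical names, and `None` stands in for the null columns an outer row would carry.

```scala
// Hypothetical sketch of flatMap-with-join semantics over an in-memory Seq.
object GenerateSketch {
  def generate(input: Seq[String], join: Boolean, outer: Boolean)(
      generator: String => Seq[String]): Seq[List[Option[String]]] =
    if (join) {
      input.flatMap { row =>
        val generated = generator(row)
        if (outer && generated.isEmpty) {
          // Keep the input row even though the generator produced nothing.
          Seq(List[Option[String]](Some(row), None))
        } else {
          // Pair every generated value with the input row that produced it.
          generated.map(out => List[Option[String]](Some(row), Some(out)))
        }
      }
    } else {
      // Plain flatMap: only the generated values are output, so outer is ignored.
      input.flatMap(row => generator(row).map(out => List[Option[String]](Some(out))))
    }
}
```

With a generator that splits a string into words, an empty input string contributes no output rows unless both join and outer are true.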
:: DeveloperApi :: Alternate version of aggregation that leverages projection and thus code generation. Aggregations are converted into a set of projections from an aggregation buffer tuple back onto itself. Currently, only simple aggregations such as SUM, COUNT, or AVERAGE are supported.
if true, aggregation is done partially on local data, without first shuffling to ensure that all rows with equal groupingExpressions values are present.
expressions that are evaluated to determine grouping.
expressions that are computed for each group.
the input data source.
:: DeveloperApi :: Returns the rows in left that also appear in right, using the built-in Spark intersection function.
:: DeveloperApi :: Take the first limit elements. Note that the implementation is different depending on whether this is a terminal operator or not. If it is terminal and is invoked using executeCollect, this operator uses something similar to Spark's take method on the Spark driver. If it is not terminal or is invoked using execute, we first take the limit on each partition, and then repartition all the data to a single partition to compute the global limit.
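The non-terminal path described above, a per-partition limit followed by a global limit on a single partition, can be sketched as follows. The names `LimitSketch` and `twoPhaseLimit` are hypothetical, and nested sequences stand in for partitions; this is not the operator's actual code.

```scala
// Hypothetical sketch of a two-phase limit over in-memory "partitions".
object LimitSketch {
  def twoPhaseLimit[A](partitions: Seq[Seq[A]], limit: Int): Seq[A] = {
    // Phase 1: each partition keeps at most `limit` rows locally.
    val locallyLimited = partitions.map(_.take(limit))
    // Phase 2: gather the survivors into one "partition" and apply the limit again.
    val singlePartition = locallyLimited.flatten
    singlePartition.take(limit)
  }
}
```

The local phase bounds how much data has to be moved: at most limit rows per partition reach the single partition that computes the global limit.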
:: DeveloperApi :: A plan node that does nothing but lie about the output of its child. Used to splice a (hopefully structurally equivalent) tree from a different optimization sequence into an already resolved tree.
:: DeveloperApi :: Performs a sort on-heap.
when true performs a global sort of all partitions by shuffling the data first if necessary.
:: DeveloperApi :: Take the first limit elements as defined by the sortOrder. This is logically equivalent to having a Limit operator after a Sort operator. This could have been named TopK, but Spark's top operator does the opposite in ordering so we name it TakeOrdered to avoid confusion.
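The equivalence with a sort followed by a limit, and the contrast with a "top" operation that orders the other way, can be shown on a local collection. This is a sketch over a plain Seq with hypothetical helper names, not the operator itself.

```scala
// Hypothetical sketch: take-ordered as "sort, then limit" on a local Seq.
object TakeOrderedSketch {
  // The first `limit` elements in ascending order of `ord`.
  def takeOrdered[A](rows: Seq[A], limit: Int)(implicit ord: Ordering[A]): Seq[A] =
    rows.sorted(ord).take(limit)

  // The opposite convention: the first `limit` elements in descending order.
  def top[A](rows: Seq[A], limit: Int)(implicit ord: Ordering[A]): Seq[A] =
    rows.sorted(ord.reverse).take(limit)
}
```

For the input Seq(5, 1, 4, 2, 3) and a limit of 2, `takeOrdered` yields the two smallest elements, while `top` yields the two largest.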
(Since version 1.2.0) Use LogicalRDD
:: DeveloperApi :: Contains methods for debugging query execution.
Usage:
sql("SELECT key FROM src").debug
:: DeveloperApi :: Physical execution operators for join operations.
:: DeveloperApi :: An execution engine for relational query plans that runs on top of Spark and returns RDDs.
Note that the operators in this package are created automatically by a query planner using a SQLContext and are not intended to be used directly by end users of Spark SQL. They are documented here in order to make it easier for others to understand the performance characteristics of query plans that are generated by Spark SQL.