package adaptive
Type Members
-
case class
AdaptiveSparkPlanExec(initialPlan: SparkPlan, session: SparkSession, preprocessingRules: Seq[Rule[SparkPlan]], subqueryCache: TrieMap[SparkPlan, BaseSubqueryExec], stageCache: TrieMap[SparkPlan, QueryStageExec], queryExecution: QueryExecution) extends SparkPlan with LeafExecNode with Product with Serializable
A root node to execute the query plan adaptively. It splits the query plan into independent stages and executes them in order according to their dependencies. Each query stage materializes its output when it finishes. When a stage completes, the data statistics of its materialized output are used to optimize the remainder of the query.
To create query stages, we traverse the query tree bottom up. When we hit an exchange node whose child query stages are all materialized, we create a new query stage for that exchange node. The new stage is materialized asynchronously as soon as it is created.
When a query stage finishes materialization, the rest of the query is re-optimized and re-planned based on the latest statistics provided by all materialized stages. Then we traverse the query plan again and create more stages if possible. After all stages have been materialized, we execute the rest of the plan.
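The stage-creation loop described above can be illustrated with a small, self-contained sketch. It is not Spark's implementation: `Node`, `Scan`, `Exchange`, `Join`, `Stage`, and `run` are hypothetical toy names, stages are treated as materialized the moment they are created (the real node materializes them asynchronously), and the re-optimization step is only marked by a comment.

```scala
object AdaptiveLoopSketch {
  // Hypothetical toy plan nodes; these are not Spark classes.
  sealed trait Node { def children: Seq[Node] }
  case class Scan(name: String) extends Node { val children: Seq[Node] = Nil }
  case class Exchange(child: Node) extends Node { val children: Seq[Node] = Seq(child) }
  case class Join(left: Node, right: Node) extends Node { val children: Seq[Node] = Seq(left, right) }
  // A stage is a leaf that stands in for an exchange whose output is (assumed) materialized.
  case class Stage(id: Int, exchange: Exchange) extends Node { val children: Seq[Node] = Nil }

  // True if an exchange that has not been wrapped into a stage remains in the tree.
  def containsExchange(n: Node): Boolean = n match {
    case _: Exchange => true
    case other       => other.children.exists(containsExchange)
  }

  // One bottom-up pass: wrap every exchange with no pending exchange below it into a new stage.
  def createStages(n: Node, nextId: () => Int): Node = n match {
    case e @ Exchange(child) if !containsExchange(child) => Stage(nextId(), e)
    case Exchange(child) => Exchange(createStages(child, nextId))
    case Join(l, r)      => Join(createStages(l, nextId), createStages(r, nextId))
    case leaf            => leaf
  }

  // Repeat passes until no exchange is left; returns the final plan and the number of passes.
  def run(plan: Node): (Node, Int) = {
    var counter = 0
    val nextId = () => { counter += 1; counter - 1 }
    var current = plan
    var rounds = 0
    while (containsExchange(current)) {
      // A real adaptive executor would re-optimize the remaining plan here using stage statistics.
      current = createStages(current, nextId)
      rounds += 1
    }
    (current, rounds)
  }
}
```

For a plan shaped like `Exchange(Join(Exchange(Scan("a")), Exchange(Scan("b"))))`, the first pass turns the two inner exchanges into stages and the second pass wraps the outer exchange, since only then are all of its child stages available.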
-
trait
AdaptiveSparkPlanHelper extends AnyRef
This trait provides utility methods related to tree traversal of an AdaptiveSparkPlanExec plan. Unlike their counterparts in org.apache.spark.sql.catalyst.trees.TreeNode or org.apache.spark.sql.catalyst.plans.QueryPlan, these methods traverse down into the leaf nodes of adaptive plans, i.e., AdaptiveSparkPlanExec and QueryStageExec.
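The difference between an ordinary traversal and one that descends through adaptive leaf nodes can be shown with a toy tree. This is only a sketch: `Plan`, `Op`, `StageLeaf`, and both `collectNames` functions are hypothetical names and do not correspond to the helper's actual method signatures.

```scala
object TraversalSketch {
  // Hypothetical toy nodes; not Spark classes.
  sealed trait Plan { def children: Seq[Plan] }
  case class Op(name: String, children: Seq[Plan]) extends Plan
  // Reports no children, like a leaf node, but wraps an inner plan.
  case class StageLeaf(inner: Plan) extends Plan { val children: Seq[Plan] = Nil }

  // Ordinary traversal: stops at leaves, so the wrapped plan is never visited.
  def collectNames(p: Plan): Seq[String] = p match {
    case Op(name, kids) => name +: kids.flatMap(collectNames)
    case _: StageLeaf   => Nil
  }

  // Adaptive-style traversal: also descends into the plan held by a leaf wrapper.
  def collectNamesAdaptive(p: Plan): Seq[String] = p match {
    case Op(name, kids)   => name +: kids.flatMap(collectNamesAdaptive)
    case StageLeaf(inner) => collectNamesAdaptive(inner)
  }
}
```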
-
case class
BroadcastQueryStageExec(id: Int, plan: BroadcastExchangeExec) extends QueryStageExec with Product with Serializable
A broadcast query stage whose child is a BroadcastExchangeExec.
-
trait
Cost extends Ordered[Cost]
Represents the cost of a plan.
-
trait
CostEvaluator extends AnyRef
Evaluates the cost of a physical plan.
-
case class
DemoteBroadcastHashJoin(conf: SQLConf) extends Rule[LogicalPlan] with Product with Serializable
This optimization rule detects a join child that has a high ratio of empty partitions and adds a no-broadcast-hash-join hint to avoid it being broadcast.
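As a rough illustration of the check this rule describes, the sketch below computes the share of non-empty partitions from per-partition byte sizes and compares it with a threshold. The function names and the `minNonEmptyRatio` parameter are hypothetical; the exact condition and configuration used by the rule are not given in this page.

```scala
object EmptyPartitionRatioSketch {
  // Fraction of partitions that contain data; 0.0 when there are no partitions.
  def nonEmptyRatio(bytesByPartition: Seq[Long]): Double =
    if (bytesByPartition.isEmpty) 0.0
    else bytesByPartition.count(_ > 0).toDouble / bytesByPartition.size

  // Hypothetical decision: avoid broadcasting when too few partitions hold data.
  // With no partitions there is nothing to measure, so no hint is suggested.
  def shouldAvoidBroadcast(bytesByPartition: Seq[Long], minNonEmptyRatio: Double): Boolean =
    bytesByPartition.nonEmpty && nonEmptyRatio(bytesByPartition) < minNonEmptyRatio
}
```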
-
case class
InsertAdaptiveSparkPlan(session: SparkSession, queryExecution: QueryExecution) extends Rule[SparkPlan] with Product with Serializable
This rule wraps the query plan with an AdaptiveSparkPlanExec, which executes the query plan and re-optimizes the plan during execution based on runtime data statistics.
Note that this rule is stateful and thus should not be reused across query executions.
- case class LocalShuffleReaderExec(child: QueryStageExec) extends SparkPlan with UnaryExecNode with Product with Serializable
-
class
LocalShuffledRowRDD extends RDD[InternalRow]
This is a specialized version of org.apache.spark.sql.execution.ShuffledRowRDD. It is used in Spark SQL adaptive execution when a shuffle join is converted to a broadcast join at runtime because the map output of one input table is small enough to broadcast. This RDD represents the data of the other input table of the join, which is read from shuffle. Each partition of the RDD reads all the data of exactly one mapper's output locally, so no data is transferred over the network.
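To make the "one partition per mapper output" idea concrete, the sketch below contrasts a regular shuffle read with a local read over toy in-memory data. `MapOutputs`, `shuffleRead`, `localRead`, and `numLocalPartitions` are hypothetical names; real shuffle blocks live in shuffle files, not nested vectors.

```scala
object LocalShuffleReadSketch {
  // blocks(m)(r) holds the rows mapper m wrote for reducer r (toy data, not Spark's storage format).
  type MapOutputs = Vector[Vector[Seq[Int]]]

  // Regular shuffle read: reducer r fetches its block from every mapper.
  def shuffleRead(blocks: MapOutputs, r: Int): Seq[Int] =
    blocks.flatMap(mapper => mapper(r))

  // Local read: partition m reads every block written by mapper m and nothing else.
  def localRead(blocks: MapOutputs, m: Int): Seq[Int] =
    blocks(m).flatten

  // With local reads there is one output partition per mapper.
  def numLocalPartitions(blocks: MapOutputs): Int = blocks.size
}
```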
This RDD takes a ShuffleDependency (dependency). The dependency has the parent RDD of this RDD, which represents the dataset before shuffle (i.e. map output). Elements of this RDD are (partitionId, Row) pairs. Partition ids should be in the range [0, numPartitions - 1]. dependency.partitioner.numPartitions is the number of pre-shuffle partitions (i.e. the number of partitions of the map output). The post-shuffle partition number is the same as the parent RDD's partition number.
-
case class
LogicalQueryStage(logicalPlan: LogicalPlan, physicalPlan: SparkPlan) extends LeafNode with Product with Serializable
The LogicalPlan wrapper for a QueryStageExec, or a snippet of physical plan containing a QueryStageExec, in which all ancestor nodes of the QueryStageExec are linked to the same logical node.
For example, a logical Aggregate can be transformed into FinalAgg - Shuffle - PartialAgg, in which the Shuffle will be wrapped into a QueryStageExec, thus the LogicalQueryStage will have FinalAgg - QueryStageExec as its physical plan.
- case class OptimizeLocalShuffleReader(conf: SQLConf) extends Rule[SparkPlan] with Product with Serializable
- case class PlanAdaptiveSubqueries(subqueryMap: Map[Long, ExecSubqueryExpression]) extends Rule[SparkPlan] with Product with Serializable
-
abstract
class
QueryStageExec extends SparkPlan with LeafExecNode
A query stage is an independent subgraph of the query plan. A query stage materializes its output before proceeding with further operators of the query plan. The data statistics of the materialized output can be used to optimize subsequent query stages.
There are 2 kinds of query stages:
1. Shuffle query stage. This stage materializes its output to shuffle files, and Spark launches another job to execute the further operators.
2. Broadcast query stage. This stage materializes its output to an array in the driver JVM. Spark broadcasts the array before executing the further operators.
- case class ReuseAdaptiveSubquery(conf: SQLConf, reuseMap: TrieMap[SparkPlan, BaseSubqueryExec]) extends Rule[SparkPlan] with Product with Serializable
-
case class
ReusedQueryStageExec(id: Int, plan: QueryStageExec, output: Seq[Attribute]) extends QueryStageExec with Product with Serializable
A wrapper that lets a reused query stage expose a different output.
-
case class
ShuffleQueryStageExec(id: Int, plan: ShuffleExchangeExec) extends QueryStageExec with Product with Serializable
A shuffle query stage whose child is a ShuffleExchangeExec.
-
case class
SimpleCost(value: Long) extends Cost with Product with Serializable
A simple implementation of Cost, which takes a Long as the cost value.
-
case class
StageFailure(stage: QueryStageExec, error: Throwable) extends StageMaterializationEvent with Product with Serializable
The materialization of a query stage hit an error and failed.
-
sealed
trait
StageMaterializationEvent extends AnyRef
The event type for stage materialization.
-
case class
StageSuccess(stage: QueryStageExec, result: Any) extends StageMaterializationEvent with Product with Serializable
The materialization of a query stage completed with success.
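Because the event type is a sealed trait with a success case and a failure case, a consumer can pattern match on it exhaustively. The sketch below uses hypothetical stand-ins (`Event`, `Success`, `Failure`, with an integer `stageId` replacing the real stage object) and a hypothetical `handle` function; it is not how AdaptiveSparkPlanExec itself processes events.

```scala
object StageEventSketch {
  // Hypothetical stand-ins mirroring the shape of the event types above.
  sealed trait Event
  case class Success(stageId: Int, result: Any) extends Event
  case class Failure(stageId: Int, error: Throwable) extends Event

  // Returns the ids of completed stages in order, or the first error encountered.
  def handle(events: Seq[Event]): Either[Throwable, Seq[Int]] =
    events.foldLeft[Either[Throwable, Seq[Int]]](Right(Vector.empty)) {
      case (Right(done), Success(id, _)) => Right(done :+ id)
      case (Right(_), Failure(_, err))   => Left(err)
      case (failed @ Left(_), _)         => failed
    }
}
```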
Value Members
- object AdaptiveSparkPlanExec extends Serializable
- object BroadcastQueryStageExec extends Serializable
-
object
LogicalQueryStageStrategy extends Strategy with PredicateHelper
Strategy for plans containing LogicalQueryStage nodes:
1. Transforms LogicalQueryStage to its corresponding physical plan that is either being executed or has already completed execution.
2. Transforms a Join that has one child relation already planned and executed as a BroadcastQueryStageExec. This prevents reversing a broadcast stage into a shuffle stage in case the larger join child relation finishes before the smaller relation.
Note that this rule needs to be applied before regular join strategies.
- object ShuffleQueryStageExec extends Serializable
-
object
SimpleCostEvaluator extends CostEvaluator
A simple implementation of CostEvaluator, which counts the number of ShuffleExchangeExec nodes in the plan.
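The relationship between an ordered cost and an evaluator that counts shuffle nodes can be sketched on a toy plan tree. All names here (`Plan`, `Shuffle`, `Other`, `PlanCost`, `ShuffleCount`, `evaluate`) are hypothetical and only mirror the shape of Cost, SimpleCost, and SimpleCostEvaluator; the sketch does not use Spark's classes.

```scala
object CostSketch {
  // Hypothetical toy plan; not Spark's SparkPlan.
  sealed trait Plan { def children: Seq[Plan] }
  case class Shuffle(child: Plan) extends Plan { val children: Seq[Plan] = Seq(child) }
  case class Other(name: String, children: Seq[Plan] = Nil) extends Plan

  // A cost that can be compared with other costs.
  sealed trait PlanCost extends Ordered[PlanCost]
  case class ShuffleCount(value: Long) extends PlanCost {
    def compare(that: PlanCost): Int = that match {
      case ShuffleCount(v) => java.lang.Long.compare(value, v)
    }
  }

  // Counts the shuffle nodes in the plan and wraps the count as a cost.
  def evaluate(plan: Plan): PlanCost = {
    def count(p: Plan): Long = {
      val self = p match {
        case _: Shuffle => 1L
        case _          => 0L
      }
      self + p.children.map(count).sum
    }
    ShuffleCount(count(plan))
  }
}
```

Under this toy model, a re-optimized plan that needs fewer shuffles compares as cheaper than the plan it would replace.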