package org.apache.spark.sql.execution.dynamicpruning

Type Members

  1. case class PlanDynamicPruningFilters(sparkSession: SparkSession) extends Rule[SparkPlan] with Product with Serializable

    This planner rule rewrites dynamic pruning predicates so that they reuse the results of a broadcast. For joins that are not planned as broadcast hash joins, it keeps the fallback mechanism of a duplicated subquery.

  2. class RowLevelOperationRuntimeGroupFiltering extends Rule[LogicalPlan] with PredicateHelper

    A rule that assigns a subquery to filter groups in row-level operations at runtime.

    Data skipping during job planning for row-level operations is limited to expressions that can be converted to data source filters. Since not all expressions can be pushed down that way, and rewriting groups is expensive, Spark allows data sources to filter groups at runtime. If the primary scan in a group-based row-level operation supports runtime filtering, this rule injects a subquery that finds all rows matching the operation's condition, so that the data source knows exactly which groups must be rewritten.

    Note this rule only applies to group-based row-level operations.
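
    The idea can be illustrated with a toy model in plain Scala that does not use Spark's APIs. It assumes that a group is a data file and that the operation's condition is an arbitrary predicate that cannot be converted to a data source filter; the names `RuntimeGroupFilteringSketch`, `Row`, `groupsToRewrite` and `rowsToRescan` are hypothetical. Evaluating the condition first, as the injected subquery does, yields the set of files to rewrite, and only those files are read again.

    ```scala
    // Toy model of runtime group filtering; all names are hypothetical, not Spark's API.
    object RuntimeGroupFilteringSketch {
      // A row remembers the file (group) it was read from.
      final case class Row(id: Int, file: String)

      // Stand-in for the injected subquery: collect every file that contains
      // at least one row matching the operation's condition.
      def groupsToRewrite(rows: Seq[Row], condition: Row => Boolean): Set[String] =
        rows.filter(condition).map(_.file).toSet

      // Only rows in the selected groups are scanned again; files with no
      // matching rows are never rewritten.
      def rowsToRescan(rows: Seq[Row], groups: Set[String]): Seq[Row] =
        rows.filter(r => groups.contains(r.file))
    }
    ```

    For example, with files f1 (ids 1, 7), f2 (id 2) and f3 (ids 3, 14) and the condition `id % 7 == 0`, only f1 and f3 are selected. The rescan also returns their non-matching rows, because a group-based operation rewrites whole groups.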

Value Members

  1. object CleanupDynamicPruningFilters extends Rule[LogicalPlan] with PredicateHelper

    Removes filter nodes containing dynamic pruning predicates that were not pushed down to the scan. Such nodes can be left behind because they are not pushed through projects and aggregates that contain non-deterministic expressions.
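
    A simplified model of this cleanup is sketched below in plain Scala, without Spark's `Expression` or `LogicalPlan` classes. All types are hypothetical stand-ins: `DynamicPruningPred` plays the role of a DynamicPruning predicate, a `Filter` above a `Scan` represents a predicate that was not pushed down, and predicates held in `Scan.runtimeFilters` represent ones that were. As an assumption of the model, a `Filter` whose only conjuncts were dynamic pruning predicates is removed entirely.

    ```scala
    // Hypothetical mini plan/expression model; not Spark's Catalyst classes.
    object CleanupSketch {
      sealed trait Expr
      final case class Pred(sql: String) extends Expr
      final case class DynamicPruningPred(key: String) extends Expr
      final case class And(left: Expr, right: Expr) extends Expr

      sealed trait Plan
      // runtimeFilters: predicates that were pushed down into the scan.
      final case class Scan(table: String, runtimeFilters: Seq[Expr]) extends Plan
      final case class Filter(condition: Expr, child: Plan) extends Plan

      // Flatten nested ANDs into a list of conjuncts.
      def conjuncts(e: Expr): Seq[Expr] = e match {
        case And(l, r) => conjuncts(l) ++ conjuncts(r)
        case other     => Seq(other)
      }

      private def isDynamicPruning(e: Expr): Boolean = e match {
        case _: DynamicPruningPred => true
        case _                     => false
      }

      // Drop dynamic pruning conjuncts left in Filter nodes and remove the Filter
      // if nothing remains. Predicates already pushed into a Scan are untouched.
      def cleanup(plan: Plan): Plan = plan match {
        case Filter(cond, child) =>
          val kept = conjuncts(cond).filterNot(isDynamicPruning)
          val newChild = cleanup(child)
          if (kept.isEmpty) newChild
          else Filter(kept.reduceLeft[Expr]((l, r) => And(l, r)), newChild)
        case scan: Scan => scan
      }
    }
    ```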

  2. object PartitionPruning extends Rule[LogicalPlan] with PredicateHelper with JoinSelectionHelper

    Dynamic partition pruning (DPP) is applied based on the type and selectivity of the join. During query optimization, this rule inserts a predicate on the filterable table, built from the filter on the other side of the join and wrapped in a custom expression called DynamicPruning.
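
    The core idea can be shown with a toy model in plain Scala. It assumes a fact table partitioned by a hypothetical `dateId` column and joined on that column to a small dimension table with a selective filter (for instance `year = 2023`); the names below are hypothetical and not Spark's API. Evaluating the dimension-side filter yields a set of join keys, and the predicate inserted on the fact side keeps only partitions whose key is in that set.

    ```scala
    // Toy model of dynamic partition pruning; names are hypothetical.
    object DppSketch {
      // Row of the small (dimension) side of the join.
      final case class DimRow(dateId: Int, year: Int)
      // One partition of the large (fact) side, identified by its partition key.
      final case class Partition(dateId: Int, rowCount: Long)

      // Evaluate the filter from the other side of the join and collect the join keys.
      def pruningKeys(dim: Seq[DimRow], filter: DimRow => Boolean): Set[Int] =
        dim.filter(filter).map(_.dateId).toSet

      // The inserted predicate: keep only partitions whose key is IN the collected set.
      def prune(partitions: Seq[Partition], keys: Set[Int]): Seq[Partition] =
        partitions.filter(p => keys.contains(p.dateId))
    }
    ```

    With dimension rows for dates 1 (2022), 2 (2023) and 3 (2023), and fact partitions for dates 1 to 4, the filter `year == 2023` keeps only partitions 2 and 3, so the other two are never read.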

    The basic DPP mechanism inserts a duplicated subquery carrying the filter from the other side of the join when both of the following conditions are met: (1) the table to prune is filterable by the join key, and (2) the join is one of the following types: INNER, LEFT SEMI, LEFT OUTER (partitioned on the right), or RIGHT OUTER (partitioned on the left).
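
    The two conditions can be expressed as a small decision function. This is a sketch in plain Scala with hypothetical types, not Spark's `JoinType` hierarchy. The outer-join cases follow from the fact that the preserved side of an outer join cannot be pruned, since every one of its rows appears in the result. Treating LEFT SEMI as pruning the left side is an assumption of this sketch, because the description does not say which side is partitioned for that join type.

    ```scala
    // Hypothetical model of the DPP eligibility check; not Spark's API.
    object DppEligibilitySketch {
      sealed trait JoinType
      case object Inner extends JoinType
      case object LeftSemi extends JoinType
      case object LeftOuter extends JoinType
      case object RightOuter extends JoinType
      case object FullOuter extends JoinType

      sealed trait Side
      case object LeftSide extends Side
      case object RightSide extends Side

      // Condition (2): can this side of a join of this type be pruned?
      def canPrune(joinType: JoinType, side: Side): Boolean = (joinType, side) match {
        case (Inner, _)             => true // either side can be pruned
        case (LeftSemi, LeftSide)   => true // assumption: the left side is pruned
        case (LeftOuter, RightSide) => true // left side is preserved, right is prunable
        case (RightOuter, LeftSide) => true // right side is preserved, left is prunable
        case _                      => false
      }

      // Conditions (1) and (2) together.
      def shouldInsertPruning(joinType: JoinType, side: Side, filterableByJoinKey: Boolean): Boolean =
        filterableByJoinKey && canPrune(joinType, side)
    }
    ```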

    To enable partition pruning directly in broadcasts, we use a custom DynamicPruning clause that incorporates the IN clause with the subquery and the benefit estimation. During query planning, when the join type is known, the following mechanism applies: (1) if the join is a broadcast hash join, the duplicated subquery is replaced with the reused results of the broadcast; (2) otherwise, if the estimated benefit of partition pruning outweighs the overhead of running the subquery twice, the duplicated subquery is kept; (3) otherwise, the subquery is dropped.
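
    The planning-time choice among the three outcomes can be sketched as follows. The inputs are hypothetical: a flag for whether the join was planned as a broadcast hash join, plus abstract cost numbers standing in for the estimated pruning benefit and the overhead of running the subquery a second time. Spark's actual benefit estimation is not modeled here.

    ```scala
    // Hypothetical model of the planning decision; not Spark's API.
    object DppPlanningSketch {
      sealed trait PruningDecision
      case object ReuseBroadcast extends PruningDecision        // (1) reuse the broadcast results
      case object KeepDuplicateSubquery extends PruningDecision // (2) run the subquery again
      case object DropSubquery extends PruningDecision          // (3) no pruning

      def choose(isBroadcastHashJoin: Boolean, estimatedBenefit: Double, subqueryOverhead: Double): PruningDecision =
        if (isBroadcastHashJoin) ReuseBroadcast
        else if (estimatedBenefit > subqueryOverhead) KeepDuplicateSubquery
        else DropSubquery
    }
    ```

    The broadcast case is checked first because reusing the broadcast avoids running any extra subquery, so no cost comparison is needed there.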
