Class CascadesPlanner

  • All Implemented Interfaces:
    QueryPlanner

    @API(EXPERIMENTAL)
    public class CascadesPlanner
    extends Object
    implements QueryPlanner
    A Cascades-style query planner that converts a RecordQuery to a RecordQueryPlan, possibly using secondary indexes defined in a RecordMetaData to execute the query efficiently.

    Cascades is a framework for query optimization introduced by Graefe in 1995. In Cascades, all parsed queries, query plans, and intermediate state between the two are represented in a unified tree of RelationalExpressions, which includes types such as RecordQueryPlan and QueryComponent. This highly flexible data structure reifies essentially the entire state of the planner (i.e., partially planned elements, current optimization goals, etc.) and allows individual planning steps to be modular and stateless by keeping all state in the RelationalExpression tree.

    Like many optimization frameworks, Cascades is driven by a set of PlannerRules, each of which describes a particular transformation and encapsulates the logic for determining its applicability and applying it. The planner searches through its PlannerRuleSet to find a matching rule and then executes that rule, creating zero or more additional RelationalExpressions. A rule is defined by:

    • An ExpressionMatcher that defines a finite-depth tree of matchers that inspect the structure (i.e., the type-level information) of some subgraph of the current planner expression.
    • A PlannerRule.onMatch(PlannerRuleCall) method that is run for each successful match, producing zero or more new expressions.
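
    The two parts of a rule can be illustrated with a minimal sketch. Every type below (RuleExpr, ScanExpr, FilterExpr, IndexScanExpr, SketchRule, FilterOverScanRule) is a hypothetical stand-in invented for illustration and is not part of this library's API; the real matcher and rule-call types are richer. The sketch only shows the shape: a structural match on expression types, followed by a callback that yields zero or more new expressions.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified expression types (NOT the library's classes).
interface RuleExpr {}

final class ScanExpr implements RuleExpr {
    final String source;
    ScanExpr(String source) { this.source = source; }
}

final class FilterExpr implements RuleExpr {
    final String predicate;
    final RuleExpr child;
    FilterExpr(String predicate, RuleExpr child) {
        this.predicate = predicate;
        this.child = child;
    }
}

final class IndexScanExpr implements RuleExpr {
    final String index;
    final String predicate;
    IndexScanExpr(String index, String predicate) {
        this.index = index;
        this.predicate = predicate;
    }
}

// A rule pairs a finite-depth structural matcher with an onMatch callback.
abstract class SketchRule {
    // Inspects only type-level structure, never the planner's global state.
    abstract boolean matches(RuleExpr root);

    // Runs once per successful match and yields zero or more new expressions.
    abstract List<RuleExpr> onMatch(RuleExpr root);

    final List<RuleExpr> apply(RuleExpr root) {
        return matches(root) ? onMatch(root) : Collections.<RuleExpr>emptyList();
    }
}

// Matches the depth-two shape Filter(Scan) and proposes an index scan.
final class FilterOverScanRule extends SketchRule {
    @Override
    boolean matches(RuleExpr root) {
        return root instanceof FilterExpr
                && ((FilterExpr) root).child instanceof ScanExpr;
    }

    @Override
    List<RuleExpr> onMatch(RuleExpr root) {
        FilterExpr filter = (FilterExpr) root;
        ScanExpr scan = (ScanExpr) filter.child;
        // The "$idx" index name is made up for this example.
        RuleExpr alternative = new IndexScanExpr(scan.source + "$idx", filter.predicate);
        return Collections.singletonList(alternative);
    }
}
```

    Note that the rule itself holds no state: the original expression is left in place and the alternative is simply returned to the caller, which is what lets rules be applied speculatively.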

    Since rules can be applied speculatively and need not be "reductive" in any reasonable sense, it is common for cyclic rule application to occur. Furthermore, the number of possible expression trees being considered at any time can be enormous, since every rule might apply to many of the existing trees under consideration by the planner. To mitigate this, Cascades uses aggressive memoization, which is represented by the memo data structure. The memo provides an efficient interface for storing a forest of expressions, where there might be substantial overlap between different trees in the forest. The memo is composed of expression groups (or just groups), which are equivalence classes of expressions. In this implementation, the memo structure is an implicit data structure represented by GroupExpressionRefs, each of which represents a group expression in Cascades and contains a set of RelationalExpressions. In turn, RelationalExpressions have some number of children, each of which is a GroupExpressionRef and which can be traversed by the planner via the RelationalExpression.getQuantifiers() method.
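
    The memo idea described above can be sketched roughly as follows. GroupSketch and MemoExpr are hypothetical names standing in for a group reference and a member expression; they are assumptions made for this example, not the library's classes. The point is that an expression's children are groups rather than concrete expressions, so many trees share structure, and that inserting an expression already present in a group is a no-op, which is what stops cyclic rule application from growing the search space without bound.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

// Hypothetical stand-in for a group: an equivalence class of expressions.
final class GroupSketch {
    private final Set<MemoExpr> members = new LinkedHashSet<>();

    // Returns false when an equal expression is already memoized in this group.
    boolean insert(MemoExpr expr) {
        return members.add(expr);
    }

    Set<MemoExpr> getMembers() {
        return members;
    }
}

// Hypothetical member expression whose children are groups, not expressions.
final class MemoExpr {
    final String operator;
    final List<GroupSketch> children;

    MemoExpr(String operator, GroupSketch... children) {
        this.operator = operator;
        this.children = Arrays.asList(children);
    }

    // Two expressions are equal when they apply the same operator to the
    // same child groups. GroupSketch does not override equals, so child
    // groups are compared by identity.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MemoExpr)) {
            return false;
        }
        MemoExpr other = (MemoExpr) o;
        return operator.equals(other.operator) && children.equals(other.children);
    }

    @Override
    public int hashCode() {
        return Objects.hash(operator, children);
    }
}
```

    In this sketch, inserting "Filter over group G" twice into the same group leaves a single member, while a differently shaped but equivalent expression is recorded as a second alternative in that group.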

    A Cascades planner operates by repeatedly taking a CascadesPlanner.Task from its collection of pending tasks (in this implementation, an execution stack) and executing it. Each task performs some actions and may schedule other tasks by pushing them onto the stack. The tasks in this particular planner are the implementors of the CascadesPlanner.Task interface.
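
    A stack-driven task loop of this kind might look like the sketch below. SketchTask, ExploreTask, and TaskLoop are hypothetical names used only for illustration; the actual CascadesPlanner.Task interface and its implementors are not reproduced here. Each task does a small unit of work and may push follow-up tasks, and the loop ends when the stack is empty.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical task interface (NOT CascadesPlanner.Task itself).
interface SketchTask {
    // Performs one unit of work and may push follow-up tasks onto the stack.
    void execute(Deque<SketchTask> stack, List<String> log);
}

// Toy task that "explores" a node and schedules its two children.
final class ExploreTask implements SketchTask {
    private final String name;
    private final int depth;

    ExploreTask(String name, int depth) {
        this.name = name;
        this.depth = depth;
    }

    @Override
    public void execute(Deque<SketchTask> stack, List<String> log) {
        log.add("explore " + name);
        if (depth > 0) {
            stack.push(new ExploreTask(name + ".left", depth - 1));
            stack.push(new ExploreTask(name + ".right", depth - 1));
        }
    }
}

final class TaskLoop {
    // Pops and executes tasks until none remain; returns the execution log.
    static List<String> run(SketchTask initial) {
        Deque<SketchTask> stack = new ArrayDeque<>();
        List<String> log = new ArrayList<>();
        stack.push(initial);
        while (!stack.isEmpty()) {
            stack.pop().execute(stack, log);
        }
        return log;
    }
}
```

    Because the structure is a stack, the most recently scheduled task runs first: running an ExploreTask named "root" with depth 1 visits root, then root.right, then root.left.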

    Since a Cascades-style planner produces many possible query plans, it needs some way to decide which ones to select. This is generally done with a cost model that scores plans according to some cost metric. For now, we use the CascadesCostModel, which is a heuristic model implemented as a Comparator.
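
    To show what a heuristic cost model expressed as a Comparator can look like, here is a minimal sketch. PlanSketch and HeuristicCostSketch are hypothetical, and the criteria used (prefer fewer full scans, then fewer residual filters) are assumptions chosen for the example; they are not the rules that CascadesCostModel actually applies.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical summary of a candidate plan (NOT a RecordQueryPlan).
final class PlanSketch {
    final String name;
    final int fullScans;        // number of unrestricted scans in the plan
    final int residualFilters;  // number of filters applied after scanning

    PlanSketch(String name, int fullScans, int residualFilters) {
        this.name = name;
        this.fullScans = fullScans;
        this.residualFilters = residualFilters;
    }
}

// A heuristic cost model: it only ranks plans relative to each other
// and never computes an absolute cost estimate.
final class HeuristicCostSketch implements Comparator<PlanSketch> {
    @Override
    public int compare(PlanSketch a, PlanSketch b) {
        // Assumed heuristic 1: fewer full scans is better.
        int byScans = Integer.compare(a.fullScans, b.fullScans);
        if (byScans != 0) {
            return byScans;
        }
        // Assumed heuristic 2: fewer residual filters breaks ties.
        return Integer.compare(a.residualFilters, b.residualFilters);
    }

    // Picks the candidate that compares lowest under the model.
    static PlanSketch best(List<PlanSketch> candidates) {
        return Collections.min(candidates, new HeuristicCostSketch());
    }
}
```

    Expressing the model as a Comparator means the planner needs only a pairwise "which is better" answer to pick a winner among the alternatives in a group.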

    See Also:
    GroupExpressionRef, RelationalExpression, PlannerRule, CascadesCostModel