Class MatchableSortExpression

  • All Implemented Interfaces:
    Bindable, Correlated<RelationalExpression>, InternalPlannerGraphRewritable, RelationalExpressionWithChildren, RelationalExpression

    @API(EXPERIMENTAL)
    public class MatchableSortExpression
    extends Object
    implements RelationalExpressionWithChildren, InternalPlannerGraphRewritable
    A relational planner expression that represents an unimplemented sort on the records produced by its inner relational planner expression. TODO BEGIN This class is somewhat flawed. There is a cousin of this class LogicalSortExpression which serves a similar purpose conceptually but is still technically quite different. LogicalSortExpression serves as the logical precursor to a physical implementation of a sort (which we do not (yet) have). So in order for the planner to produce an executable plan, that sort expression has to be optimized away by e.g. RemoveSortRule. This class is different in a way that it an only be used as an expression to be matched against. In other words this class is only ever allowed to appear in a match candidate on the expression side. The reason why that is comes down to the simple question of how to express the thing(s) to sort on in general and what to do with nested repeated value in particular. What should we be able to express when we express order? One thing we definitely want to sort by is a scalar value that can be computed from the record or an index key that flows into the operator. On the contrary we should not be able to sort by something like field("a", FanOut) as it really does not make sense in this context. What does make sense is to say that you want the stream to be ordered by field("a", FanOut), but you actually are expressing a requirement on data computed on a record before it even flows into the sort, i.e. it cannot be computed (conceptually) in this operator. The nested repeated field a needs to come from either a physical version of an ExplodeExpression (which we don't have (yet)) or from an index access as index keys can be constructed in this way. The expressiveness that is fundamentally lacking here is to model what data flows, how it flows and what operations have been applied to it since it was generated. There are other places where exactly this is a problem as well. Specifically, for the sort expression, however, this problem manifests itself as: (1) Either the sort is expressed by using KeyExpressions. In that case, expressing a nested repeated value in the sort expression itself does not make sense. Again, we should model such a case as a sort over an value that has been pre-produced by a cardinality-changing operator such as a physical variant of explode or by explicitly referring to and modelling the index keys that can flow along with the fetched record. Note that this of course has far-reaching implications for optimizations that attempt to defer the fetch of a record. Essentially, we would like to say
         index: field(a, FanOut)
         plan: Sort(IndexScan(index), a [5, 10]), indexkey.a)
         
    i.e. we want to sort by the index key. We do not want to model that plan like
         plan: Sort(IndexScan(index), a [5, 10]), record.field("a", FanOut))
         
    where the sort expression (even only if conceptually) extracts the as from the base records as the base record at that moment already may contain duplicates of the base record due to using that index in the underlying scan. (2) Alternatively, we can express the sort by using explicit index scan-provided values. You simply cannot sort by anything that does not come from an index or a primary scan (the latter cannot really sort by anything that's repeated due to the key being a primary key). That works for the current landscape as there is no physical sort (yet) and it overcomes the problem explained above in a way that we can express to sort by anything the index scan (or primary scan) produces as opposed to something that can be computed on-the-fly based upon the base record. This also removes the requirement of the base record to be fetched and present at the the time the sort happens (conceptually) and allows for deferred-fetch optimizations. As a direct result of this we now have two different logical sort expressions. One, LogicalSortExpression which expresses order by using KeyExpressions and which has the problems layed out in (1), and a this one MatchableSortExpression which expresses order by explicitly naming the constituent parts of an index. In the future, we should strive to unify these two classes to one logical sort expression. For now, we have a logical sort expression (LogicalSortExpression) on the query side and a matchable sort expression (MatchableSortExpression) on the match candidate side. TODO END When it comes to matching this class does not (yet) subsume a LogicalSortExpression although it probably should be able to just for orthogonality of how rules can be applied and the search space for transformations and matching is explored. As an expression of this class never transforms into a physical sort (even if we had one, this expression would not be transformed into a a physical operator). In a sense this expression expresses a property of the flowing data stream (which ultimately comes from an ordered scan). This property is picked up by AdjustMatchRule meaning that this operator can match "in-between" expressions on the query expression side (homomorphic matching) by improving the existing match memo by adorning it using the correct order-by property.