Class DataSourceAnalysis


  • public class DataSourceAnalysis
    extends Object
    Analysis of a datasource for purposes of deciding how to execute a particular query. The analysis breaks a datasource down in the following way:
    
                                 Q  <-- Possible query datasource(s) [may be none, or multiple stacked]
                                 |
                                 Q  <-- Base query datasource, returned by getBaseQuery() if it exists
                                 |
                                 J  <-- Possible join tree, expected to be left-leaning
                                / \
                               J  Dj <--  Other leaf datasources
       Base datasource        / \         which will be joined
      (bottom-leftmost) -->  Db Dj  <---- into the base datasource
    
     
    The base datasource (Db) is returned by getBaseDataSource(). The other leaf datasources are returned by getPreJoinableClauses(). The base datasource (Db) will never be a join, but it can be any other type of datasource (table, query, etc). Note that join trees are only flattened if they occur at the top of the overall tree (or underneath an outer query), and that join trees are only flattened to the degree that they are left-leaning. Due to these facts, it is possible for the base or leaf datasources to include additional joins.

    The base datasource is the one that will be considered by the core Druid query stack for scanning via Segment and StorageAdapter. The other leaf datasources must be joinable onto the base data.

    The idea here is to keep things simple and dumb. So we focus only on identifying left-leaning join trees, which map neatly onto a series of hash table lookups at query time. The user/system generating the queries, e.g. the druid-sql layer (or the end user in the case of native queries), is responsible for containing the smarts to structure the tree in a way that will lead to optimal execution.
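
The flattening described above can be illustrated with a minimal, self-contained sketch. It does not use Druid's classes: `Node`, `Leaf`, `Join`, `Flattened`, and `flatten` are hypothetical names invented for this example, and the ordering of the joined leaves (innermost join first) is an assumption of the sketch. It shows only the idea that the left spine of a join tree is unwound, while a join sitting on the right-hand side is kept whole as a single leaf.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LeftLeaningFlattenSketch {
    /** Hypothetical stand-in for a datasource node; not Druid's DataSource interface. */
    interface Node {}

    /** A non-join datasource, such as a table. */
    static final class Leaf implements Node {
        final String name;
        Leaf(String name) { this.name = name; }
    }

    /** A join of a left and a right datasource. */
    static final class Join implements Node {
        final Node left;
        final Node right;
        Join(Node left, Node right) { this.left = left; this.right = right; }
    }

    /** Result of flattening: the base plus the right-hand nodes joined onto it. */
    static final class Flattened {
        final Node base;
        final List<Node> joinedLeaves;
        Flattened(Node base, List<Node> joinedLeaves) {
            this.base = base;
            this.joinedLeaves = joinedLeaves;
        }
    }

    /** Walks down the left spine only; right-hand subtrees are never descended into. */
    static Flattened flatten(Node root) {
        List<Node> rights = new ArrayList<>();
        Node current = root;
        while (current instanceof Join) {
            Join join = (Join) current;
            rights.add(join.right);
            current = join.left;
        }
        // Collected top-down, so reverse to list the innermost join first.
        Collections.reverse(rights);
        return new Flattened(current, rights);
    }

    /** Renders a node as text, with joins shown in parentheses. */
    static String describe(Node node) {
        if (node instanceof Leaf) {
            return ((Leaf) node).name;
        }
        Join join = (Join) node;
        return "(" + describe(join.left) + " JOIN " + describe(join.right) + ")";
    }

    public static void main(String[] args) {
        // Left-leaning tree: ((Db JOIN Dj1) JOIN Dj2) flattens completely.
        Node leftLeaning = new Join(new Join(new Leaf("Db"), new Leaf("Dj1")), new Leaf("Dj2"));
        Flattened a = flatten(leftLeaning);
        System.out.println(describe(a.base) + " with " + a.joinedLeaves.size() + " joined leaves");

        // Right-leaning tree: (Db JOIN (Dj1 JOIN Dj2)) keeps the inner join as one leaf.
        Node rightLeaning = new Join(new Leaf("Db"), new Join(new Leaf("Dj1"), new Leaf("Dj2")));
        Flattened b = flatten(rightLeaning);
        System.out.println(describe(b.base) + " joined with " + describe(b.joinedLeaves.get(0)));
    }
}
```

In the second case the single joined leaf is itself a join, which mirrors the note above that leaf datasources may include additional joins when the tree is not left-leaning.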
    • Method Detail

      • getBaseDataSource

        public DataSource getBaseDataSource()
        Returns the base (bottom-leftmost) datasource.
      • getBaseQuery

        public Optional<Query<?>> getBaseQuery()
        Returns the bottom-most (i.e. innermost) Query from a possible stack of outer queries at the root of the datasource tree. This is the query that will be applied to the base datasource plus any joinables that might be present.
        Returns:
        the query associated with the base datasource if one is present, else empty
      • getJoinBaseTableFilter

        public Optional<DimFilter> getJoinBaseTableFilter()
        If the original datasource is a join datasource and there is a DimFilter on the base table datasource, that DimFilter is returned here.
      • getBaseQuerySegmentSpec

        public Optional<QuerySegmentSpec> getBaseQuerySegmentSpec()
        Returns the QuerySegmentSpec that is associated with the base datasource, if any. This only happens when there is an outer query datasource. In this case, the base querySegmentSpec is the one associated with the innermost subquery.

        This QuerySegmentSpec is taken from the query returned by getBaseQuery().

        Returns:
        the query segment spec associated with the base datasource if one is present, else empty
      • maybeWithBaseQuery

        public DataSourceAnalysis maybeWithBaseQuery(Query<?> query)
        Returns the datasource analysis with or without the updated query. If the DataSourceAnalysis already has a non-null baseQuery, no update is required. Otherwise, this method creates a new analysis object with the base query provided in the input.
        Parameters:
        query - the query to add to the analysis if the baseQuery is null
        Returns:
        the existing analysis if it has a non-null base query, else a new one with the updated base query
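
The contract described for maybeWithBaseQuery (keep the existing analysis when a base query is already set, otherwise return a new one) can be sketched as follows. This is not Druid's implementation: the `Analysis` class is a hypothetical stand-in, and a plain `String` takes the place of `Query<?>` so that the example runs on its own.

```java
import java.util.Optional;

public class MaybeWithBaseQuerySketch {
    /** Hypothetical, simplified stand-in for DataSourceAnalysis. */
    static final class Analysis {
        // A String stands in for Query<?>; null means no base query is set.
        private final String baseQuery;

        Analysis(String baseQuery) {
            this.baseQuery = baseQuery;
        }

        /** Exposes the nullable field as an Optional, as getBaseQuery() does. */
        Optional<String> getBaseQuery() {
            return Optional.ofNullable(baseQuery);
        }

        /** Returns this instance if a base query exists, else a new analysis with the given query. */
        Analysis maybeWithBaseQuery(String query) {
            if (baseQuery != null) {
                return this;
            }
            return new Analysis(query);
        }
    }

    public static void main(String[] args) {
        Analysis withQuery = new Analysis("inner query");
        // Already has a base query, so the same object comes back unchanged.
        System.out.println(withQuery.maybeWithBaseQuery("other query") == withQuery);

        Analysis withoutQuery = new Analysis(null);
        // No base query, so a new analysis carrying the supplied query is returned.
        System.out.println(withoutQuery.maybeWithBaseQuery("other query").getBaseQuery().isPresent());
    }
}
```

Because the sketch never mutates an existing object, callers can share one analysis freely, which is one plausible reason for a "return this or a copy" design.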
      • getPreJoinableClauses

        public List<PreJoinableClause> getPreJoinableClauses()
        Returns join clauses corresponding to joinable leaf datasources (every leaf except the bottom-leftmost).
      • isConcreteBased

        public boolean isConcreteBased()
        Returns true if this datasource can be computed by the core Druid query stack via a scan of a concrete base datasource. All other datasources involved, if any, must be global.
      • isConcreteAndTableBased

        public boolean isConcreteAndTableBased()
        Returns true if this datasource is both concrete-based (see isConcreteBased()) and table-based (see isTableBased()). This is an important property, because it corresponds to datasources that can be handled by Druid's distributed query stack.
      • isJoin

        public boolean isJoin()
        Returns true if this datasource is made out of a join operation.
      • hashCode

        public int hashCode()
        Overrides:
        hashCode in class Object