Interface DataSource
All Known Implementing Classes:
FilteredDataSource, FrameBasedInlineDataSource, GlobalTableDataSource, InlineDataSource, JoinDataSource, LookupDataSource, QueryDataSource, RestrictedDataSource, TableDataSource, UnionDataSource, UnnestDataSource
public interface DataSource
Represents a source... of data... for a query. Analogous to the "FROM" clause in SQL.
Method Summary
Modifier and Type | Method | Description
Function<SegmentReference,SegmentReference> | createSegmentMapFunction(Query query, AtomicLong cpuTimeAcc) | Returns a function that specifies how each segment should be modified.
DataSourceAnalysis | getAnalysis() | Gets the analysis for a data source.
byte[] | getCacheKey() | Computes a cache key prefix for a data source.
List<DataSource> | getChildren() | Returns the datasources that this datasource depends on.
Set<String> | getTableNames() | Returns the names of all table datasources involved in this query.
boolean | isCacheable(boolean isBroker) | Returns true if queries on this datasource are cacheable at both the result level and the per-segment level.
boolean | isConcrete() | Returns true if this datasource can be the base datasource of query processing.
boolean | isGlobal() | Returns true if all servers have a full copy of this datasource.
DataSource | withChildren(List<DataSource> children) | Returns a new DataSource, identical to this one, with different children.
default DataSource | withPolicies(Map<String,Optional<Policy>> policyMap) | Returns an updated datasource with the policy restrictions on its tables applied.
DataSource | withUpdatedDataSource(DataSource newSource) | Returns an updated datasource based on the specified new source.
Method Detail
getTableNames
Set<String> getTableNames()
Returns the names of all table datasources involved in this query. Does not include names for non-tables, like lookups or inline datasources.
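To illustrate the contract, here is a minimal sketch of how a tree of datasources might report its table names. It assumes a hypothetical, stripped-down `MiniDataSource` interface with `MiniTable`, `MiniInline`, and `MiniJoin` records (not Druid's actual classes): table leaves contribute their own name, inline leaves contribute nothing, and composite nodes union the names of their children.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified stand-in for DataSource; only the methods needed here.
interface MiniDataSource {
  List<MiniDataSource> getChildren();

  // Default for composite datasources: the union of the table names of all children.
  default Set<String> getTableNames() {
    Set<String> names = new LinkedHashSet<>();
    for (MiniDataSource child : getChildren()) {
      names.addAll(child.getTableNames());
    }
    return names;
  }
}

// Leaf backed by a table: contributes its own name.
record MiniTable(String name) implements MiniDataSource {
  public List<MiniDataSource> getChildren() { return List.of(); }

  @Override
  public Set<String> getTableNames() { return Set.of(name); }
}

// Leaf holding inline rows: not a table, and it has no children, so it contributes no names.
record MiniInline(List<Object[]> rows) implements MiniDataSource {
  public List<MiniDataSource> getChildren() { return List.of(); }
}

// Composite: a join of two datasources.
record MiniJoin(MiniDataSource left, MiniDataSource right) implements MiniDataSource {
  public List<MiniDataSource> getChildren() { return List.of(left, right); }
}
```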
getChildren
List<DataSource> getChildren()
Returns datasources that this datasource depends on. Will be empty for leaf datasources like 'table'.
withChildren
DataSource withChildren(List<DataSource> children)
Return a new DataSource, identical to this one, with different children. The number of children must be equal to the number of children that this datasource already has.
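Together, getChildren and withChildren are enough to rewrite a datasource tree generically. The sketch below assumes hypothetical `Node`, `Table`, and `Join` types (not Druid's classes); each withChildren rejects a list whose size differs from the current child count, and `Rewrites.transformUp` rebuilds the tree bottom-up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical, simplified stand-in for DataSource (not Druid's classes).
interface Node {
  List<Node> getChildren();

  // Must return a copy of this node with the given children; the child count must not change.
  Node withChildren(List<Node> children);
}

record Table(String name) implements Node {
  public List<Node> getChildren() { return List.of(); }

  public Node withChildren(List<Node> children) {
    if (!children.isEmpty()) {
      throw new IllegalArgumentException("Expected 0 children, got " + children.size());
    }
    return this;
  }
}

record Join(Node left, Node right) implements Node {
  public List<Node> getChildren() { return List.of(left, right); }

  public Node withChildren(List<Node> children) {
    if (children.size() != 2) {
      throw new IllegalArgumentException("Expected 2 children, got " + children.size());
    }
    return new Join(children.get(0), children.get(1));
  }
}

final class Rewrites {
  private Rewrites() {}

  // Rewrites children first, then applies `fn` to the node rebuilt with those children.
  static Node transformUp(Node node, UnaryOperator<Node> fn) {
    List<Node> newChildren = new ArrayList<>();
    for (Node child : node.getChildren()) {
      newChildren.add(transformUp(child, fn));
    }
    return fn.apply(node.withChildren(newChildren));
  }
}
```

Keeping the child-count check inside each withChildren means a faulty rewrite fails immediately instead of silently dropping or inventing inputs.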
isCacheable
boolean isCacheable(boolean isBroker)
Returns true if queries on this datasource are cacheable at both the result level and the per-segment level. Currently, datasources that do not actually reference segments (like 'inline') are not cacheable, since cache keys are always based on segment identifiers.
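A minimal sketch of that rule, assuming hypothetical `SegmentTable`, `InlineRows`, and `JoinOf` types (not Druid's classes). It also assumes, as an illustration only, that a composite datasource is cacheable exactly when all of its children are, and it ignores `isBroker`, whose effect is not described here.

```java
import java.util.List;

// Hypothetical, simplified stand-in for DataSource; not Druid's classes.
interface CacheableSource {
  List<CacheableSource> getChildren();

  // Assumption: a composite is cacheable only if every child is. This sketch ignores isBroker.
  default boolean isCacheable(boolean isBroker) {
    return getChildren().stream().allMatch(child -> child.isCacheable(isBroker));
  }
}

// A table references segments, so cache keys can be built from segment identifiers.
record SegmentTable(String name) implements CacheableSource {
  public List<CacheableSource> getChildren() { return List.of(); }

  @Override
  public boolean isCacheable(boolean isBroker) { return true; }
}

// Inline data references no segments, so there is no segment identifier to key a cache on.
record InlineRows(List<Object[]> rows) implements CacheableSource {
  public List<CacheableSource> getChildren() { return List.of(); }

  @Override
  public boolean isCacheable(boolean isBroker) { return false; }
}

// Composite: uses the default rule from the interface.
record JoinOf(CacheableSource left, CacheableSource right) implements CacheableSource {
  public List<CacheableSource> getChildren() { return List.of(left, right); }
}
```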
isGlobal
boolean isGlobal()
Returns true if all servers have a full copy of this datasource. True for things like inline, lookup, etc., or for queries of those.
Currently this is coupled with joinability: if this returns true, then the query engine expects that there exists a JoinableFactory which might build a Joinable for this datasource directly. If a subquery 'inline' join is required to join this datasource on the right-hand side, then this value must be false for now.
In the future, instead of directly using this method, the query planner and engine should consider JoinableFactory.isDirectlyJoinable(DataSource) when determining if the right-hand side is directly joinable, which would allow decoupling this property from joins.
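The coupling with joinability can be pictured as a planning decision on the right-hand side of a join. This is a sketch only: `GlobalSource`, `InlineData`, `LookupData`, `SegmentedTable`, `SubqueryOf`, and `JoinPlanner.planRightHandSide` are hypothetical names, and the string results stand in for whatever the real planner does.

```java
import java.util.List;

// Hypothetical, simplified stand-ins; not Druid's classes.
interface GlobalSource {
  boolean isGlobal();
}

// Every server holds a full copy of inline rows.
record InlineData(List<Object[]> rows) implements GlobalSource {
  public boolean isGlobal() { return true; }
}

// Lookups are likewise fully available on every server.
record LookupData(String name) implements GlobalSource {
  public boolean isGlobal() { return true; }
}

// A regular table is split into segments across servers, so no single server has all of it.
record SegmentedTable(String name) implements GlobalSource {
  public boolean isGlobal() { return false; }
}

// A query over a global datasource is itself global.
record SubqueryOf(GlobalSource inner) implements GlobalSource {
  public boolean isGlobal() { return inner.isGlobal(); }
}

final class JoinPlanner {
  private JoinPlanner() {}

  // Hypothetical helper: a global right-hand side is joined directly;
  // otherwise it must first be run as a subquery and its results inlined.
  static String planRightHandSide(GlobalSource rhs) {
    return rhs.isGlobal() ? "direct join" : "inline subquery first";
  }
}
```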
isConcrete
boolean isConcrete()
Returns true if this datasource can be the base datasource of query processing.
Base datasources drive query processing. If the base datasource is TableDataSource, for example, queries are processed in parallel on data servers. If the base datasource is InlineDataSource, queries are processed on the Broker. See DataSourceAnalysis.getBaseDataSource() for further discussion.
Datasources that are *not* concrete must be pre-processed in some way before they can be processed by the main query stack. For example, QueryDataSource must be executed first and substituted with its results.
See Also:
which uses this, which uses this
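The pre-processing step can be sketched as a loop that keeps replacing a non-concrete base until it becomes concrete. The types `Source`, `TableSource`, `InlineSource`, `QuerySource`, and `PreProcessor`, and the `runSubquery` callback, are hypothetical illustrations, not Druid's query stack.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical, simplified stand-ins; not Druid's classes.
interface Source {
  boolean isConcrete();
}

// Concrete: would be processed in parallel on data servers.
record TableSource(String name) implements Source {
  public boolean isConcrete() { return true; }
}

// Concrete: would be processed on the Broker.
record InlineSource(List<Object[]> rows) implements Source {
  public boolean isConcrete() { return true; }
}

// Not concrete: must be executed before its results can serve as a base.
record QuerySource(String sql) implements Source {
  public boolean isConcrete() { return false; }
}

final class PreProcessor {
  private PreProcessor() {}

  // Replaces a non-concrete base with its materialized results until the base is concrete.
  // `runSubquery` is a hypothetical executor that returns a subquery's rows.
  static Source makeConcrete(Source base, Function<QuerySource, List<Object[]>> runSubquery) {
    Source current = base;
    while (!current.isConcrete()) {
      if (current instanceof QuerySource q) {
        current = new InlineSource(runSubquery.apply(q));
      } else {
        throw new IllegalStateException("Don't know how to pre-process " + current);
      }
    }
    return current;
  }
}
```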
createSegmentMapFunction
Function<SegmentReference,SegmentReference> createSegmentMapFunction(Query query, AtomicLong cpuTimeAcc)
Returns a function that specifies how each segment should be modified.
Parameters:
query - the input query
cpuTimeAcc - the CPU time accumulator
Returns:
the segment function
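A minimal sketch of building such a function with `java.util.function.Function` composition. It assumes a hypothetical `Seg` record in place of SegmentReference and a plain list of wrapper names in place of the `Query` argument; it also assumes the accumulator should receive the time spent building the function, using `System.nanoTime()` as a simple stand-in for a real per-thread CPU-time measurement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

// Hypothetical stand-in for SegmentReference: an ID plus the wrappers applied so far.
record Seg(String id, List<String> wrappers) {
  Seg wrap(String wrapper) {
    List<String> next = new ArrayList<>(wrappers);
    next.add(wrapper);
    return new Seg(id, List.copyOf(next));
  }
}

final class SegmentMaps {
  private SegmentMaps() {}

  // Builds one function that applies each wrapper in order, e.g. a filter and then a join.
  // Elapsed time while building is added to cpuTimeAcc (nanoTime is only a stand-in here).
  static Function<Seg, Seg> createSegmentMapFunction(List<String> wrappers, AtomicLong cpuTimeAcc) {
    long start = System.nanoTime();
    Function<Seg, Seg> fn = Function.identity();
    for (String w : wrappers) {
      fn = fn.andThen(seg -> seg.wrap(w));
    }
    cpuTimeAcc.addAndGet(System.nanoTime() - start);
    return fn;
  }
}
```

Returning a function rather than modified segments lets the caller apply the same mapping lazily to every segment it later acquires.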
withUpdatedDataSource
DataSource withUpdatedDataSource(DataSource newSource)
Returns an updated datasource based on the specified new source.
Parameters:
newSource - the new datasource to be used to update an existing query
Returns:
the updated datasource to be used
withPolicies
default DataSource withPolicies(Map<String,Optional<Policy>> policyMap)
Returns an updated datasource with the policy restrictions on its tables applied. If this datasource contains no table, no changes should occur.
Parameters:
policyMap - a mapping of table names to policy restrictions. A missing key is different from an empty value:
- a missing key means the table has never been permission checked.
- an empty value means the table has been permission checked and has no policy restrictions.
Returns:
the updated datasource, with restrictions applied in the datasource tree
Throws:
IllegalStateException - when mapping a RestrictedDataSource, unless the table has a NoRestrictionPolicy in the policyMap (used by Druid internally). A missing policy, or a policy other than NoRestrictionPolicy, for a RestrictedDataSource causes this exception.
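The policy rules above can be sketched on a tiny tree. All types here (`Policy`, `RowFilterPolicy`, `NoRestrictionPolicy`, `Ds`, `Tbl`, `Restricted`, `JoinDs`) are hypothetical, simplified stand-ins. Tables with a present policy are wrapped; tables with an empty value or no entry are left as-is; an already restricted table throws unless its entry is a NoRestrictionPolicy, in which case this sketch keeps the existing restriction.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical, simplified stand-ins for Policy and the datasources; not Druid's classes.
interface Policy {}

record RowFilterPolicy(String condition) implements Policy {}

record NoRestrictionPolicy() implements Policy {}

interface Ds {
  Ds withPolicies(Map<String, Optional<Policy>> policyMap);
}

record Tbl(String name) implements Ds {
  public Ds withPolicies(Map<String, Optional<Policy>> policyMap) {
    // Missing key or empty value: nothing to apply, so the table is left unchanged.
    Optional<Policy> requested = policyMap.getOrDefault(name, Optional.empty());
    return requested.<Ds>map(p -> new Restricted(this, p)).orElse(this);
  }
}

record Restricted(Tbl base, Policy policy) implements Ds {
  public Ds withPolicies(Map<String, Optional<Policy>> policyMap) {
    // Re-mapping an already restricted table is only allowed with a NoRestrictionPolicy.
    Optional<Policy> requested = policyMap.getOrDefault(base.name(), Optional.empty());
    if (requested.isPresent() && requested.get() instanceof NoRestrictionPolicy) {
      return this;
    }
    throw new IllegalStateException("Cannot re-apply policies to restricted table " + base.name());
  }
}

// Composite: applies the same policy map to both sides of the join.
record JoinDs(Ds left, Ds right) implements Ds {
  public Ds withPolicies(Map<String, Optional<Policy>> policyMap) {
    return new JoinDs(left.withPolicies(policyMap), right.withPolicies(policyMap));
  }
}
```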
getCacheKey
byte[] getCacheKey()
Compute a cache key prefix for a data source. This includes the data sources that participate in the RHS of a join, as well as any query-specific constructs associated with the join data source, such as the base table filter. This key prefix can be used in the segment-level cache or the result-level cache. The function can return the following:
- Non-empty byte array: a join datasource is involved and caching is possible. The result includes the join condition expression, the join type, and the cache key returned by the joinable factory for each PreJoinableClause.
- NULL: there is a join, but caching is not possible. This may happen if one of the datasources participating in the JOIN is not cacheable.
Returns:
the cache key to be used as part of the query cache key
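A sketch of assembling such a prefix for join clauses. `ClauseInfo` and `JoinCacheKeys.computeJoinCacheKey` are hypothetical names; each clause carries its condition, join type, and the right-hand-side key a joinable factory would report (null meaning "not cacheable"). Length-prefixing each part is a design choice of this sketch, not necessarily Druid's encoding.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical stand-in for a PreJoinableClause: condition, join type, and the
// right-hand side's cache key as reported by a joinable factory (null if not cacheable).
record ClauseInfo(String condition, String joinType, byte[] rhsCacheKey) {}

final class JoinCacheKeys {
  private JoinCacheKeys() {}

  // Returns the concatenated key for all clauses, or null if any right-hand side is not cacheable.
  static byte[] computeJoinCacheKey(List<ClauseInfo> clauses) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (ClauseInfo clause : clauses) {
      if (clause.rhsCacheKey() == null) {
        return null; // a join is present, but caching is not possible
      }
      writeChunk(out, clause.condition().getBytes(StandardCharsets.UTF_8));
      writeChunk(out, clause.joinType().getBytes(StandardCharsets.UTF_8));
      writeChunk(out, clause.rhsCacheKey());
    }
    return out.toByteArray();
  }

  // Writes a 4-byte big-endian length before each part so that, e.g., ("ab", "c")
  // and ("a", "bc") produce different keys.
  private static void writeChunk(ByteArrayOutputStream out, byte[] bytes) {
    int n = bytes.length;
    out.write(n >>> 24);
    out.write(n >>> 16);
    out.write(n >>> 8);
    out.write(n);
    out.write(bytes, 0, bytes.length);
  }
}
```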
getAnalysis
DataSourceAnalysis getAnalysis()
Get the analysis for a data source.
Returns:
the DataSourceAnalysis object for the callee data source