Package org.apache.druid.query
Interface DataSource
All Known Implementing Classes:
FilteredDataSource, FrameBasedInlineDataSource, GlobalTableDataSource, InlineDataSource, JoinDataSource, LookupDataSource, QueryDataSource, TableDataSource, UnionDataSource, UnnestDataSource
public interface DataSource
Represents a source of data for a query. Analogous to the "FROM" clause in SQL.
-
Method Summary
Function<SegmentReference,SegmentReference> createSegmentMapFunction(Query query, AtomicLong cpuTimeAcc)
Returns a function that describes how each segment should be modified.
DataSourceAnalysis getAnalysis()
Gets the analysis for this data source.
byte[] getCacheKey()
Compute a cache key prefix for a data source.
List<DataSource> getChildren()
Returns datasources that this datasource depends on.
Set<String> getTableNames()
Returns the names of all table datasources involved in this query.
boolean isCacheable(boolean isBroker)
Returns true if queries on this dataSource are cacheable at both the result level and per-segment level.
boolean isConcrete()
Returns true if this datasource can be the base datasource of query processing.
boolean isGlobal()
Returns true if all servers have a full copy of this datasource.
DataSource withChildren(List<DataSource> children)
Return a new DataSource, identical to this one, with different children.
DataSource withUpdatedDataSource(DataSource newSource)
Returns an updated datasource based on the specified new source.
-
Method Detail
-
getTableNames
Set<String> getTableNames()
Returns the names of all table datasources involved in this query. Does not include names for non-tables, like lookups or inline datasources.
-
getChildren
List<DataSource> getChildren()
Returns datasources that this datasource depends on. Will be empty for leaf datasources like 'table'.
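The relationship between getChildren() and getTableNames() can be pictured with a small, self-contained sketch. The types below (Source, Table, Lookup, Join) are hypothetical stand-ins invented for illustration, not the Druid classes, and the default implementation that unions the children's table names is an assumption about one reasonable way to satisfy the contract described above.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class TableNamesSketch {
    // Hypothetical, simplified stand-in for DataSource; not the Druid interface.
    interface Source {
        List<Source> getChildren();

        // Assumed default: the union of the table names of all children.
        default Set<String> getTableNames() {
            Set<String> names = new LinkedHashSet<>();
            for (Source child : getChildren()) {
                names.addAll(child.getTableNames());
            }
            return names;
        }
    }

    // Leaf that names a table: no children, contributes its own name.
    static final class Table implements Source {
        private final String name;
        Table(String name) { this.name = name; }
        @Override public List<Source> getChildren() { return List.of(); }
        @Override public Set<String> getTableNames() { return Set.of(name); }
    }

    // Leaf that is not a table (such as a lookup): no children, no table names.
    static final class Lookup implements Source {
        private final String lookupName;
        Lookup(String lookupName) { this.lookupName = lookupName; }
        @Override public List<Source> getChildren() { return List.of(); }
    }

    // Non-leaf with two children, loosely modeled on a join.
    static final class Join implements Source {
        private final Source left;
        private final Source right;
        Join(Source left, Source right) { this.left = left; this.right = right; }
        @Override public List<Source> getChildren() { return List.of(left, right); }
    }

    public static void main(String[] args) {
        Source join = new Join(new Table("wikipedia"), new Lookup("countries"));
        System.out.println(join.getChildren().size()); // prints 2
        System.out.println(join.getTableNames());      // prints [wikipedia]
    }
}
```

In this sketch the lookup is a child of the join but contributes nothing to the table names, which mirrors the statement that non-tables are excluded.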
-
withChildren
DataSource withChildren(List<DataSource> children)
Return a new DataSource, identical to this one, with different children. The number of children must be equal to the number of children that this datasource already has.
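A minimal sketch of how a caller might use the getChildren()/withChildren() pair to rewrite a datasource tree without mutating it. The Source, Table, and Union types and the renameTables helper are hypothetical and exist only for this illustration; the child-count check is an assumption about how an implementation could enforce the rule stated above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class WithChildrenSketch {
    // Hypothetical, simplified stand-in for DataSource; not the Druid interface.
    interface Source {
        List<Source> getChildren();
        Source withChildren(List<Source> children);
    }

    // Leaf: accepts only an empty child list.
    static final class Table implements Source {
        final String name;
        Table(String name) { this.name = name; }
        @Override public List<Source> getChildren() { return List.of(); }
        @Override public Source withChildren(List<Source> children) {
            if (!children.isEmpty()) {
                throw new IllegalArgumentException("Expected 0 children, got " + children.size());
            }
            return this;
        }
        @Override public String toString() { return "table(" + name + ")"; }
    }

    // Non-leaf: the replacement list must have the same size as the current one.
    static final class Union implements Source {
        final List<Source> members;
        Union(List<Source> members) { this.members = List.copyOf(members); }
        @Override public List<Source> getChildren() { return members; }
        @Override public Source withChildren(List<Source> children) {
            if (children.size() != members.size()) {
                throw new IllegalArgumentException(
                    "Expected " + members.size() + " children, got " + children.size());
            }
            return new Union(children);
        }
        @Override public String toString() { return "union" + members; }
    }

    // Hypothetical helper: rebuilds the tree bottom-up, renaming every table leaf.
    static Source renameTables(Source source, UnaryOperator<String> rename) {
        if (source instanceof Table) {
            return new Table(rename.apply(((Table) source).name));
        }
        List<Source> newChildren = new ArrayList<>();
        for (Source child : source.getChildren()) {
            newChildren.add(renameTables(child, rename));
        }
        return source.withChildren(newChildren);
    }

    public static void main(String[] args) {
        Source original = new Union(List.of(new Table("a"), new Table("b")));
        Source renamed = renameTables(original, n -> n + "_v2");
        System.out.println(original); // prints union[table(a), table(b)]
        System.out.println(renamed);  // prints union[table(a_v2), table(b_v2)]
    }
}
```

Returning a new object instead of mutating the receiver keeps the original tree usable after the rewrite, which is the behavior the method description implies.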
-
isCacheable
boolean isCacheable(boolean isBroker)
Returns true if queries on this dataSource are cacheable at both the result level and per-segment level. Currently, dataSources that do not actually reference segments (like 'inline') are not cacheable, since cache keys are always based on segment identifiers.
-
isGlobal
boolean isGlobal()
Returns true if all servers have a full copy of this datasource. True for things like inline, lookup, etc, or for queries of those.
Currently this is coupled with joinability - if this returns true then the query engine expects there exists a JoinableFactory which might build a Joinable for this datasource directly. If a subquery 'inline' join is required to join this datasource on the right hand side, then this value must be false for now.
In the future, instead of directly using this method, the query planner and engine should consider JoinableFactory.isDirectlyJoinable(DataSource) when determining if the right hand side is directly joinable, which would allow decoupling this property from joins.
-
isConcrete
boolean isConcrete()
Returns true if this datasource can be the base datasource of query processing. Base datasources drive query processing. If the base datasource is TableDataSource, for example, queries are processed in parallel on data servers. If the base datasource is InlineDataSource, queries are processed on the Broker. See DataSourceAnalysis.getBaseDataSource() for further discussion.
Datasources that are *not* concrete must be pre-processed in some way before they can be processed by the main query stack. For example, QueryDataSource must be executed first and substituted with its results.
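The distinction between concrete and non-concrete datasources can be sketched as follows. Everything here is hypothetical: Source, Table, Inline, and Subquery are simplified stand-ins, and preProcess and processingLocation are invented helpers that only restate the rules in the description (a table base is processed on data servers, an inline base on the Broker, and a subquery must first be executed and replaced by its results). This is not how Druid's planner is implemented.

```java
import java.util.List;
import java.util.function.Supplier;

class ConcreteSketch {
    // Hypothetical, simplified stand-in for DataSource; not the Druid interface.
    interface Source {
        boolean isConcrete();
    }

    static final class Table implements Source {
        final String name;
        Table(String name) { this.name = name; }
        @Override public boolean isConcrete() { return true; }
    }

    static final class Inline implements Source {
        final List<String> rows;
        Inline(List<String> rows) { this.rows = List.copyOf(rows); }
        @Override public boolean isConcrete() { return true; }
    }

    // Not concrete: it has to be executed before anything can be built on it.
    static final class Subquery implements Source {
        final Supplier<List<String>> execute;
        Subquery(Supplier<List<String>> execute) { this.execute = execute; }
        @Override public boolean isConcrete() { return false; }
    }

    // Hypothetical pre-processing step: run the subquery and inline its results.
    static Source preProcess(Source source) {
        if (source.isConcrete()) {
            return source;
        }
        if (source instanceof Subquery) {
            return new Inline(((Subquery) source).execute.get());
        }
        throw new IllegalStateException(
            "No pre-processing known for " + source.getClass().getSimpleName());
    }

    // Restates where processing happens for the two concrete kinds in this sketch.
    static String processingLocation(Source base) {
        if (!base.isConcrete()) {
            throw new IllegalArgumentException("Base datasource must be concrete");
        }
        return (base instanceof Table) ? "data servers" : "Broker";
    }

    public static void main(String[] args) {
        Source subquery = new Subquery(() -> List.of("row1", "row2"));
        Source ready = preProcess(subquery);
        System.out.println(ready.isConcrete());                      // prints true
        System.out.println(processingLocation(ready));               // prints Broker
        System.out.println(processingLocation(new Table("events"))); // prints data servers
    }
}
```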
-
createSegmentMapFunction
Function<SegmentReference,SegmentReference> createSegmentMapFunction(Query query, AtomicLong cpuTimeAcc)
Returns a function that describes how each segment should be modified.
Parameters:
query - the input query
cpuTimeAcc - the CPU time accumulator
Returns:
the segment function
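To make the shape of a segment map function concrete, here is a hedged sketch using only java.util.function.Function and AtomicLong. The Segment class is a hypothetical stand-in for SegmentReference, the filter wrapping is an invented example of a "modification", and measuring elapsed time with System.nanoTime() is an assumption used for illustration (it is wall-clock time, not true CPU time). How Druid's implementations actually use the accumulator is not stated in this page.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Function;

class SegmentMapSketch {
    // Hypothetical stand-in for SegmentReference; not the Druid type.
    static final class Segment {
        final String id;
        final String filter; // null means "no filter applied"
        Segment(String id, String filter) {
            this.id = id;
            this.filter = filter;
        }
        @Override public String toString() {
            return filter == null ? id : id + " filtered by " + filter;
        }
    }

    // Builds a function that wraps each segment with a filter and adds the time
    // spent doing so to the shared accumulator (assumed usage of cpuTimeAcc).
    static Function<Segment, Segment> createSegmentMapFunction(String filter, AtomicLong cpuTimeAcc) {
        return segment -> {
            long start = System.nanoTime();
            try {
                return new Segment(segment.id, filter);
            } finally {
                cpuTimeAcc.addAndGet(System.nanoTime() - start);
            }
        };
    }

    public static void main(String[] args) {
        AtomicLong cpuTimeAcc = new AtomicLong();
        Function<Segment, Segment> mapFn = createSegmentMapFunction("country = 'FR'", cpuTimeAcc);

        // The same function is applied to every segment the query touches.
        System.out.println(mapFn.apply(new Segment("seg-1", null))); // prints seg-1 filtered by country = 'FR'
        System.out.println(mapFn.apply(new Segment("seg-2", null))); // prints seg-2 filtered by country = 'FR'
        System.out.println(cpuTimeAcc.get() >= 0);                   // prints true
    }
}
```

Because the result is an ordinary Function, callers can combine it with other per-segment transformations through Function.andThen or Function.compose.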
-
withUpdatedDataSource
DataSource withUpdatedDataSource(DataSource newSource)
Returns an updated datasource based on the specified new source.
Parameters:
newSource - the new datasource to be used to update an existing query
Returns:
the updated datasource to be used
-
getCacheKey
byte[] getCacheKey()
Compute a cache key prefix for a data source. This includes the data sources that participate in the RHS of a join as well as any query-specific constructs associated with the join data source, such as the base table filter. This key prefix can be used in the segment-level cache or the result-level cache. The function can return the following:
- Non-empty byte array - If there is a join datasource involved and caching is possible. The result includes the join condition expression, the join type, and the cache key returned by the joinable factory for each PreJoinableClause.
- NULL - There is a join but caching is not possible. This may happen if one of the participating datasources in the JOIN is not cacheable.
Returns:
the cache key to be used as part of the query cache key
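Since a null return means "do not cache", callers need to check for it before building a full cache key. The sketch below shows one way that check could look. The withDataSourcePrefix helper and the idea of simply concatenating the prefix with a query-specific key are assumptions made for illustration; they are not taken from Druid's caching code.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.Optional;

class CacheKeySketch {
    // Hypothetical helper: prepends the datasource key prefix to a query key.
    // Returns Optional.empty() when the prefix is null, meaning caching is not possible.
    static Optional<byte[]> withDataSourcePrefix(byte[] dataSourceKey, byte[] queryKey) {
        if (dataSourceKey == null) {
            return Optional.empty();
        }
        byte[] combined = ByteBuffer.allocate(dataSourceKey.length + queryKey.length)
            .put(dataSourceKey)
            .put(queryKey)
            .array();
        return Optional.of(combined);
    }

    public static void main(String[] args) {
        byte[] queryKey = {9, 9};

        // Prefix available: the combined key starts with the datasource prefix.
        Optional<byte[]> cacheable = withDataSourcePrefix(new byte[] {1, 2, 3}, queryKey);
        System.out.println(Arrays.toString(cacheable.get())); // prints [1, 2, 3, 9, 9]

        // Null prefix: the caller skips the cache entirely.
        Optional<byte[]> notCacheable = withDataSourcePrefix(null, queryKey);
        System.out.println(notCacheable.isPresent()); // prints false
    }
}
```

Wrapping the result in Optional at the call site makes the "caching is not possible" case explicit instead of passing a null array further along.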
-
getAnalysis
DataSourceAnalysis getAnalysis()
Gets the analysis for this data source.
Returns:
the DataSourceAnalysis object for the callee data source