Class QueryToolChest<ResultType,QueryType extends Query<ResultType>>
- java.lang.Object
  - org.apache.druid.query.QueryToolChest<ResultType,QueryType>
- Direct Known Subclasses:
DataSourceQueryQueryToolChest, GroupByQueryQueryToolChest, QueryLogicCompatToolChest, ScanQueryQueryToolChest, SearchQueryQueryToolChest, SegmentMetadataQueryQueryToolChest, TimeBoundaryQueryQueryToolChest, TimeseriesQueryQueryToolChest, TopNQueryQueryToolChest, WindowOperatorQueryQueryToolChest
public abstract class QueryToolChest<ResultType,QueryType extends Query<ResultType>> extends Object
The broker-side API for a specific Query type (in some cases also used by data servers).
Constructor Summary
- protected QueryToolChest()
Method Summary
- <T> boolean canExecuteFully(Query<T> query)
- boolean canPerformSubquery(Query<?> subquery): Returns whether this toolchest is able to handle the provided subquery.
- BinaryOperator<ResultType> createMergeFn(Query<ResultType> query): Creates a merge function that is used on the broker to merge intermediate aggregates from historicals.
- Comparator<ResultType> createResultComparator(Query<ResultType> query): Creates an ordering comparator that is used to order results.
- com.fasterxml.jackson.databind.ObjectMapper decorateObjectMapper(com.fasterxml.jackson.databind.ObjectMapper objectMapper, QueryType query): Performs any per-query decoration of an ObjectMapper that enables it to read and write objects of the query's result type.
- <T extends LogicalSegment> List<T> filterSegments(QueryType query, List<T> segments): Called to allow the query to prune segments that it does not believe need to be queried.
- com.fasterxml.jackson.databind.JavaType getBaseResultType()
- com.fasterxml.jackson.databind.JavaType getBySegmentResultType()
- <T> CacheStrategy<ResultType,T,QueryType> getCacheStrategy(QueryType query): Deprecated. Use getCacheStrategy(Query, ObjectMapper) instead.
- <T> CacheStrategy<ResultType,T,QueryType> getCacheStrategy(QueryType query, com.fasterxml.jackson.databind.ObjectMapper mapper): Returns a CacheStrategy to be used to load data into the cache and remove it from the cache.
- abstract com.fasterxml.jackson.core.type.TypeReference<ResultType> getResultTypeReference(): Returns a TypeReference object that is passed through to Jackson in order to deserialize the results of this type of query.
- abstract QueryMetrics<? super QueryType> makeMetrics(QueryType query): Creates a QueryMetrics object that is used to generate metrics for this specific query type.
- com.google.common.base.Function<ResultType,ResultType> makePostComputeManipulatorFn(QueryType query, MetricManipulationFn fn): This manipulator function's primary purpose is to conduct finalization of aggregator values.
- abstract com.google.common.base.Function<ResultType,ResultType> makePreComputeManipulatorFn(QueryType query, MetricManipulationFn fn): Creates a Function that takes in a ResultType and returns a new ResultType with the MetricManipulationFn applied to each of the metrics.
- QueryRunner<ResultType> mergeResults(QueryRunner<ResultType> runner): Wraps a QueryRunner.
- QueryRunner<ResultType> mergeResults(QueryRunner<ResultType> runner, boolean willMergeRunner): Like mergeResults(QueryRunner), but with an additional flag that indicates the type of runner that is passed to the call.
- QueryRunner<ResultType> postMergeQueryDecoration(QueryRunner<ResultType> runner): Wraps a QueryRunner.
- QueryRunner<ResultType> preMergeQueryDecoration(QueryRunner<ResultType> runner): Wraps a QueryRunner.
- RowSignature resultArraySignature(QueryType query): Returns a RowSignature for the arrays returned by resultsAsArrays(QueryType, org.apache.druid.java.util.common.guava.Sequence<ResultType>).
- Sequence<Object[]> resultsAsArrays(QueryType query, Sequence<ResultType> resultSequence): Converts a sequence of this query's ResultType into arrays.
- Optional<Sequence<FrameSignaturePair>> resultsAsFrames(QueryType query, Sequence<ResultType> resultSequence, MemoryAllocatorFactory memoryAllocatorFactory, boolean useNestedForUnknownTypes): Converts a sequence of this query's ResultType into a sequence of FrameSignaturePair.
Method Detail
-
getBaseResultType
public final com.fasterxml.jackson.databind.JavaType getBaseResultType()
-
getBySegmentResultType
public final com.fasterxml.jackson.databind.JavaType getBySegmentResultType()
-
decorateObjectMapper
public com.fasterxml.jackson.databind.ObjectMapper decorateObjectMapper(com.fasterxml.jackson.databind.ObjectMapper objectMapper, QueryType query)
Performs any per-query decoration of an ObjectMapper that enables it to read and write objects of the query's result type. It is used by QueryResource on the write side and by DirectDruidClient on the read side.
For most queries this is a no-op, but it can be useful for query types that support more than one result serialization format. Implementations must not modify the provided ObjectMapper; they must return a copy instead.
Jackson's default deserialization is usually well optimized, so this method should be overridden only when there is a functional requirement to do so. Any override must be benchmarked in isolation, without other parts of the query engine running, because changing this method can alter the performance of queries in which deserialization is a major part of execution.
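The copy-not-mutate contract can be illustrated with a minimal sketch. SimpleMapper, decorate, and the "ALTERNATE_RESULT_FORMAT" feature are hypothetical stand-ins, not Druid or Jackson API; with a real Jackson ObjectMapper, the equivalent starting point would be ObjectMapper.copy(). The shared mapper is returned untouched in the common no-op case, and a decorated copy is returned otherwise.

```java
import java.util.HashSet;
import java.util.Set;

class DecorateSketch {
  // Hypothetical stand-in for Jackson's ObjectMapper, reduced to a set of enabled features.
  static class SimpleMapper {
    private final Set<String> features = new HashSet<>();

    // Analogous in spirit to ObjectMapper.copy(): a new instance with the same configuration.
    SimpleMapper copy() {
      SimpleMapper c = new SimpleMapper();
      c.features.addAll(features);
      return c;
    }

    void enable(String feature) {
      features.add(feature);
    }

    Set<String> features() {
      return Set.copyOf(features);
    }
  }

  // Hypothetical analogue of decorateObjectMapper: never mutate `base`;
  // return a decorated copy only when the query needs a non-default format.
  static SimpleMapper decorate(SimpleMapper base, boolean alternateFormat) {
    if (!alternateFormat) {
      return base; // the common case: a no-op
    }
    SimpleMapper copy = base.copy();
    copy.enable("ALTERNATE_RESULT_FORMAT"); // hypothetical feature name
    return copy;
  }

  public static void main(String[] args) {
    SimpleMapper shared = new SimpleMapper();
    SimpleMapper perQuery = decorate(shared, true);
    System.out.println(shared.features() + " " + perQuery.features());
  }
}
```

Because the shared mapper is never modified, concurrent queries that reuse it are unaffected by one query's decoration.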
-
mergeResults
public QueryRunner<ResultType> mergeResults(QueryRunner<ResultType> runner)
This method wraps a QueryRunner. By contract, the input QueryRunner provides a series of ResultType objects in time order (ascending or descending). This method should return a new QueryRunner that merges this stream of ordered ResultType objects.
The default implementation constructs a ResultMergeQueryRunner, which creates a CombiningSequence from the supplied QueryRunner using the createResultComparator(Query) and createMergeFn(Query) supplied by this toolchest.
Generally speaking, the logic that exists in makePostComputeManipulatorFn should actually live in this method. Additionally, if a query supports PostAggregations, this method should ensure that PostAggregations are computed a minimum number of times. This is most commonly achieved by computing the PostAgg results during the merge and rewriting the query so that it has the minimum number of PostAggs (most often zero).
- Parameters:
runner - A QueryRunner that provides a series of ResultType objects in time order (ascending or descending)
- Returns:
- a QueryRunner that merges the stream of ordered ResultType objects
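The combining idea behind the default merge can be sketched over a plain List. This is a simplified, eager illustration, not Druid's ResultMergeQueryRunner or CombiningSequence (which work lazily over a Sequence); Row and combine are hypothetical names. It assumes the input is already ordered by the comparator, as the contract above guarantees, and that elements are non-null: adjacent elements the comparator treats as equal are folded together with the merge function.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.BinaryOperator;

class CombiningSketch {
  // Folds runs of adjacent elements that `ordering` treats as equal into one element via `mergeFn`.
  // Assumes `ordered` is already sorted by `ordering` and contains no nulls.
  static <T> List<T> combine(List<T> ordered, Comparator<T> ordering, BinaryOperator<T> mergeFn) {
    List<T> out = new ArrayList<>();
    T current = null;
    for (T next : ordered) {
      if (current == null) {
        current = next;
      } else if (ordering.compare(current, next) == 0) {
        current = mergeFn.apply(current, next); // same bucket: merge intermediate aggregates
      } else {
        out.add(current); // bucket finished: emit it and start a new one
        current = next;
      }
    }
    if (current != null) {
      out.add(current);
    }
    return out;
  }

  // Hypothetical result row: a timestamp bucket and a partial count.
  record Row(long timestamp, long count) {}

  public static void main(String[] args) {
    List<Row> fromServers = List.of(new Row(0, 2), new Row(0, 3), new Row(60_000, 1));
    List<Row> merged = combine(
        fromServers,
        Comparator.comparingLong(Row::timestamp),
        (a, b) -> new Row(a.timestamp(), a.count() + b.count()));
    System.out.println(merged);
  }
}
```

Only adjacent equal elements are merged, which is why the time-ordering guarantee on the input runner matters.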
-
mergeResults
public QueryRunner<ResultType> mergeResults(QueryRunner<ResultType> runner, boolean willMergeRunner)
Like mergeResults(QueryRunner), but with an additional flag that indicates the type of runner that is passed to the call. willMergeRunner specifies that the input runner to mergeResults is the one created by the corresponding QueryRunnerFactory.mergeRunners(java.util.concurrent.ExecutorService, java.lang.Iterable<org.apache.druid.query.QueryRunner<T>>). While it depends on the input runner, the flag is usually true, since most of the time the same server is generating a runner that it wants to merge. The notable exception is when the broker accumulates the results from the data servers and needs to merge them together; in this case willMergeRunner is false.
Currently, the sole consumer of this parameter is GroupByQueryQueryToolChest, where it is used to determine whether mergeResults is called with a GroupByMergingQueryRunner, in order to estimate the number of merge buffers required for the query to succeed. It is set to false on the brokers, because they (mostly) fetch the results from the historicals, while the data servers set it to true, because they call this method with the runner created by QueryRunnerFactory.mergeRunners(java.util.concurrent.ExecutorService, java.lang.Iterable<org.apache.druid.query.QueryRunner<T>>).
By default, willMergeRunner is ignored and mergeResults(QueryRunner) is called. Toolchests that override this method must ensure that mergeResults(QueryRunner) delegates to it; otherwise mergeResults(QueryRunner) keeps its default implementation, which would be undesirable.
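The delegation requirement can be sketched with hypothetical stand-in types (Runner, DefaultToolChest, FlagAwareToolChest; none are Druid classes). The base class mirrors the documented default, where the flag is ignored. The flag-aware subclass overrides the two-argument form and routes the one-argument form into it; choosing `true` as that route's default is an assumption of the sketch, reflecting the "usually true" case described above.

```java
import java.util.ArrayList;
import java.util.List;

class WillMergeSketch {
  // Hypothetical, heavily simplified stand-in for QueryRunner.
  interface Runner {
    List<String> run();
  }

  // Mirrors the documented default: the flag is ignored and the one-argument form is used.
  // (No real merging happens in this sketch.)
  static class DefaultToolChest {
    Runner mergeResults(Runner runner) {
      return runner;
    }

    Runner mergeResults(Runner runner, boolean willMergeRunner) {
      return mergeResults(runner);
    }
  }

  // A toolchest that needs the flag overrides the two-argument form and makes the
  // one-argument form delegate to it; otherwise mergeResults(runner) would keep the default.
  static class FlagAwareToolChest extends DefaultToolChest {
    @Override
    Runner mergeResults(Runner runner) {
      return mergeResults(runner, true); // assumption: treat the common case (true) as the default
    }

    @Override
    Runner mergeResults(Runner runner, boolean willMergeRunner) {
      return () -> {
        List<String> out = new ArrayList<>(runner.run());
        // Hypothetical marker standing in for, e.g., merge-buffer estimation.
        out.add(willMergeRunner ? "reserved merge buffers" : "no merge buffers");
        return out;
      };
    }
  }

  public static void main(String[] args) {
    Runner base = () -> List.of("row");
    System.out.println(new FlagAwareToolChest().mergeResults(base).run());
    System.out.println(new FlagAwareToolChest().mergeResults(base, false).run());
  }
}
```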
-
createMergeFn
@Nullable public BinaryOperator<ResultType> createMergeFn(Query<ResultType> query)
Creates a merge function that is used on the broker to merge intermediate aggregates from historicals. This merge function is used in the default ResultMergeQueryRunner provided by mergeResults(QueryRunner) and, if it does not return null, also in ParallelMergeCombiningSequence by CachingClusteredClient.
Returning null from this method means that a query does not support result merging, at least via the mechanisms that use this function.
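A sketch of both sides of this contract, using a hypothetical Partial record rather than any real Druid result type. The merge function combines intermediate state (count and sum), which is mergeable, rather than a finalized value such as an average, which is not. The caller treats a null return as "merging unsupported"; the path names in choosePath are illustrative only.

```java
import java.util.function.BinaryOperator;

class MergeFnSketch {
  // Hypothetical partial aggregate carried from a data server to the broker.
  record Partial(long timestamp, long count, double sum) {}

  // Toolchest side: a non-null operator means merging is supported.
  static BinaryOperator<Partial> createMergeFn(boolean supportsMerging) {
    if (!supportsMerging) {
      return null; // documented meaning: no merging via this mechanism
    }
    // Counts and sums combine; the timestamp is the shared bucket key.
    return (a, b) -> new Partial(a.timestamp(), a.count() + b.count(), a.sum() + b.sum());
  }

  // Caller side: only take the merging path when a merge function exists.
  static String choosePath(BinaryOperator<Partial> mergeFn) {
    return mergeFn == null ? "no merging" : "parallel merge";
  }

  public static void main(String[] args) {
    BinaryOperator<Partial> fn = createMergeFn(true);
    System.out.println(fn.apply(new Partial(0, 2, 10.0), new Partial(0, 3, 5.5)));
    System.out.println(choosePath(createMergeFn(false)));
  }
}
```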
-
createResultComparator
public Comparator<ResultType> createResultComparator(Query<ResultType> query)
Creates an ordering comparator that is used to order results. This comparator is used in the default ResultMergeQueryRunner provided by mergeResults(QueryRunner).
-
makeMetrics
public abstract QueryMetrics<? super QueryType> makeMetrics(QueryType query)
Creates a QueryMetrics object that is used to generate metrics for this specific query type. This exists to allow for query-specific dimensions and metrics: the ToolChest is expected to set meaningful dimensions for metrics of this query type. Examples might be the topN threshold for a TopN query or the number of dimensions included in a groupBy query.
QueryToolChests for query types in core (druid-processing) and in public extensions (belonging to the Druid source tree) should delegate this method to GenericQueryMetricsFactory.makeMetrics(Query) on an injected instance of GenericQueryMetricsFactory, as long as they don't need to emit custom dimensions and/or metrics.
If custom dimensions and/or metrics should be emitted for a query type, follow the plan described in the "Making subinterfaces of QueryMetrics" section of the QueryMetrics class-level Javadocs.
One way or another, this method should ensure that QueryMetrics.query(Query) is called with the given query on the created QueryMetrics object before returning.
- Parameters:
query - The query that is being processed
- Returns:
- A QueryMetrics that can be used to make metrics for the provided query
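The "query(Query) must be called before returning" rule can be sketched with hypothetical stand-ins (Query, Metrics, and both factory methods are illustrative, not Druid's QueryMetrics or GenericQueryMetricsFactory). The generic path mimics delegation to a factory; the query-specific path adds a custom dimension on top of it, so the query(q) call is never skipped. Real custom dimensions should follow the QueryMetrics subinterface plan referenced above; this sketch only shows the ordering guarantee.

```java
import java.util.HashMap;
import java.util.Map;

class MakeMetricsSketch {
  // Hypothetical stand-in for a query: its type name and a topN-style threshold.
  record Query(String type, int threshold) {}

  // Hypothetical stand-in for QueryMetrics: just a bag of dimensions.
  static class Metrics {
    final Map<String, Object> dimensions = new HashMap<>();

    // Analogue of QueryMetrics.query(Query): records query-level dimensions.
    Metrics query(Query q) {
      dimensions.put("type", q.type());
      return this;
    }
  }

  // Analogue of delegating to a generic factory when no custom dimensions are needed.
  static Metrics makeGenericMetrics(Query q) {
    return new Metrics().query(q);
  }

  // Query-specific variant: adds a custom dimension, but builds on an object
  // on which query(q) has already been called.
  static Metrics makeTopNStyleMetrics(Query q) {
    Metrics metrics = makeGenericMetrics(q);
    metrics.dimensions.put("threshold", q.threshold()); // hypothetical custom dimension
    return metrics;
  }

  public static void main(String[] args) {
    System.out.println(makeTopNStyleMetrics(new Query("topN", 10)).dimensions);
  }
}
```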
-
makePreComputeManipulatorFn
public abstract com.google.common.base.Function<ResultType,ResultType> makePreComputeManipulatorFn(QueryType query, MetricManipulationFn fn)
Creates a Function that takes in a ResultType and returns a new ResultType with the MetricManipulationFn applied to each of the metrics.
This function's primary purpose is to help work around some challenges in deserializing results across the wire. Specifically, different aggregators generate different object types in a result set. If we wanted Jackson to deserialize these directly, we would need to generate a response class for each query that Jackson could use. That is not what we do. Instead, Jackson deserializes plain Object instances, and a MetricManipulationFn then converts those instances to the actual objects that the aggregator expects. As such, this method would be more accurately named "makeObjectDeserializingFn".
It is safe and acceptable for implementations of this method to first validate that the MetricManipulationFn is MetricManipulatorFns.DESERIALIZING_INSTANCE and throw an exception if it is not. If such an exception is ever thrown, it indicates a bug in the caller, which should be fixed by never calling this method with anything other than the deserializing manipulator function.
Some implementations were also tasked with computing PostAggregators, but this is not a good place to compute them, because this function can be called in a number of cases where PostAggs are not meaningful to compute. Instead, PostAggs should be computed in the mergeResults call, and the mergeResults implementation should ensure that PostAggs are computed only the minimum number of times necessary.
This function is called very early in the processing pipeline on the Broker.
- Parameters:
query - The Query that is currently being processed
fn - The function that should be applied to all metrics in the results
- Returns:
- A function that will apply the provided fn to all metrics in the input ResultType object
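A sketch of the fail-fast validation plus per-metric conversion described above. Everything here is hypothetical: the ManipulationFn enum stands in for the MetricManipulatorFns instances, result rows are modeled as metric-name to value maps, and the Integer-to-Long conversion is an assumed example of "converting deserialized Objects to what the aggregator expects" (Jackson can produce an Integer for a small JSON number where a long-based aggregator wants a Long).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

class PreComputeSketch {
  // Hypothetical stand-ins for the MetricManipulatorFns instances.
  enum ManipulationFn { IDENTITY, FINALIZING, DESERIALIZING }

  // Hypothetical deserializing step: Integer from the wire becomes the Long an aggregator expects.
  static Object deserialize(Object raw) {
    if (raw instanceof Integer i) {
      return i.longValue(); // autoboxed to Long
    }
    return raw;
  }

  // Result rows are modeled as metric-name -> value maps (an assumption for this sketch).
  static Function<Map<String, Object>, Map<String, Object>> makePreComputeManipulatorFn(ManipulationFn fn) {
    // Fail fast: anything but the deserializing function indicates a bug in the caller.
    if (fn != ManipulationFn.DESERIALIZING) {
      throw new IllegalArgumentException("Expected DESERIALIZING, got " + fn);
    }
    return row -> {
      Map<String, Object> out = new LinkedHashMap<>();
      row.forEach((metric, value) -> out.put(metric, deserialize(value)));
      return out;
    };
  }

  public static void main(String[] args) {
    Map<String, Object> wire = new LinkedHashMap<>();
    wire.put("rows", 5);
    Object converted = makePreComputeManipulatorFn(ManipulationFn.DESERIALIZING).apply(wire).get("rows");
    System.out.println(converted.getClass().getSimpleName());
  }
}
```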
-
makePostComputeManipulatorFn
public com.google.common.base.Function<ResultType,ResultType> makePostComputeManipulatorFn(QueryType query, MetricManipulationFn fn)
This manipulator function's primary purpose is to conduct finalization of aggregator values. It would be better named "makeFinalizingManipulatorFn", and even that work should really be done as part of mergeResults(org.apache.druid.query.QueryRunner<ResultType>) instead of in this separate method.
It is safe and acceptable for implementations of this method to first validate that the MetricManipulationFn is either MetricManipulatorFns.FINALIZING_INSTANCE or MetricManipulatorFns.IDENTITY_INSTANCE and throw an exception if it is not. If such an exception is ever thrown, it indicates a bug in the caller, which should be fixed by not calling this method with unsupported manipulator functions.
- Parameters:
query - The Query that is currently being processed
fn - The function that should be applied to all metrics in the results
- Returns:
- A function that will apply the provided fn to all metrics in the input ResultType object
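Finalization can be illustrated with an average-style metric, whose mergeable intermediate state (sum and count) is turned into a final value. AvgState, Result, and the ManipulationFn enum are hypothetical stand-ins; the sketch accepts only FINALIZING or IDENTITY, as the validation described above permits, and rejects anything else.

```java
import java.util.function.Function;

class PostComputeSketch {
  // Hypothetical stand-ins for the MetricManipulatorFns instances.
  enum ManipulationFn { IDENTITY, FINALIZING, DESERIALIZING }

  // Hypothetical intermediate state of an average-style aggregator: mergeable, not yet final.
  record AvgState(double sum, long count) {}

  // Hypothetical result holding one metric that is either an AvgState or its finalized value.
  record Result(long timestamp, Object avg) {}

  static Function<Result, Result> makePostComputeManipulatorFn(ManipulationFn fn) {
    switch (fn) {
      case IDENTITY:
        return Function.identity(); // leave intermediate values untouched
      case FINALIZING:
        return r -> {
          AvgState state = (AvgState) r.avg();
          return new Result(r.timestamp(), state.sum() / state.count()); // finalize into a mean
        };
      default:
        // Only FINALIZING or IDENTITY are acceptable; anything else is a caller bug.
        throw new IllegalArgumentException("Unsupported manipulator: " + fn);
    }
  }

  public static void main(String[] args) {
    Result intermediate = new Result(0L, new AvgState(10.0, 4));
    System.out.println(makePostComputeManipulatorFn(ManipulationFn.FINALIZING).apply(intermediate));
  }
}
```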
-
getResultTypeReference
public abstract com.fasterxml.jackson.core.type.TypeReference<ResultType> getResultTypeReference()
Returns a TypeReference object that is just passed through to Jackson in order to deserialize the results of this type of query.
- Returns:
- A TypeReference to indicate to Jackson what type of data will exist for this query
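Jackson's TypeReference relies on the "super type token" idiom: an anonymous subclass records its full generic supertype in class metadata, so a type such as List<Map<String, Object>> survives erasure. The sketch below reproduces that idiom with only the JDK; TypeRef is a hypothetical analogue, not Jackson's class. An implementation of getResultTypeReference would typically return an anonymous `new TypeReference<...>() {}` of this shape.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;
import java.util.Map;

class TypeTokenSketch {
  // Hypothetical minimal analogue of Jackson's TypeReference (the "super type token" idiom).
  abstract static class TypeRef<T> {
    final Type type;

    protected TypeRef() {
      // An anonymous subclass records its generic superclass, e.g. TypeRef<List<...>>,
      // in class metadata, so the type argument can be read back at runtime.
      ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
      this.type = superType.getActualTypeArguments()[0];
    }
  }

  public static void main(String[] args) {
    // The trailing {} creates the anonymous subclass that carries the full generic type.
    TypeRef<List<Map<String, Object>>> ref = new TypeRef<List<Map<String, Object>>>() {};
    System.out.println(ref.type.getTypeName());
  }
}
```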
-
getCacheStrategy
@Deprecated @Nullable public <T> CacheStrategy<ResultType,T,QueryType> getCacheStrategy(QueryType query)
Deprecated. Use getCacheStrategy(Query, ObjectMapper) instead.
Like getCacheStrategy(Query, ObjectMapper), but the caller doesn't supply the object mapper used for deserializing and converting the cached data to the desired type. It's up to the individual implementations to decide the appropriate action in that case: they can either throw an exception outright, or determine whether the query requires the object mapper for proper downstream processing and, if it does not, work with the generic Java types.
-
getCacheStrategy
@Nullable public <T> CacheStrategy<ResultType,T,QueryType> getCacheStrategy(QueryType query, @Nullable com.fasterxml.jackson.databind.ObjectMapper mapper)
Returns a CacheStrategy to be used to load data into the cache and remove it from the cache.
This is optional. If it returns null, caching is effectively disabled for the query.
- Type Parameters:
T - The type of object that will be stored in the cache
- Parameters:
query - The query whose results might be cached
mapper - Object mapper used to convert the deserialized generic Java objects to the desired types. It is nullable to preserve backward compatibility.
- Returns:
- A CacheStrategy that can be used to populate and read from the Cache
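A caller-side sketch of the "null disables caching" rule. Strategy is a hypothetical, much-reduced analogue of CacheStrategy (a cache key plus conversions to and from cached values), and run is an illustrative caller, not Druid's caching code. A null strategy sends every call straight to the computation; a non-null strategy populates the cache on a miss and reads from it on a hit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

class CacheStrategySketch {
  // Hypothetical, much-reduced analogue of CacheStrategy.
  interface Strategy<R, C> {
    String cacheKey(String query);

    C prepareForCache(R result);

    R pullFromCache(C cached);
  }

  // Example strategy: caches Long results as their decimal string form.
  static class LongAsString implements Strategy<Long, String> {
    public String cacheKey(String query) {
      return query;
    }

    public String prepareForCache(Long result) {
      return Long.toString(result);
    }

    public Long pullFromCache(String cached) {
      return Long.parseLong(cached);
    }
  }

  // Caller side: a null strategy means caching is effectively disabled for the query.
  static <R, C> R run(String query, Strategy<R, C> strategy, Map<String, Object> cache, Supplier<R> compute) {
    if (strategy == null) {
      return compute.get();
    }
    String key = strategy.cacheKey(query);
    if (cache.containsKey(key)) {
      @SuppressWarnings("unchecked")
      C cached = (C) cache.get(key);
      return strategy.pullFromCache(cached);
    }
    R result = compute.get();
    cache.put(key, strategy.prepareForCache(result));
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> cache = new HashMap<>();
    System.out.println(run("q1", new LongAsString(), cache, () -> 42L) + " " + cache);
  }
}
```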
-
preMergeQueryDecoration
public QueryRunner<ResultType> preMergeQueryDecoration(QueryRunner<ResultType> runner)
Wraps a QueryRunner. The input QueryRunner is the QueryRunner as it exists *before* being passed to mergeResults().
In fact, the return value of this method is always passed to mergeResults, so implementing this functionality as extra decoration on the QueryRunner during mergeResults() is equivalent.
In the interest of potentially simplifying these interfaces, the recommendation is not to override this method and instead to apply anything that might be needed here in the mergeResults() call.
- Parameters:
runner - The runner to be wrapped
- Returns:
- The wrapped runner
-
postMergeQueryDecoration
public QueryRunner<ResultType> postMergeQueryDecoration(QueryRunner<ResultType> runner)
Wraps a QueryRunner. The input QueryRunner is the QueryRunner as it exists coming out of mergeResults().
In fact, the input value of this method is always the return value from mergeResults, so implementing this functionality as extra decoration on the QueryRunner during mergeResults() is equivalent.
In the interest of potentially simplifying these interfaces, the recommendation is not to override this method and instead to apply anything that might be needed here in the mergeResults() call.
- Parameters:
runner - The runner to be wrapped
- Returns:
- The wrapped runner
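The ordering described for preMergeQueryDecoration and postMergeQueryDecoration can be sketched as function composition: post(merge(pre(runner))). Runner, tag, and assemble are hypothetical stand-ins, not Druid API; each stage appends a marker so the order is visible. The sketch also shows the equivalence claimed above: folding the pre-merge wrapping into the merge step yields the same result.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

class DecorationSketch {
  // Hypothetical stand-in for QueryRunner: produces a list of stage markers.
  interface Runner {
    List<String> run();
  }

  // Wraps a runner so its output records that `stage` ran after the inner runner.
  static Runner tag(Runner inner, String stage) {
    return () -> {
      List<String> out = new ArrayList<>(inner.run());
      out.add(stage);
      return out;
    };
  }

  // Documented ordering: pre-merge decoration feeds mergeResults, whose output feeds post-merge decoration.
  static Runner assemble(Runner base, UnaryOperator<Runner> preMerge, UnaryOperator<Runner> merge, UnaryOperator<Runner> postMerge) {
    return postMerge.apply(merge.apply(preMerge.apply(base)));
  }

  public static void main(String[] args) {
    Runner base = () -> List.of("scan");
    Runner full = assemble(base, r -> tag(r, "pre"), r -> tag(r, "merge"), r -> tag(r, "post"));
    System.out.println(full.run()); // prints [scan, pre, merge, post]

    // Equivalent: no pre-merge override, with the same wrapping done inside the merge step.
    Runner folded = assemble(base, UnaryOperator.identity(), r -> tag(tag(r, "pre"), "merge"), r -> tag(r, "post"));
    System.out.println(folded.run().equals(full.run()));
  }
}
```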
-
filterSegments
public <T extends LogicalSegment> List<T> filterSegments(QueryType query, List<T> segments)
This method is called to allow the query to prune segments that it does not believe need to be queried. It can use whatever criteria it wants to do the pruning; it just needs to return the list of segments it actually wants to see queried.
- Type Parameters:
T - The concrete LogicalSegment subtype; the returned list has the same element type as the input list
- Parameters:
query - The query being processed
segments - The list of candidate segments to be queried
- Returns:
- The list of segments to actually query
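A sketch of one possible pruning criterion: keeping only segments whose half-open time interval [start, end) overlaps the query interval. The TimedSegment interface, the Segment record, and the interval rule are all hypothetical; the method places no restriction on the criteria an implementation uses. The bounded type parameter mirrors `<T extends LogicalSegment>`, so callers get back the same concrete segment type they passed in.

```java
import java.util.List;
import java.util.stream.Collectors;

class FilterSegmentsSketch {
  // Hypothetical stand-in for LogicalSegment: only exposes a half-open interval [start, end).
  interface TimedSegment {
    long start();

    long end();
  }

  record Segment(String id, long start, long end) implements TimedSegment {}

  // Keeps only segments that overlap the query interval [queryStart, queryEnd).
  static <T extends TimedSegment> List<T> filterSegments(long queryStart, long queryEnd, List<T> segments) {
    return segments.stream()
        .filter(s -> s.start() < queryEnd && queryStart < s.end())
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<Segment> candidates = List.of(new Segment("a", 0, 10), new Segment("b", 10, 20), new Segment("c", 20, 30));
    System.out.println(filterSegments(5, 15, candidates));
  }
}
```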
-
canPerformSubquery
public boolean canPerformSubquery(Query<?> subquery)
Returns whether this toolchest is able to handle the provided subquery.
When this method returns true, the core query stack will pass subquery datasources over to the toolchest and will assume they are properly handled.
When this method returns false, the core query stack will throw an error if subqueries are present. In the future, instead of throwing an error, the core query stack will handle the subqueries on its own.
-
resultArraySignature
public RowSignature resultArraySignature(QueryType query)
Returns a RowSignature for the arrays returned by resultsAsArrays(QueryType, org.apache.druid.java.util.common.guava.Sequence<ResultType>). The returned signature has the same length as each array returned by resultsAsArrays(QueryType, org.apache.druid.java.util.common.guava.Sequence<ResultType>).
- Parameters:
query - the same query passed to resultsAsArrays(QueryType, org.apache.druid.java.util.common.guava.Sequence<ResultType>)
- Returns:
- row signature
- Throws:
UnsupportedOperationException - if this query type does not support returning results as arrays
-
resultsAsArrays
public Sequence<Object[]> resultsAsArrays(QueryType query, Sequence<ResultType> resultSequence)
Converts a sequence of this query's ResultType into arrays. The array signature is given by resultArraySignature(QueryType). This functionality is useful because it allows higher-level processors to operate on the results of any query in a consistent way, which matters for the SQL layer and for any algorithm that might operate on the results of an inner query.
Not all query types support this method. Those that don't throw UnsupportedOperationException, and they cannot be used by the SQL layer or by generic higher-level algorithms.
Some query types return less information after translating their results into arrays, especially where there is no clear way to translate fully rich results into flat arrays. For example, the scan query does not include the segmentId in its array-based results, because it could potentially conflict with a 'segmentId' field in the actual datasource being scanned.
A single result object may produce multiple arrays. For example, in the topN query, each TopNResultValue generates a separate array for each of its values.
By convention, the array form should include the __time column, if present, as a long (milliseconds since epoch).
- Parameters:
resultSequence - results of the form returned by mergeResults(org.apache.druid.query.QueryRunner<ResultType>)
- Returns:
- results in array form
- Throws:
UnsupportedOperationException - if this query type does not support returning results as arrays
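A sketch of the topN case described above, where one result expands into one array per value row and __time is emitted first as a long. TopNResult, SIGNATURE, and the column names "dim" and "count" are hypothetical, and a List stands in for Druid's Sequence; SIGNATURE plays the role that resultArraySignature plays, fixing the column order and array length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

class ResultsAsArraysSketch {
  // Hypothetical topN-like result: one timestamp and several ranked value rows.
  record TopNResult(long timestampMillis, List<Map<String, Object>> values) {}

  // Hypothetical signature, analogous to resultArraySignature: column order of each array.
  static final List<String> SIGNATURE = List.of("__time", "dim", "count");

  // Emits one array per value row, so a single result can expand into several arrays.
  static List<Object[]> resultsAsArrays(List<TopNResult> results) {
    List<Object[]> out = new ArrayList<>();
    for (TopNResult r : results) {
      for (Map<String, Object> value : r.values()) {
        Object[] row = new Object[SIGNATURE.size()];
        row[0] = r.timestampMillis(); // __time as a long, by convention
        for (int i = 1; i < SIGNATURE.size(); i++) {
          row[i] = value.get(SIGNATURE.get(i)); // columns missing from the value become null
        }
        out.add(row);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    TopNResult result = new TopNResult(
        1_000L,
        List.of(Map.<String, Object>of("dim", "a", "count", 3L), Map.<String, Object>of("dim", "b", "count", 2L)));
    for (Object[] row : resultsAsArrays(List.of(result))) {
      System.out.println(Arrays.toString(row));
    }
  }
}
```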
-
resultsAsFrames
public Optional<Sequence<FrameSignaturePair>> resultsAsFrames(QueryType query, Sequence<ResultType> resultSequence, MemoryAllocatorFactory memoryAllocatorFactory, boolean useNestedForUnknownTypes)
Converts a sequence of this query's ResultType into a sequence of FrameSignaturePair. The array signature is the one given by resultArraySignature(Query). If the toolchest doesn't support this method, it can return an empty Optional. It is then the caller's duty to throw an appropriate exception or use an alternative fallback approach.
See the documentation of resultsAsArrays(Query, Sequence), as the behavior of the rows represented by the frame sequence is identical.
Each Frame has a separate RowSignature because, for some query types such as the Scan query, not every column in the final result is necessarily present in an individual ResultType (and subsequently Frame). This saves space by not populating the column in that particular Frame and omitting it from its signature.
- Parameters:
query - Query being executed by the toolchest. Used to determine the rowSignature of the Frames
resultSequence - results of the form returned by mergeResults(QueryRunner)
memoryAllocatorFactory -
useNestedForUnknownTypes - true if the unknown types in the results can be serialized and deserialized using complex types
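A caller-side sketch of the empty-Optional contract. FrameLike, ToolChest, FrameCapable, ArraysOnly, and consume are hypothetical stand-ins for FrameSignaturePair and real toolchests; a List of Object[] rows stands in for a Sequence. A toolchest without frame support inherits the empty default, and the caller falls back to array-based rows; throwing an error instead would be the other option the text allows.

```java
import java.util.List;
import java.util.Optional;

class FramesFallbackSketch {
  // Hypothetical stand-in for FrameSignaturePair: each frame carries its own signature.
  record FrameLike(List<String> signature, int numRows) {}

  interface ToolChest {
    List<String> signature();

    // Optional.empty() signals that frames are not supported.
    default Optional<List<FrameLike>> resultsAsFrames(List<Object[]> rows) {
      return Optional.empty();
    }
  }

  // A toolchest that packs all rows into a single frame.
  static class FrameCapable implements ToolChest {
    public List<String> signature() {
      return List.of("__time", "count");
    }

    @Override
    public Optional<List<FrameLike>> resultsAsFrames(List<Object[]> rows) {
      return Optional.of(List.of(new FrameLike(signature(), rows.size())));
    }
  }

  // A toolchest that keeps the empty default.
  static class ArraysOnly implements ToolChest {
    public List<String> signature() {
      return List.of("__time", "count");
    }
  }

  // Caller side: fall back to row arrays when frames are unavailable.
  static String consume(ToolChest toolChest, List<Object[]> rows) {
    return toolChest.resultsAsFrames(rows)
        .map(frames -> "frames=" + frames.size())
        .orElseGet(() -> "arrays=" + rows.size());
  }

  public static void main(String[] args) {
    List<Object[]> rows = List.of(new Object[] {0L, 1L}, new Object[] {60_000L, 2L});
    System.out.println(consume(new FrameCapable(), rows) + " " + consume(new ArraysOnly(), rows));
  }
}
```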
-
canExecuteFully
public <T> boolean canExecuteFully(Query<T> query)