Interface QueryMetrics<QueryType extends Query<?>>

  • Type Parameters:
    QueryType -
    All Known Subinterfaces:
    GroupByQueryMetrics, SearchQueryMetrics, TimeseriesQueryMetrics, TopNQueryMetrics
    All Known Implementing Classes:
    DefaultGroupByQueryMetrics, DefaultQueryMetrics, DefaultSearchQueryMetrics, DefaultTimeseriesQueryMetrics, DefaultTopNQueryMetrics

    public interface QueryMetrics<QueryType extends Query<?>>
    Abstraction wrapping ServiceMetricEvent.Builder that allows control over which metrics are actually emitted, which dimensions they have, etc.

    Goals of QueryMetrics
    ---------------------
    1. Skipping or partial filtering of particular dimensions or metrics entirely. An implementation could leave the body of the corresponding method empty, or implement random filtering like:

        public void reportCpuTime(long timeNs)
        {
          if (ThreadLocalRandom.current().nextDouble() < 0.1) {
            super.reportCpuTime(timeNs);
          }
        }

    2. Ability to add new dimensions and metrics, possibly expensive to compute or expensive to process (long string values, high cardinality, etc.), without affecting existing Druid installations, by skipping (see 1.) those dimensions and metrics entirely in the default QueryMetrics implementations. Users who need those expensive dimensions and metrics can explicitly emit them in their own QueryMetrics.

    3. Control over the time unit in which time metrics are emitted. By default (see DefaultQueryMetrics and its subclasses) it is milliseconds, but if queries are fast, that may not be precise enough.

    4. Control over the dimension and metric names.

    Here, "control" is provided to the operator of a Druid cluster, who would exercise that control through a site-specific extension adding XxxQueryMetricsFactory impl(s).

    Types of methods in this interface
    ----------------------------------
    1. Methods pulling some dimensions from the query object. These methods are used to populate the metric before the query is run. These methods accept a single `QueryType query` parameter. query(Query) calls all methods of this type, hence pulling all available information from the query object as dimensions.

    2. Methods for setting dimensions which become known in the process of the query execution or after the query is completed.

    3. Methods to register metrics to be emitted later in bulk via emit(ServiceEmitter). These methods return this QueryMetrics object back for chaining. Names of these methods start with the "report" prefix.

    Implementors expectations
    -------------------------
    QueryMetrics is expected to be changed often, in every Druid release (including "patch" releases). Users who create their custom implementations of QueryMetrics should be ready to fix the code of their QueryMetrics (implement new methods) when they update Druid. A broken build of a custom extension containing a custom QueryMetrics is the way users are notified that Druid core "wants" to emit a new dimension or metric. The user then handles it manually: if the new dimension or metric is useful and not very expensive to process and store, emit it; otherwise skip it (see Goals, 1. above).
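
    Goals 1 and 3 above can be illustrated with a minimal, self-contained sketch. The classes below (MillisMetricsSketch, SampledMicrosMetricsSketch) are hypothetical stand-ins and not Druid classes, and the metric name is used only for illustration. The sketch assumes a default implementation that records CPU time in milliseconds, and a site-specific subclass that switches to microseconds and samples roughly one call in ten.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.function.DoubleSupplier;

// Hypothetical stand-in for a default implementation: registers CPU time in milliseconds.
class MillisMetricsSketch {
    // Metrics registered so far, keyed by an illustrative metric name.
    protected final Map<String, Number> registered = new LinkedHashMap<>();

    public MillisMetricsSketch reportCpuTime(long timeNs) {
        registered.put("query/cpu/time", TimeUnit.NANOSECONDS.toMillis(timeNs));
        return this; // "report" methods return this for chaining
    }

    public Map<String, Number> registered() {
        return registered;
    }
}

// Hypothetical site-specific override: microsecond unit (goal 3) plus sampling (goal 1).
class SampledMicrosMetricsSketch extends MillisMetricsSketch {
    private final double sampleRate;
    private final DoubleSupplier random; // injected so the sampling decision is testable

    SampledMicrosMetricsSketch(double sampleRate, DoubleSupplier random) {
        this.sampleRate = sampleRate;
        this.random = random;
    }

    @Override
    public SampledMicrosMetricsSketch reportCpuTime(long timeNs) {
        // Register the metric only for the sampled fraction of calls.
        if (random.getAsDouble() < sampleRate) {
            registered.put("query/cpu/time", TimeUnit.NANOSECONDS.toMicros(timeNs));
        }
        return this;
    }
}
```

    In real use the random source would presumably be something like `() -> ThreadLocalRandom.current().nextDouble()`; it is passed in here only so that the behavior can be checked deterministically.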

    Although this interface is annotated as ExtensionPoint and some of its methods as PublicApi, it may be changed in breaking ways even in minor releases.

    If implementors of custom QueryMetrics don't want to fix builds on every Druid release (e.g. if they want to add a single dimension to emitted events and don't want to alter other dimensions and emitted metrics), they could extend DefaultQueryMetrics or a query-specific default implementation class, such as DefaultTopNQueryMetrics. Those classes are guaranteed to stay around and to implement new methods added to the QueryMetrics interface (or a query-specific subinterface). However, there is no 100% guarantee of compatibility, because methods could not only be added to QueryMetrics; existing methods could also be changed or removed.
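
    The "add a single dimension" case can be sketched as follows. This is a simplified illustration with hypothetical stand-in types (QuerySketch, DefaultMetricsSketch, TenantMetricsSketch), not the Druid classes themselves; the "tenant" context key is likewise an assumption made for the example. The idea is that the subclass overrides only the method it cares about and inherits everything else from the default implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a query object carrying a data source and a context map.
class QuerySketch {
    final String dataSource;
    final Map<String, Object> context;

    QuerySketch(String dataSource, Map<String, Object> context) {
        this.dataSource = dataSource;
        this.context = context;
    }
}

// Hypothetical stand-in for a default implementation maintained alongside the interface.
class DefaultMetricsSketch {
    protected final Map<String, String> dimensions = new LinkedHashMap<>();

    // Pulls all available information from the query object into dimensions.
    public void query(QuerySketch query) {
        dataSource(query);
    }

    public void dataSource(QuerySketch query) {
        dimensions.put("dataSource", query.dataSource);
    }

    public Map<String, String> dimensions() {
        return dimensions;
    }
}

// Custom implementation: adds one dimension and leaves all other behavior untouched.
class TenantMetricsSketch extends DefaultMetricsSketch {
    @Override
    public void query(QuerySketch query) {
        super.query(query);
        Object tenant = query.context.get("tenant"); // hypothetical context key
        if (tenant != null) {
            dimensions.put("tenant", tenant.toString());
        }
    }
}
```

    Because only query(...) is overridden, a method added later to the default class would be inherited without any change to the subclass, which is the trade-off the paragraph above describes.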

    QueryMetrics is designed for use from a single thread; implementations shouldn't care about thread-safety.

    Adding new methods to QueryMetrics
    ----------------------------------
    1. When adding a new method for setting a dimension which could be pulled from the query object, always make it accept a single `QueryType query` parameter, letting the implementations do all the work of carving the dimension value out of the query object.

    2. When adding a new method for setting a dimension which becomes known in the process of the query execution or after the query is completed, design it so that as little work as possible is done for preparing arguments for this method, and as much work as possible is done in the implementations of this method, if they decide to actually emit this dimension.

    3. When adding a new method for registering metrics, make it accept the metric value in the smallest reasonable unit (i.e. nanoseconds for time metrics, bytes for metrics of data size, etc.), allowing the implementations of this method to round the value up to more coarse-grained units if they don't need the maximum precision.

    Making subinterfaces of QueryMetrics for emitting custom dimensions and/or metrics for specific query types
    -----------------------------------------------------------------------------------------------------------
    If a query type (e.g. SegmentMetadataQuery, or rather its runners) needs to emit custom dimensions and/or metrics which don't make sense for all other query types, the following steps should be executed:

    1. Create `interface SegmentMetadataQueryMetrics extends QueryMetrics` (here and below "SegmentMetadata" is the query type) with additional methods (see "Adding new methods" section above).

    2. Create `class DefaultSegmentMetadataQueryMetrics implements SegmentMetadataQueryMetrics`. This class should implement extra methods from the SegmentMetadataQueryMetrics interface with empty bodies, AND DELEGATE ALL OTHER METHODS TO A QueryMetrics OBJECT, provided as a sole parameter in the DefaultSegmentMetadataQueryMetrics constructor.

       NOTE: query(), dataSource(), queryType(), interval(), hasFilters(), duration(), queryId(), sqlQueryId(), and context() methods, or any other "pre-query-execution-time" methods, should either have an empty body or throw an exception.

    3. Create `interface SegmentMetadataQueryMetricsFactory` with a single method `SegmentMetadataQueryMetrics makeMetrics(SegmentMetadataQuery query);`.

    4. Create `class DefaultSegmentMetadataQueryMetricsFactory implements SegmentMetadataQueryMetricsFactory`, which accepts GenericQueryMetricsFactory as an injected constructor parameter, and implements makeMetrics() as `return new DefaultSegmentMetadataQueryMetrics(genericQueryMetricsFactory.makeMetrics(query));`

    5. Inject and use SegmentMetadataQueryMetricsFactory instead of GenericQueryMetricsFactory in SegmentMetadataQueryQueryToolChest.

    6. If the query type belongs to the core druid-processing (e.g. SegmentMetadataQuery), establish injection of SegmentMetadataQueryMetricsFactory using a config and a provider method in QueryToolChestModule (see how it is done in QueryToolChestModule for existing query types with custom metrics, e.g. SearchQueryMetricsFactory). If the query type is defined in an extension (e.g. ScanQuery), you can specify `binder.bind(ScanQueryMetricsFactory.class).to(DefaultScanQueryMetricsFactory.class)` in the extension's Guice module, or establish a configuration similar to that of the core query types.

    This complex procedure is needed to ensure that a custom GenericQueryMetricsFactory specified by users still works for the query type when the query type decides to create its own custom QueryMetrics subclass.

    TopNQueryMetrics, GroupByQueryMetrics, and TimeseriesQueryMetrics are implemented differently, because they were introduced at the same time as the whole QueryMetrics abstraction and their default implementations have to actually emit more dimensions than the default generic QueryMetrics. So those subinterfaces shouldn't be taken as direct examples for following the plan specified above. Refer to SearchQueryMetricsFactory as an implementation example of this procedure.
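
    Steps 1 to 4 of the procedure above can be sketched in a self-contained form. Every type below is a hypothetical, heavily simplified stand-in (the real QueryMetrics interface has many more methods), and analyzedColumnCount(int) is an invented extra method used purely for illustration. The sketch also assumes that the generic factory populates query dimensions itself, which is why the wrapper's query(...) has an empty body.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, heavily simplified stand-in for the QueryMetrics interface.
interface MetricsSketch<Q> {
    void query(Q query);                              // "pre-query-execution-time" method
    void segment(String segmentIdentifier);           // dimension known during execution
    MetricsSketch<Q> reportSegmentTime(long timeNs);  // metric registration
}

// Hypothetical stand-in for a query type.
class SegmentMetadataQuerySketch {
    final String dataSource;
    SegmentMetadataQuerySketch(String dataSource) { this.dataSource = dataSource; }
}

// Stand-in for the generic default implementation; records everything in a map.
class GenericMetricsSketch<Q> implements MetricsSketch<Q> {
    final Map<String, Object> recorded = new LinkedHashMap<>();
    @Override public void query(Q query) { recorded.put("query", query); }
    @Override public void segment(String id) { recorded.put("segment", id); }
    @Override public MetricsSketch<Q> reportSegmentTime(long timeNs) {
        recorded.put("query/segment/time", timeNs);
        return this;
    }
}

// Stand-in for the generic factory; users could subclass it to plug in their own metrics.
class GenericFactorySketch {
    <Q> MetricsSketch<Q> makeMetrics(Q query) {
        GenericMetricsSketch<Q> metrics = new GenericMetricsSketch<>();
        metrics.query(query); // assumption of this sketch: query dimensions are set here
        return metrics;
    }
}

// Step 1: query-specific subinterface with one extra (invented) method.
interface SegmentMetadataMetricsSketch extends MetricsSketch<SegmentMetadataQuerySketch> {
    void analyzedColumnCount(int count);
}

// Step 2: default implementation; extras are empty, everything else is delegated.
class DefaultSegmentMetadataMetricsSketch implements SegmentMetadataMetricsSketch {
    final MetricsSketch<SegmentMetadataQuerySketch> delegate;
    DefaultSegmentMetadataMetricsSketch(MetricsSketch<SegmentMetadataQuerySketch> delegate) {
        this.delegate = delegate;
    }
    @Override public void analyzedColumnCount(int count) { /* empty by default */ }
    @Override public void query(SegmentMetadataQuerySketch query) { /* empty: delegate is already populated */ }
    @Override public void segment(String id) { delegate.segment(id); }
    @Override public SegmentMetadataMetricsSketch reportSegmentTime(long timeNs) {
        delegate.reportSegmentTime(timeNs);
        return this;
    }
}

// Step 3: query-specific factory interface.
interface SegmentMetadataMetricsFactorySketch {
    SegmentMetadataMetricsSketch makeMetrics(SegmentMetadataQuerySketch query);
}

// Step 4: default factory wrapping whatever the generic factory produces.
class DefaultSegmentMetadataMetricsFactorySketch implements SegmentMetadataMetricsFactorySketch {
    private final GenericFactorySketch genericFactory;
    DefaultSegmentMetadataMetricsFactorySketch(GenericFactorySketch genericFactory) {
        this.genericFactory = genericFactory;
    }
    @Override public SegmentMetadataMetricsSketch makeMetrics(SegmentMetadataQuerySketch query) {
        return new DefaultSegmentMetadataMetricsSketch(genericFactory.makeMetrics(query));
    }
}
```

    The point of the delegation is visible in step 4: whichever generic factory is supplied, its metrics object keeps handling all the common dimensions and metrics, and the query-specific wrapper only adds the extra methods.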

    • Method Detail

      • query

        void query​(QueryType query)
        Pulls all information from the query object into dimensions of future metrics.
      • queryId

        void queryId​(String queryId)
        Sets the id of the given query as a dimension.
      • sqlQueryId

        void sqlQueryId​(String sqlQueryId)
        Sets sqlQueryId as a dimension.
      • server

        void server​(String host)
      • remoteAddress

        void remoteAddress​(String remoteAddress)
      • status

        void status​(String status)
      • success

        void success​(boolean success)
      • segment

        void segment​(String segmentIdentifier)
      • preFilters

        void preFilters​(List<Filter> preFilters)
      • postFilters

        void postFilters​(List<Filter> postFilters)
      • identity

        void identity​(String identity)
        Sets identity of the requester for a query. See AuthenticationResult.
      • vectorized

        void vectorized​(boolean vectorized)
        Sets whether or not a segment scan has been vectorized. Generally expected to only be attached to segment-level metrics, since at whole-query level we might have a mix of vectorized and non-vectorized segment scans.
      • parallelMergeParallelism

        void parallelMergeParallelism​(int parallelism)
        Sets broker merge parallelism, if parallel merges are enabled. This will only appear in broker level metrics. This value is identical to the reportParallelMergeParallelism(int) metric value, but optionally also available as a dimension.
      • reportQueryTime

        QueryMetrics<QueryType> reportQueryTime​(long timeNs)
        Registers "query time" metric. Measures the time between a Jetty thread starting to handle a query, and the response being fully written to the response output stream. Does not include time spent waiting in a queue before the query runs.
      • reportQueryBytes

        QueryMetrics<QueryType> reportQueryBytes​(long byteCount)
        Registers "query bytes" metric. Measures the total number of bytes written by the query server thread to the response output stream. Emitted once per query.
      • reportQueriedSegmentCount

        QueryMetrics<QueryType> reportQueriedSegmentCount​(long segmentCount)
        Registers "segments queried count" metric.
      • reportWaitTime

        QueryMetrics<QueryType> reportWaitTime​(long timeNs)
        Registers "wait time" metric. Measures the total time segment-processing runnables spent waiting for execution in the processing thread pool. Emitted once per segment.
      • reportSegmentTime

        QueryMetrics<QueryType> reportSegmentTime​(long timeNs)
        Registers "segment time" metric. Measures the total wall-clock time spent operating on segments in processing threads. Emitted once per segment.
      • reportSegmentAndCacheTime

        QueryMetrics<QueryType> reportSegmentAndCacheTime​(long timeNs)
        Registers "segmentAndCache time" metric. Measures the total wall-clock time spent in processing threads, either operating on segments or retrieving items from cache. Emitted once per segment.
      • reportNodeTimeToFirstByte

        QueryMetrics<QueryType> reportNodeTimeToFirstByte​(long timeNs)
        Registers "time to first byte" metric.
      • reportBackPressureTime

        QueryMetrics<QueryType> reportBackPressureTime​(long timeNs)
        Registers "time that channel is unreadable (backpressure)" metric.
      • reportNodeBytes

        QueryMetrics<QueryType> reportNodeBytes​(long byteCount)
        Registers "node bytes" metric.
      • reportBitmapConstructionTime

        QueryMetrics<QueryType> reportBitmapConstructionTime​(long timeNs)
        Reports the time spent constructing the bitmap from preFilters(List) of the query. Not reported if there are no preFilters.
      • reportSegmentRows

        QueryMetrics<QueryType> reportSegmentRows​(long numRows)
        Reports the total number of rows in the processed segment.
      • reportParallelMergeParallelism

        QueryMetrics<QueryType> reportParallelMergeParallelism​(int parallelism)
        Reports number of parallel tasks the broker used to process the query during parallel merge. This value is identical to the parallelMergeParallelism(int) dimension value, but optionally also available as a metric.
      • reportParallelMergeInputSequences

        QueryMetrics<QueryType> reportParallelMergeInputSequences​(long numSequences)
        Reports total number of input sequences processed by the broker during parallel merge.
      • reportParallelMergeInputRows

        QueryMetrics<QueryType> reportParallelMergeInputRows​(long numRows)
        Reports total number of input rows processed by the broker during parallel merge.
      • reportParallelMergeOutputRows

        QueryMetrics<QueryType> reportParallelMergeOutputRows​(long numRows)
        Reports the broker's total number of output rows after merging and combining input sequences (should be less than or equal to the value supplied to reportParallelMergeInputRows(long)).
      • reportParallelMergeTaskCount

        QueryMetrics<QueryType> reportParallelMergeTaskCount​(long numTasks)
        Reports the broker's total number of fork join pool tasks required to complete the query.
      • reportParallelMergeTotalCpuTime

        QueryMetrics<QueryType> reportParallelMergeTotalCpuTime​(long timeNs)
        Reports the broker's total CPU time in nanoseconds during which fork join merge combine tasks were doing work.
      • reportParallelMergeTotalTime

        QueryMetrics<QueryType> reportParallelMergeTotalTime​(long timeNs)
        Reports broker total "wall" time in nanoseconds from parallel merge start sequence creation to total consumption.
      • reportParallelMergeFastestPartitionTime

        QueryMetrics<QueryType> reportParallelMergeFastestPartitionTime​(long timeNs)
        Reports broker "wall" time in nanoseconds for the fastest parallel merge sequence partition to be 'initialized', where 'initialized' means that the first result batch has been populated from data servers and merging can begin. Similar to the query 'time to first byte' metrics, except that it is a composite of the whole group of data servers present in the merge partition, all of which must supply an initial result batch before merging can actually begin.
      • reportParallelMergeSlowestPartitionTime

        QueryMetrics<QueryType> reportParallelMergeSlowestPartitionTime​(long timeNs)
        Reports broker "wall" time in nanoseconds for the slowest parallel merge sequence partition to be 'initialized', where 'initialized' means that the first result batch has been populated from data servers and merging can begin. Similar to the query 'time to first byte' metrics, except that it is a composite of the whole group of data servers present in the merge partition, all of which must supply an initial result batch before merging can actually begin.
      • emit

        void emit​(ServiceEmitter emitter)
        Emits all metrics registered since the last emit() call on this QueryMetrics object.
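
        The "registered since the last emit() call" contract can be sketched as a buffer that is drained on every emit. EmitterSketch and BufferingMetricsSketch are hypothetical stand-ins for ServiceEmitter and a QueryMetrics implementation, and the event string format and metric names are assumptions made only for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for ServiceEmitter: collects emitted events as strings.
class EmitterSketch {
    final List<String> events = new ArrayList<>();

    void emit(String event) {
        events.add(event);
    }
}

// Hypothetical stand-in for a QueryMetrics implementation that buffers metrics until emit().
class BufferingMetricsSketch {
    private final Map<String, String> dimensions = new LinkedHashMap<>();
    private final Map<String, Number> metrics = new LinkedHashMap<>();

    void queryId(String queryId) {
        dimensions.put("id", queryId);
    }

    BufferingMetricsSketch reportQueryTime(long timeNs) {
        metrics.put("query/time", timeNs);
        return this; // returned for chaining
    }

    BufferingMetricsSketch reportQueryBytes(long byteCount) {
        metrics.put("query/bytes", byteCount);
        return this;
    }

    // Emits one event per registered metric, then forgets the metrics but keeps the dimensions.
    void emit(EmitterSketch emitter) {
        for (Map.Entry<String, Number> metric : metrics.entrySet()) {
            emitter.emit(metric.getKey() + "=" + metric.getValue() + " " + dimensions);
        }
        metrics.clear();
    }
}
```

        With this shape, a chained call such as `metrics.reportQueryTime(t).reportQueryBytes(b).emit(emitter)` produces two events, and a second emit(emitter) with nothing newly registered produces none.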