Class Summary

  • All Implemented Interfaces:
    Collector.Describable

    public class Summary
    extends SimpleCollector<Summary.Child>
    implements Collector.Describable
    Summary metrics and Histogram metrics can both be used to monitor distributions like latencies or request sizes.

    An overview of when to use Summaries and when to use Histograms can be found at https://prometheus.io/docs/practices/histograms.

    The following example shows how to measure latencies and request sizes:

     import io.prometheus.client.Summary;
    
     class YourClass {
    
       private static final Summary requestLatency = Summary.build()
           .name("requests_latency_seconds")
           .help("request latency in seconds")
           .register();
    
       private static final Summary receivedBytes = Summary.build()
           .name("requests_size_bytes")
           .help("request size in bytes")
           .register();
    
       // Request stands for your application's own request type;
       // size() is assumed to return the request size in bytes.
       public void processRequest(Request req) {
         Summary.Timer requestTimer = requestLatency.startTimer();
         try {
           // Your code here.
         } finally {
           // Observe the elapsed time in seconds and the request size.
           requestTimer.observeDuration();
           receivedBytes.observe(req.size());
         }
       }
     }
     
    The Summary class provides several utility methods for observing values, such as observe(double), startTimer() together with Summary.Timer.observeDuration(), and time(Callable).

    By default, Summary metrics provide the count and the sum. For example, if you measure the latencies of a REST service, the count tells you how often the service was called, and the sum tells you the total aggregated response time. You can calculate the average response time with a Prometheus query that divides the sum by the count, for example rate(requests_latency_seconds_sum[5m]) / rate(requests_latency_seconds_count[5m]).
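    To make the count/sum arithmetic concrete, here is a minimal plain-Java sketch. The class CountAndSum is hypothetical: it only mirrors the two values a Summary exposes by default and is not how simpleclient stores them internally.

```java
// Hypothetical stand-in for the two values a Summary exposes by default.
// Not simpleclient code: it only illustrates the count/sum arithmetic.
final class CountAndSum {
    private long count;  // how many observations were recorded
    private double sum;  // total of all observed values

    void observe(double value) {
        count++;
        sum += value;
    }

    long count() {
        return count;
    }

    double sum() {
        return sum;
    }

    // Average of all observations; NaN before the first observation.
    double average() {
        return count == 0 ? Double.NaN : sum / count;
    }
}
```

    For example, observing 0.25, 0.5 and 0.75 seconds yields a count of 3, a sum of 1.5 and an average of 0.5 seconds. In Prometheus you would usually divide the rates of the two series rather than their raw totals, so that the average reflects recent traffic instead of the whole process lifetime.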

    In addition to count and sum, you can configure a Summary to provide quantiles:

     Summary requestLatency = Summary.build()
         .name("requests_latency_seconds")
         .help("Request latency in seconds.")
         .quantile(0.5, 0.01)    // 0.5 quantile (median) with 0.01 allowed error
         .quantile(0.95, 0.005)  // 0.95 quantile with 0.005 allowed error
         // ...
         .register();
     
    As an example, a 0.95 quantile of 120ms tells you that 95% of the calls were faster than 120ms, and 5% of the calls were slower than 120ms.

    Tracking exact quantiles requires a large amount of memory, because all observations need to be stored in a sorted list. Therefore, the Summary accepts a small, configurable error in exchange for significantly lower memory usage.
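    The naive way to get exact quantiles is to keep every observation and sort them when a quantile is requested. The hypothetical ExactQuantiles class below does exactly that, using the nearest-rank definition of a quantile, to show why memory grows linearly with the number of observations. It is not the algorithm Summary uses.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of the naive, exact approach: every observation is
// kept, so memory grows linearly with the number of observations.
final class ExactQuantiles {
    private final List<Double> observations = new ArrayList<>();

    void observe(double value) {
        observations.add(value);
    }

    // Nearest-rank quantile: the smallest stored value such that at least
    // q * n observations are less than or equal to it.
    double quantile(double q) {
        if (observations.isEmpty()) {
            return Double.NaN;
        }
        List<Double> sorted = new ArrayList<>(observations);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(q * sorted.size());              // 1-based rank
        int index = Math.min(Math.max(rank, 1), sorted.size()) - 1; // q = 0 maps to the minimum
        return sorted.get(index);
    }

    int storedObservations() {
        return observations.size();
    }
}
```

    With the 100 observations 1, 2, ..., 100, quantile(0.5) returns 50 and quantile(0.95) returns 95, and all 100 values stay in memory.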

    In the example, the allowed error of 0.005 means that you will not get the exact 0.95 quantile, but anything between the 0.945 quantile and the 0.955 quantile.
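    To see what the allowed error means in terms of actual values, the hypothetical helper below computes the range of values that would be acceptable answers for quantile(q, error): anything from the (q - error) quantile to the (q + error) quantile, again using the nearest-rank definition over all observations. It is only a way to reason about the guarantee; a Summary does not compute it like this.

```java
import java.util.Arrays;

// Hypothetical helper, not part of simpleclient: returns the lowest and the
// highest value a Summary configured with quantile(q, error) may report.
final class QuantileErrorBand {
    private QuantileErrorBand() {}

    static double[] acceptableRange(double[] observations, double q, double error) {
        if (observations.length == 0) {
            throw new IllegalArgumentException("no observations");
        }
        double[] sorted = observations.clone();
        Arrays.sort(sorted);
        double low = nearestRank(sorted, Math.max(0.0, q - error));
        double high = nearestRank(sorted, Math.min(1.0, q + error));
        return new double[] {low, high};
    }

    // Nearest-rank quantile over an already sorted, non-empty array.
    private static double nearestRank(double[] sorted, double q) {
        int rank = (int) Math.ceil(q * sorted.length);
        int index = Math.min(Math.max(rank, 1), sorted.length) - 1;
        return sorted[index];
    }
}
```

    For the 1000 observations 1, 2, ..., 1000 and quantile(0.95, 0.005), any reported value from about 945 to about 955 would be within the allowed error.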

    Experiments show that the Summary typically needs to keep fewer than 100 samples to provide that precision, even if you have hundreds of millions of observations.

    There are a few special cases:

    • You can set an allowed error of 0, but then the Summary will keep all observations in memory.
    • You can track the minimum value with .quantile(0.0, 0.0). This special case will not use additional memory even though the allowed error is 0.
    • You can track the maximum value with .quantile(1.0, 0.0). This special case will not use additional memory even though the allowed error is 0.
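    The minimum and maximum are cheap because they can be maintained exactly with two running values, without storing any observations. The hypothetical MinMaxTracker below illustrates that idea; it is not simpleclient's implementation.

```java
// Hypothetical illustration, not simpleclient code: the exact minimum and
// maximum need only two fields, no matter how many values are observed.
final class MinMaxTracker {
    // Infinities act as "no observation yet" sentinels.
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;

    void observe(double value) {
        min = Math.min(min, value);
        max = Math.max(max, value);
    }

    double min() {
        return min;
    }

    double max() {
        return max;
    }
}
```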

    Typically, you don't want a Summary to represent the entire runtime of the application, but only a reasonable recent time interval. Summary metrics therefore implement a configurable sliding time window:

     Summary requestLatency = Summary.build()
         .name("requests_latency_seconds")
         .help("Request latency in seconds.")
         .maxAgeSeconds(10 * 60)
         .ageBuckets(5)
         // ...
         .register();
     
    The default is a time window of 10 minutes and 5 age buckets, i.e. the time window is 10 minutes wide and slides forward every 2 minutes (10 minutes / 5 buckets).
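    The rotation interval is maxAgeSeconds / ageBuckets, so 600 s / 5 buckets gives one rotation every 120 s. The sketch below shows one way such a window can be built. The class SlidingWindowMax is hypothetical; it tracks a maximum instead of quantiles to stay short, and it assumes a design where every observation is added to every bucket, reads use the current (oldest) bucket, and that bucket is cleared when it rotates out. The clock is passed in explicitly so the behaviour is easy to test.

```java
import java.util.Arrays;

// Hypothetical sliding time window built from age buckets (not simpleclient
// code). Assumption: observations go into every bucket, reads use the current
// bucket, and the current bucket is cleared each time the window rotates.
final class SlidingWindowMax {
    private final double[] buckets;
    private final long millisBetweenRotations;
    private int current = 0;
    private long lastRotationMillis;

    SlidingWindowMax(long maxAgeSeconds, int ageBuckets, long startMillis) {
        if (ageBuckets <= 0 || maxAgeSeconds * 1000 / ageBuckets <= 0) {
            throw new IllegalArgumentException("window too small");
        }
        this.buckets = new double[ageBuckets];
        Arrays.fill(buckets, Double.NEGATIVE_INFINITY); // empty bucket
        // e.g. 600 s / 5 buckets = one rotation every 120 000 ms
        this.millisBetweenRotations = maxAgeSeconds * 1000 / ageBuckets;
        this.lastRotationMillis = startMillis;
    }

    void observe(double value, long nowMillis) {
        rotate(nowMillis);
        for (int i = 0; i < buckets.length; i++) {
            buckets[i] = Math.max(buckets[i], value);
        }
    }

    // Maximum over the observations still inside the window.
    double max(long nowMillis) {
        rotate(nowMillis);
        return buckets[current];
    }

    // For every elapsed rotation interval: clear the current (oldest) bucket
    // and move on to the next one, which has been collecting for longer.
    private void rotate(long nowMillis) {
        while (nowMillis - lastRotationMillis >= millisBetweenRotations) {
            buckets[current] = Double.NEGATIVE_INFINITY;
            current = (current + 1) % buckets.length;
            lastRotationMillis += millisBetweenRotations;
        }
    }
}
```

    In this sketch, with the default settings an observation remains visible for between 8 and 10 minutes, depending on where in the 2-minute rotation cycle it arrived.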
    • Method Detail

      • build

        public static Summary.Builder build(String name,
                                            String help)
        Return a Builder to allow configuration of a new Summary. Ensures required fields are provided.
        Parameters:
        name - The name of the metric
        help - The help string of the metric
      • build

        public static Summary.Builder build()
        Return a Builder to allow configuration of a new Summary.
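        Both entry points lead to the same kind of builder; the two-argument form just fills in the two fields every metric needs. The hypothetical MetricConfig class below sketches that pattern with a builder that refuses to create an object while the name or help text is missing. It is not simpleclient's Summary.Builder, whose checks may happen at a different point.

```java
// Hypothetical builder pattern, not simpleclient's Summary.Builder.
final class MetricConfig {
    final String name;
    final String help;

    private MetricConfig(String name, String help) {
        this.name = name;
        this.help = help;
    }

    // Required fields must be set on the builder before create().
    static Builder build() {
        return new Builder();
    }

    // Convenience entry point: the required fields are provided up front.
    static Builder build(String name, String help) {
        return new Builder().name(name).help(help);
    }

    static final class Builder {
        private String name = "";
        private String help = "";

        Builder name(String name) {
            this.name = name;
            return this;
        }

        Builder help(String help) {
            this.help = help;
            return this;
        }

        // Fails fast if a required field was never set.
        MetricConfig create() {
            if (name.isEmpty()) throw new IllegalStateException("name hasn't been set");
            if (help.isEmpty()) throw new IllegalStateException("help hasn't been set");
            return new MetricConfig(name, help);
        }
    }
}
```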
      • time

        public double time(Runnable timeable)
        Executes runnable code (e.g. a Java 8 lambda) and observes how long it took to run.
        Parameters:
        timeable - Code that is being timed
        Returns:
        Measured duration in seconds for timeable to complete.
      • time

        public <E> E time(Callable<E> timeable)
        Executes callable code (e.g. a Java 8 lambda) and observes how long it took to run.
        Parameters:
        timeable - Code that is being timed
        Returns:
        Result returned by callable.
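        Both time methods follow the same pattern: read a monotonic clock, run the code, and observe the elapsed time in seconds even if the code throws. The Timing class below is a hypothetical plain-Java sketch of that pattern using System.nanoTime(); the DoubleConsumer parameter stands in for Summary.observe(double). The documented time(Callable) signature declares no checked exceptions; this sketch simply declares throws Exception instead.

```java
import java.util.concurrent.Callable;
import java.util.function.DoubleConsumer;

// Hypothetical helpers mirroring the documented contracts of time(Runnable)
// and time(Callable); the DoubleConsumer stands in for Summary.observe(double).
final class Timing {
    private Timing() {}

    // Runs the code, observes the duration in seconds and returns it.
    static double time(Runnable timeable, DoubleConsumer observer) {
        long start = System.nanoTime(); // monotonic, unaffected by wall-clock changes
        double seconds;
        try {
            timeable.run();
        } finally {
            // Observed even if run() throws.
            seconds = (System.nanoTime() - start) / 1e9;
            observer.accept(seconds);
        }
        return seconds;
    }

    // Runs the code, observes the duration in seconds and returns its result.
    static <E> E time(Callable<E> timeable, DoubleConsumer observer) throws Exception {
        long start = System.nanoTime();
        try {
            return timeable.call();
        } finally {
            observer.accept((System.nanoTime() - start) / 1e9);
        }
    }
}
```

        With the real class the equivalent calls are requestLatency.time(() -> handle()) for a Runnable and requestLatency.time(() -> compute()) for a Callable, where handle() and compute() are placeholders for your own code.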
      • describe

        public List<Collector.MetricFamilySamples> describe()
        Description copied from interface: Collector.Describable
        Provide a list of metric families this Collector is expected to return. These should exclude the samples. This is used by the registry to detect collisions and duplicate registrations. Usually custom collectors do not have to implement Describable. If Describable is not implemented and the CollectorRegistry was created with auto describe enabled (which is the case for the default registry) then Collector.collect() will be called at registration time instead of describe. If this could cause problems, either implement a proper describe, or if that's not practical have describe return an empty list.
        Specified by:
        describe in interface Collector.Describable
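        To illustrate how a registry can use describe() to detect collisions, here is a heavily simplified, hypothetical NameRegistry that accepts the names a collector describes and rejects a registration if any of them is already taken. A real registry such as simpleclient's CollectorRegistry does more than this (for example, it may also account for suffixed sample names like _count and _sum); this sketch only compares the names it is given.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical, simplified registry: rejects a registration if any described
// name is already registered. Not simpleclient's CollectorRegistry.
final class NameRegistry {
    private final Set<String> registeredNames = new HashSet<>();

    // Registers all described names, or none of them if one is already taken.
    void register(List<String> describedNames) {
        for (String name : describedNames) {
            if (registeredNames.contains(name)) {
                throw new IllegalArgumentException("duplicate metric name: " + name);
            }
        }
        registeredNames.addAll(describedNames);
    }

    boolean isRegistered(String name) {
        return registeredNames.contains(name);
    }
}
```

        Note that a collector describing an empty list registers no names here, so it can never collide; this mirrors the trade-off mentioned above for describe implementations that return an empty list.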