Class ApproximateQuantiles


  • public class ApproximateQuantiles
    extends java.lang.Object
    PTransforms that summarize a PCollection's data distribution using approximate N-tiles (e.g. quartiles or percentiles), either globally or per key.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  ApproximateQuantiles.ApproximateQuantilesCombineFn<T, ComparatorT extends java.util.Comparator<T> & java.io.Serializable>
      The ApproximateQuantilesCombineFn combiner gives an idea of the distribution of a collection of values using approximate N-tiles.
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static <T extends java.lang.Comparable<T>>
      PTransform<PCollection<T>, PCollection<java.util.List<T>>>
      globally(int numQuantiles)
      Like globally(int, Comparator), but sorts using the elements' natural ordering.
      static <T, ComparatorT extends java.util.Comparator<T> & java.io.Serializable>
      PTransform<PCollection<T>, PCollection<java.util.List<T>>>
      globally(int numQuantiles, ComparatorT compareFn)
      Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection.
      static <K, V extends java.lang.Comparable<V>>
      PTransform<PCollection<KV<K, V>>, PCollection<KV<K, java.util.List<V>>>>
      perKey(int numQuantiles)
      Like perKey(int, Comparator), but sorts values using their natural ordering.
      static <K, V, ComparatorT extends java.util.Comparator<V> & java.io.Serializable>
      PTransform<PCollection<KV<K, V>>, PCollection<KV<K, java.util.List<V>>>>
      perKey(int numQuantiles, ComparatorT compareFn)
      Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • globally

        public static <T, ComparatorT extends java.util.Comparator<T> & java.io.Serializable> PTransform<PCollection<T>, PCollection<java.util.List<T>>> globally(int numQuantiles, ComparatorT compareFn)
        Returns a PTransform that takes a PCollection<T> and returns a PCollection<List<T>> whose single value is a List of the approximate N-tiles of the elements of the input PCollection. This gives an idea of the distribution of the input elements.

        The computed List has numQuantiles elements: the input elements' minimum value, numQuantiles-2 intermediate values, and the maximum value, in sorted order according to the given Comparator. To compute traditional N-tiles, use ApproximateQuantiles.globally(N+1, compareFn); for example, quartiles (4-tiles) require numQuantiles = 5.

        If there are fewer input elements than numQuantiles, then the result List will contain all the input elements, in sorted order.
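
        The output shape described above can be illustrated with a minimal plain-Java sketch. It computes exact quantiles over an in-memory list and is not the approximation algorithm this class uses; the class name QuantileShapeSketch, the helper exactQuantiles, and the rounding rule used to pick intermediate ranks are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class QuantileShapeSketch {
  /**
   * Hypothetical helper (not part of ApproximateQuantiles): returns exact quantiles
   * with the documented shape, i.e. minimum, numQuantiles-2 intermediate values, maximum.
   */
  public static <T> List<T> exactQuantiles(
      List<T> values, int numQuantiles, Comparator<T> compareFn) {
    // Requiring at least a minimum and a maximum is a choice of this sketch.
    if (numQuantiles < 2) {
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    List<T> sorted = new ArrayList<>(values);
    sorted.sort(compareFn);
    // Fewer inputs than numQuantiles: return every input element, sorted.
    if (sorted.size() < numQuantiles) {
      return sorted;
    }
    List<T> result = new ArrayList<>(numQuantiles);
    for (int i = 0; i < numQuantiles; i++) {
      // Evenly spaced rank between 0 and size-1; the rounding rule is an assumption.
      long index = Math.round((double) i * (sorted.size() - 1) / (numQuantiles - 1));
      result.add(sorted.get((int) index));
    }
    return result;
  }

  public static void main(String[] args) {
    List<Integer> values = new ArrayList<>();
    for (int v = 0; v <= 100; v++) {
      values.add(v);
    }
    // Quartiles are 4-tiles, so numQuantiles = 4 + 1 = 5.
    System.out.println(exactQuantiles(values, 5, Comparator.naturalOrder()));
    // prints [0, 25, 50, 75, 100]
  }
}
```

        The transform itself returns approximations, so its intermediate values may differ from the exact ones this sketch produces.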

        The argument Comparator must be Serializable.

        Example of use:

        
         PCollection<String> pc = ...;
         PCollection<List<String>> quantiles =
             pc.apply(ApproximateQuantiles.globally(11, stringCompareFn));
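
        The example above leaves stringCompareFn undefined. One way to satisfy the bound ComparatorT extends Comparator<T> & Serializable is a small named class that implements both interfaces, sketched below with the standard library only. The names SerializableComparatorSketch, StringCompareFn, and roundTrip are hypothetical; the round trip only demonstrates that such a comparator survives Java serialization and makes no claim about how a runner actually ships it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Comparator;

class SerializableComparatorSketch {
  /** Hypothetical comparator: orders Strings lexicographically and is Serializable. */
  public static class StringCompareFn implements Comparator<String>, Serializable {
    private static final long serialVersionUID = 1L;

    @Override
    public int compare(String a, String b) {
      return a.compareTo(b);
    }
  }

  /** Serializes obj to bytes and reads it back, returning the deserialized copy. */
  @SuppressWarnings("unchecked")
  public static <C extends Serializable> C roundTrip(C obj) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
        out.writeObject(obj);
      }
      try (ObjectInputStream in =
          new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
        return (C) in.readObject();
      }
    } catch (IOException | ClassNotFoundException e) {
      throw new IllegalStateException("comparator did not survive serialization", e);
    }
  }

  public static void main(String[] args) {
    StringCompareFn stringCompareFn = new StringCompareFn();
    // The deserialized copy should order values the same way as the original.
    StringCompareFn copy = roundTrip(stringCompareFn);
    System.out.println(copy.compare("apple", "banana") < 0);
  }
}
```

        A named class keeps both Comparator and Serializable in its static type, which is what the ComparatorT bound requires; a variable declared as plain Comparator<String> would not satisfy it.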
         
        Type Parameters:
        T - the type of the elements in the input PCollection
        Parameters:
        numQuantiles - the number of elements in the resulting quantile values List
        compareFn - the function to use to order the elements
      • globally

        public static <T extends java.lang.Comparable<T>> PTransform<PCollection<T>, PCollection<java.util.List<T>>> globally(int numQuantiles)
        Like globally(int, Comparator), but sorts using the elements' natural ordering.
        Type Parameters:
        T - the type of the elements in the input PCollection
        Parameters:
        numQuantiles - the number of elements in the resulting quantile values List
      • perKey

        public static <K, V, ComparatorT extends java.util.Comparator<V> & java.io.Serializable> PTransform<PCollection<KV<K, V>>, PCollection<KV<K, java.util.List<V>>>> perKey(int numQuantiles, ComparatorT compareFn)
        Returns a PTransform that takes a PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to a List of the approximate N-tiles of the values associated with that key in the input PCollection. This gives an idea of the distribution of the input values for each key.

        Each computed List has numQuantiles elements: the input values' minimum value, numQuantiles-2 intermediate values, and the maximum value, in sorted order according to the given Comparator. To compute traditional N-tiles, use ApproximateQuantiles.perKey(N+1, compareFn).

        If a key has fewer than numQuantiles values associated with it, then that key's output List will contain all the key's input values, in sorted order.
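
        The per-key behavior, including the case of a key with fewer than numQuantiles values, can be sketched in plain Java as follows. This is an exact, in-memory illustration and not the approximation algorithm behind this transform; PerKeyQuantilesSketch, exactQuantilesPerKey, the use of Map.Entry in place of KV, and the rounding rule for intermediate ranks are assumptions.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PerKeyQuantilesSketch {
  /**
   * Hypothetical helper (not part of ApproximateQuantiles): groups values by key, then
   * picks exact quantiles for each key with the documented output shape.
   */
  public static <K extends Comparable<K>, V> Map<K, List<V>> exactQuantilesPerKey(
      List<Map.Entry<K, V>> input, int numQuantiles, Comparator<V> compareFn) {
    // Requiring at least a minimum and a maximum is a choice of this sketch.
    if (numQuantiles < 2) {
      throw new IllegalArgumentException("numQuantiles must be at least 2");
    }
    // Group the values of each distinct key.
    Map<K, List<V>> grouped = new TreeMap<>();
    for (Map.Entry<K, V> kv : input) {
      grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
    }
    Map<K, List<V>> result = new TreeMap<>();
    for (Map.Entry<K, List<V>> group : grouped.entrySet()) {
      List<V> sorted = group.getValue();
      sorted.sort(compareFn);
      List<V> picked = new ArrayList<>();
      if (sorted.size() < numQuantiles) {
        // Too few values for this key: emit all of them, sorted.
        picked.addAll(sorted);
      } else {
        for (int i = 0; i < numQuantiles; i++) {
          // Evenly spaced rank between 0 and size-1; the rounding rule is an assumption.
          long index = Math.round((double) i * (sorted.size() - 1) / (numQuantiles - 1));
          picked.add(sorted.get((int) index));
        }
      }
      result.put(group.getKey(), picked);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Map.Entry<Integer, String>> input = new ArrayList<>();
    for (String s : new String[] {"e", "a", "c", "b", "d"}) {
      input.add(new AbstractMap.SimpleEntry<>(1, s));
    }
    input.add(new AbstractMap.SimpleEntry<>(2, "z"));
    input.add(new AbstractMap.SimpleEntry<>(2, "y"));
    // Key 1 has five values, key 2 has only two (fewer than numQuantiles = 3).
    System.out.println(exactQuantilesPerKey(input, 3, Comparator.naturalOrder()));
    // prints {1=[a, c, e], 2=[y, z]}
  }
}
```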

        The argument Comparator must be Serializable.

        Example of use:

        
         PCollection<KV<Integer, String>> pc = ...;
         PCollection<KV<Integer, List<String>>> quantilesPerKey =
             pc.apply(ApproximateQuantiles.perKey(11, stringCompareFn));
         

        See Combine.PerKey for how this affects timestamps and windowing.

        Type Parameters:
        K - the type of the keys in the input and output PCollections
        V - the type of the values in the input PCollection
        Parameters:
        numQuantiles - the number of elements in the resulting quantile values List
        compareFn - the function to use to order the values associated with each key
      • perKey

        public static <K, V extends java.lang.Comparable<V>> PTransform<PCollection<KV<K, V>>, PCollection<KV<K, java.util.List<V>>>> perKey(int numQuantiles)
        Like perKey(int, Comparator), but sorts values using their natural ordering.
        Type Parameters:
        K - the type of the keys in the input and output PCollections
        V - the type of the values in the input PCollection
        Parameters:
        numQuantiles - the number of elements in the resulting quantile values List