Class Top


  • public class Top
    extends java.lang.Object
    PTransforms for finding the largest (or smallest) set of elements in a PCollection, or the largest (or smallest) set of values associated with each key in a PCollection of KVs.
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  Top.Largest<T extends java.lang.Comparable<? super T>>
      Deprecated.
      use Top.Natural instead
      static class  Top.Natural<T extends java.lang.Comparable<? super T>>
      A Serializable Comparator that that uses the compared elements' natural ordering.
      static class  Top.Reversed<T extends java.lang.Comparable<? super T>>
      Serializable Comparator that that uses the reverse of the compared elements' natural ordering.
      static class  Top.Smallest<T extends java.lang.Comparable<? super T>>
      Deprecated.
      use Top.Reversed instead
      static class  Top.TopCombineFn<T,​ComparatorT extends java.util.Comparator<T> & java.io.Serializable>
      CombineFn for Top transforms that combines a bunch of Ts into a single count-long List<T>, using compareFn to choose the largest Ts.
    • Method Summary

      All Methods Static Methods Concrete Methods 
      Modifier and Type Method Description
      static <T extends java.lang.Comparable<T>>
      Combine.Globally<T,​java.util.List<T>>
      largest​(int count)
      Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted according to their natural order.
      static Top.TopCombineFn<java.lang.Double,​Top.Natural<java.lang.Double>> largestDoublesFn​(int count)
      Returns a Top.TopCombineFn that aggregates the largest count double values.
      static <T extends java.lang.Comparable<T>>
      Top.TopCombineFn<T,​Top.Natural<T>>
      largestFn​(int count)
      Returns a Top.TopCombineFn that aggregates the largest count values.
      static Top.TopCombineFn<java.lang.Integer,​Top.Natural<java.lang.Integer>> largestIntsFn​(int count)
      Returns a Top.TopCombineFn that aggregates the largest count int values.
      static Top.TopCombineFn<java.lang.Long,​Top.Natural<java.lang.Long>> largestLongsFn​(int count)
      Returns a Top.TopCombineFn that aggregates the largest count long values.
      static <K,​V extends java.lang.Comparable<V>>
      Combine.PerKey<K,​V,​java.util.List<V>>
      largestPerKey​(int count)
      Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted according to their natural order.
      static <T,​ComparatorT extends java.util.Comparator<T> & java.io.Serializable>
      Combine.Globally<T,​java.util.List<T>>
      of​(int count, ComparatorT compareFn)
      Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted using the given Comparator<T>.
      static <K,​V,​ComparatorT extends java.util.Comparator<V> & java.io.Serializable>
      PTransform<PCollection<KV<K,​V>>,​PCollection<KV<K,​java.util.List<V>>>>
      perKey​(int count, ComparatorT compareFn)
      Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted using the given Comparator<V>.
      static <T extends java.lang.Comparable<T>>
      Combine.Globally<T,​java.util.List<T>>
      smallest​(int count)
      Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the smallest count elements of the input PCollection<T>, in increasing order, sorted according to their natural order.
      static Top.TopCombineFn<java.lang.Double,​Top.Reversed<java.lang.Double>> smallestDoublesFn​(int count)
      Returns a Top.TopCombineFn that aggregates the smallest count double values.
      static <T extends java.lang.Comparable<T>>
      Top.TopCombineFn<T,​Top.Reversed<T>>
      smallestFn​(int count)
      Returns a Top.TopCombineFn that aggregates the smallest count values.
      static Top.TopCombineFn<java.lang.Integer,​Top.Reversed<java.lang.Integer>> smallestIntsFn​(int count)
      Returns a Top.TopCombineFn that aggregates the smallest count int values.
      static Top.TopCombineFn<java.lang.Long,​Top.Reversed<java.lang.Long>> smallestLongsFn​(int count)
      Returns a Top.TopCombineFn that aggregates the smallest count long values.
      static <K,​V extends java.lang.Comparable<V>>
      PTransform<PCollection<KV<K,​V>>,​PCollection<KV<K,​java.util.List<V>>>>
      smallestPerKey​(int count)
      Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the smallest count values associated with that key in the input PCollection<KV<K, V>>, in increasing order, sorted according to their natural order.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • of

        public static <T,​ComparatorT extends java.util.Comparator<T> & java.io.Serializable> Combine.Globally<T,​java.util.List<T>> of​(int count,
                                                                                                                                                  ComparatorT compareFn)
        Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted using the given Comparator<T>. The Comparator<T> must also be Serializable.

        If count > the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting List, albeit in sorted order.

        All the elements of the result's List must fit into the memory of a single machine.

        Example of use:

        
         PCollection<Student> students = ...;
         PCollection<List<Student>> top10Students =
             students.apply(Top.of(10, new CompareStudentsByAvgGrade()));
         

        By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

        If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called.

        See also smallest(int) and largest(int), which sort Comparable elements using their natural ordering.

        See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.

      • smallest

        public static <T extends java.lang.Comparable<T>> Combine.Globally<T,​java.util.List<T>> smallest​(int count)
        Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the smallest count elements of the input PCollection<T>, in increasing order, sorted according to their natural order.

        If count > the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting PCollection's List, albeit in sorted order.

        All the elements of the result List must fit into the memory of a single machine.

        Example of use:

        
         PCollection<Integer> values = ...;
         PCollection<List<Integer>> smallest10Values = values.apply(Top.smallest(10));
         

        By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

        If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called.

        See also largest(int).

        See also of(int, ComparatorT), which sorts using a user-specified Comparator function.

        See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.

      • largest

        public static <T extends java.lang.Comparable<T>> Combine.Globally<T,​java.util.List<T>> largest​(int count)
        Returns a PTransform that takes an input PCollection<T> and returns a PCollection<List<T>> with a single element containing the largest count elements of the input PCollection<T>, in decreasing order, sorted according to their natural order.

        If count > the number of elements in the input PCollection, then all the elements of the input PCollection will be in the resulting PCollection's List, albeit in sorted order.

        All the elements of the result's List must fit into the memory of a single machine.

        Example of use:

        
         PCollection<Integer> values = ...;
         PCollection<List<Integer>> largest10Values = values.apply(Top.largest(10));
         

        By default, the Coder of the output PCollection is a ListCoder of the Coder of the elements of the input PCollection.

        If the input PCollection is windowed into GlobalWindows, an empty List<T> in the GlobalWindow will be output if the input PCollection is empty. To use this with inputs with other windowing, either withoutDefaults or asSingletonView must be called.

        See also smallest(int).

        See also of(int, ComparatorT), which sorts using a user-specified Comparator function.

        See also perKey(int, ComparatorT), smallestPerKey(int), and largestPerKey(int), which take a PCollection of KVs and return the top values associated with each key.

      • perKey

        public static <K,​V,​ComparatorT extends java.util.Comparator<V> & java.io.Serializable> PTransform<PCollection<KV<K,​V>>,​PCollection<KV<K,​java.util.List<V>>>> perKey​(int count,
                                                                                                                                                                                                          ComparatorT compareFn)
        Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted using the given Comparator<V>. The Comparator<V> must also be Serializable.

        If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

        All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

        Example of use:

        
         PCollection<KV<School, Student>> studentsBySchool = ...;
         PCollection<KV<School, List<Student>>> top10StudentsBySchool =
             studentsBySchool.apply(
                 Top.perKey(10, new CompareStudentsByAvgGrade()));
         

        By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

        See also smallestPerKey(int) and largestPerKey(int), which sort Comparable<V> values using their natural ordering.

        See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.

      • smallestPerKey

        public static <K,​V extends java.lang.Comparable<V>> PTransform<PCollection<KV<K,​V>>,​PCollection<KV<K,​java.util.List<V>>>> smallestPerKey​(int count)
        Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the smallest count values associated with that key in the input PCollection<KV<K, V>>, in increasing order, sorted according to their natural order.

        If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

        All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

        Example of use:

        
         PCollection<KV<String, Integer>> keyedValues = ...;
         PCollection<KV<String, List<Integer>>> smallest10ValuesPerKey =
             keyedValues.apply(Top.smallestPerKey(10));
         

        By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

        See also largestPerKey(int).

        See also perKey(int, ComparatorT), which sorts values using a user-specified Comparator function.

        See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.

      • largestPerKey

        public static <K,​V extends java.lang.Comparable<V>> Combine.PerKey<K,​V,​java.util.List<V>> largestPerKey​(int count)
        Returns a PTransform that takes an input PCollection<KV<K, V>> and returns a PCollection<KV<K, List<V>>> that contains an output element mapping each distinct key in the input PCollection to the largest count values associated with that key in the input PCollection<KV<K, V>>, in decreasing order, sorted according to their natural order.

        If there are fewer than count values associated with a particular key, then all those values will be in the result mapping for that key, albeit in sorted order.

        All the values associated with a single key must fit into the memory of a single machine, but there can be many more KVs in the resulting PCollection than can fit into the memory of a single machine.

        Example of use:

        
         PCollection<KV<String, Integer>> keyedValues = ...;
         PCollection<KV<String, List<Integer>>> largest10ValuesPerKey =
             keyedValues.apply(Top.largestPerKey(10));
         

        By default, the Coder of the keys of the output PCollection is the same as that of the keys of the input PCollection, and the Coder of the values of the output PCollection is a ListCoder of the Coder of the values of the input PCollection.

        See also smallestPerKey(int).

        See also perKey(int, ComparatorT), which sorts values using a user-specified Comparator function.

        See also of(int, ComparatorT), smallest(int), and largest(int), which take a PCollection and return the top elements.