Take the top K elements of each partition and return them as a new RDD.
Repartition an RDD using the given partitioner. This is similar to Spark's partitionBy, except we use the Shark shuffle serializer.
Sort the RDD by key. This is similar to Spark's sortByKey, except that we use the Shark shuffle serializer.
Return an RDD containing the top K elements (the K elements with the smallest keys) of the given RDD.
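The two top-K entries can be read as a two-stage selection: trim each partition to its K smallest keys, then pick the K smallest among the surviving candidates. The sketch below models that idea with plain Scala collections, where each inner `Seq` stands in for one partition. The object name `TopKSketch` and both method names are hypothetical, and this is an assumption about how the pieces fit together, not the library's actual implementation.

```scala
object TopKSketch {
  /** Keep the k pairs with the smallest keys inside each partition.
    * Each inner Seq is a hypothetical stand-in for one RDD partition. */
  def topKPerPartition[K: Ordering, V](parts: Seq[Seq[(K, V)]], k: Int): Seq[Seq[(K, V)]] =
    parts.map(p => p.sortBy(_._1).take(k))

  /** Merge the per-partition candidates and keep the k smallest keys overall.
    * Any global top-k element is also in its own partition's top k,
    * so trimming each partition first cannot lose a result. */
  def topK[K: Ordering, V](parts: Seq[Seq[(K, V)]], k: Int): Seq[(K, V)] =
    topKPerPartition(parts, k).flatten.sortBy(_._1).take(k)
}
```

For example, `TopKSketch.topK(Seq(Seq(5 -> "e", 1 -> "a"), Seq(2 -> "b")), 2)` keeps the pairs keyed 1 and 2. Trimming per partition first bounds the number of candidates that reach the final selection to K per partition.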
Returns a UnionRDD using both RDD arguments. Any UnionRDD argument is "flattened", in that its parent sequence of RDDs is directly passed to the UnionRDD returned.
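The flattening rule can be illustrated without Spark. In the sketch below, `Dataset`, `Leaf`, and `Union` are hypothetical stand-ins for RDD types, and `unionAndFlatten` is an illustrative name; the code only mirrors the behavior described above, in which an argument that is already a union contributes its parents directly instead of being nested.

```scala
object UnionSketch {
  // Hypothetical stand-ins for RDD types; these are not Spark's classes.
  sealed trait Dataset
  final case class Leaf(name: String) extends Dataset
  final case class Union(parents: Seq[Dataset]) extends Dataset

  /** Parents contributed by one argument: a Union exposes its own parent
    * sequence, anything else contributes itself. */
  private def parentsOf(d: Dataset): Seq[Dataset] = d match {
    case Union(ps) => ps
    case other     => Seq(other)
  }

  /** Build a single Union over both arguments, flattening one level. */
  def unionAndFlatten(a: Dataset, b: Dataset): Union =
    Union(parentsOf(a) ++ parentsOf(b))
}
```

With this model, combining `Union(Seq(Leaf("a"), Leaf("b")))` with `Leaf("c")` yields one `Union` holding three leaves. Chaining several such calls therefore keeps the result one level deep instead of building a tree of nested unions.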
A set of RDD utility functions that supplement Spark's built-in abstractions.