Class JavaRddOps

java.lang.Object
net.sansa_stack.spark.rdd.op.rdf.JavaRddOps

public class JavaRddOps extends Object
  • Constructor Summary

    Constructors
    Constructor
    Description
    JavaRddOps()
     
  • Method Summary

    Modifier and Type
    Method
    Description
    static <T, A, R> R
    aggregateUsingJavaCollector(org.apache.spark.api.java.JavaRDD<? extends T> rdd, Collector<? super T,A,R> collector)
    Aggregate a JavaRDD using a serializable Collector.
    static <K, V> org.apache.spark.api.java.JavaPairRDD<K,V>
    groupKeysAndReduceValues(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, boolean distinct, boolean sortGraphsByIri, int numPartitions, org.apache.spark.api.java.function.Function2<V,V,V> reducer)
    Convenience helper to group values by keys, optionally sort them and reduce the values.
    static <K, V, O> org.apache.spark.api.java.JavaRDD<O>
    mapPartitions(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, org.aksw.commons.util.stream.StreamFunction<scala.Tuple2<K,V>,O> fn)
     
    static <I, O> org.apache.spark.api.java.JavaRDD<O>
    mapPartitions(org.apache.spark.api.java.JavaRDD<I> rdd, org.aksw.commons.util.stream.StreamFunction<I,O> fn)
    Maps the partitions of a JavaRDD using the given StreamFunction.
    static <T> org.apache.spark.api.java.JavaRDD<T>
    unionIfNeeded(org.apache.spark.api.java.JavaSparkContext jsc, Collection<org.apache.spark.api.java.JavaRDD<T>> rdds)
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • JavaRddOps

      public JavaRddOps()
  • Method Details

    • unionIfNeeded

      public static <T> org.apache.spark.api.java.JavaRDD<T> unionIfNeeded(org.apache.spark.api.java.JavaSparkContext jsc, Collection<org.apache.spark.api.java.JavaRDD<T>> rdds)
    • groupKeysAndReduceValues

      public static <K, V> org.apache.spark.api.java.JavaPairRDD<K,V> groupKeysAndReduceValues(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, boolean distinct, boolean sortGraphsByIri, int numPartitions, org.apache.spark.api.java.function.Function2<V,V,V> reducer)
      Convenience helper to group values by keys, optionally sort them and reduce the values.
      Returns:
      A new RDD with grouped and/or sorted keys and with values merged by the given reducer, according to the supplied arguments
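      The group, optionally sort, and reduce pattern described above can be illustrated without Spark. The following is a minimal sketch on plain Java collections, not the SANSA implementation: the class GroupAndReduceSketch and its method groupAndReduce are hypothetical names, only the sort flag and the reducer are modeled, and the distinct and numPartitions arguments are deliberately left out because their exact semantics are not stated here.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.BinaryOperator;

// Hypothetical illustration class; it is not part of SANSA or Spark.
final class GroupAndReduceSketch {

    // Merges all values that share a key with the reducer.
    // If sortByKey is true the result iterates in natural key order,
    // otherwise in order of first key occurrence.
    static <K extends Comparable<? super K>, V> Map<K, V> groupAndReduce(
            List<Map.Entry<K, V>> pairs, boolean sortByKey, BinaryOperator<V> reducer) {
        Map<K, V> merged;
        if (sortByKey) {
            merged = new TreeMap<K, V>();
        } else {
            merged = new LinkedHashMap<K, V>();
        }
        for (Map.Entry<K, V> pair : pairs) {
            // Map.merge stores the value for a new key, or reduces it with the existing one.
            merged.merge(pair.getKey(), pair.getValue(), reducer);
        }
        return merged;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        pairs.add(new SimpleEntry<>("b", 1));
        pairs.add(new SimpleEntry<>("a", 2));
        pairs.add(new SimpleEntry<>("b", 3));

        BinaryOperator<Integer> sum = Integer::sum;
        System.out.println(groupAndReduce(pairs, true, sum));  // prints {a=2, b=4}
        System.out.println(groupAndReduce(pairs, false, sum)); // prints {b=4, a=2}
    }
}
```

      With Spark, the reducer would be supplied as a Function2<V,V,V>, as in the signature above, and the merging would run across partitions rather than in a single in-memory map.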
    • aggregateUsingJavaCollector

      public static <T, A, R> R aggregateUsingJavaCollector(org.apache.spark.api.java.JavaRDD<? extends T> rdd, Collector<? super T,A,R> collector)
      Aggregate a JavaRDD using a serializable Collector. Such collectors can be created with, for example, AggBuilder.
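      To show how a java.util.stream.Collector can drive a partitioned aggregation, here is a minimal sketch in plain Java, not the SANSA implementation. Partitions are modeled as nested lists, and the class CollectorAggregationSketch and its aggregate method are hypothetical names. Each partition is folded with the collector's accumulator, partial results are merged with its combiner, and the finisher produces the final value. On a real RDD the collector would additionally have to be serializable, as noted above, so that it can be sent to the executors.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical illustration class; it is not part of SANSA or Spark.
final class CollectorAggregationSketch {

    // Folds every partition into its own accumulator, then merges the partial results.
    static <T, A, R> R aggregate(List<? extends List<? extends T>> partitions,
                                 Collector<? super T, A, R> collector) {
        BiConsumer<A, ? super T> accumulator = collector.accumulator();
        BinaryOperator<A> combiner = collector.combiner();

        A result = collector.supplier().get();
        for (List<? extends T> partition : partitions) {
            A partial = collector.supplier().get();
            for (T item : partition) {
                accumulator.accept(partial, item);
            }
            result = combiner.apply(result, partial);
        }
        return collector.finisher().apply(result);
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3), Arrays.asList(4, 5));

        Collector<Integer, ?, Integer> sum = Collectors.summingInt(Integer::intValue);
        System.out.println(aggregate(partitions, sum)); // prints 15

        Collector<Integer, ?, List<Integer>> toList = Collectors.toList();
        System.out.println(aggregate(partitions, toList)); // prints [1, 2, 3, 4, 5]
    }
}
```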
    • mapPartitions

      public static <I, O> org.apache.spark.api.java.JavaRDD<O> mapPartitions(org.apache.spark.api.java.JavaRDD<I> rdd, org.aksw.commons.util.stream.StreamFunction<I,O> fn)
      Maps the partitions of a JavaRDD using the given StreamFunction.
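      The idea of mapping whole partitions through a stream-to-stream function can be sketched in plain Java. This is an illustration under stated assumptions, not the SANSA implementation: StreamFunction<I,O> is modeled here as a java.util.function.Function<Stream<I>, Stream<O>>, partitions are modeled as nested lists, and the class MapPartitionsSketch is a hypothetical name. The function is applied once per partition, so it can filter, expand, or otherwise restructure the elements it sees.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; it is not part of SANSA or Spark.
final class MapPartitionsSketch {

    // Applies fn once to the stream of each partition and collects the output per partition.
    static <I, O> List<List<O>> mapPartitions(List<List<I>> partitions,
                                              Function<Stream<I>, Stream<O>> fn) {
        List<List<O>> result = new ArrayList<>();
        for (List<I> partition : partitions) {
            result.add(fn.apply(partition.stream()).collect(Collectors.toList()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> partitions = Arrays.asList(
                Arrays.asList(1, 2), Arrays.asList(3, 4, 6));

        // Keep even numbers and label them; the output size may differ from the input size.
        Function<Stream<Integer>, Stream<String>> evens =
                s -> s.filter(x -> x % 2 == 0).map(x -> "n" + x);

        System.out.println(mapPartitions(partitions, evens)); // prints [[n2], [n4, n6]]
    }
}
```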
    • mapPartitions

      public static <K, V, O> org.apache.spark.api.java.JavaRDD<O> mapPartitions(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, org.aksw.commons.util.stream.StreamFunction<scala.Tuple2<K,V>,O> fn)