Package net.sansa_stack.spark.rdd.op.rdf
Class JavaRddOps
java.lang.Object
net.sansa_stack.spark.rdd.op.rdf.JavaRddOps
Constructor Summary

JavaRddOps()

Method Summary

static <T,A,R> R aggregateUsingJavaCollector(org.apache.spark.api.java.JavaRDD<? extends T> rdd, Collector<? super T,A,R> collector)
    Aggregate a JavaRDD using a serializable Collector.

static <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> groupKeysAndReduceValues(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, boolean distinct, boolean sortGraphsByIri, int numPartitions, org.apache.spark.api.java.function.Function2<V,V,V> reducer)
    Convenience helper to group values by keys, optionally sort them and reduce the values.

static <K,V,O> org.apache.spark.api.java.JavaRDD<O> mapPartitions(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, org.aksw.commons.util.stream.StreamFunction<scala.Tuple2<K,V>,O> fn)

static <I,O> org.apache.spark.api.java.JavaRDD<O> mapPartitions(org.apache.spark.api.java.JavaRDD<I> rdd, org.aksw.commons.util.stream.StreamFunction<I,O> fn)
    Map operation based on a flowable transformer.

static <T> org.apache.spark.api.java.JavaRDD<T> unionIfNeeded(org.apache.spark.api.java.JavaSparkContext jsc, Collection<org.apache.spark.api.java.JavaRDD<T>> rdds)
Constructor Details

JavaRddOps
public JavaRddOps()
Method Details

unionIfNeeded
public static <T> org.apache.spark.api.java.JavaRDD<T> unionIfNeeded(org.apache.spark.api.java.JavaSparkContext jsc, Collection<org.apache.spark.api.java.JavaRDD<T>> rdds)
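The source does not document this method's behavior; a plausible reading of the name is that a union is only performed when there is more than one input RDD. The plain-Java sketch below illustrates that pattern over lists, since a runnable Spark cluster cannot be assumed here. `UnionIfNeededSketch` and its helper are hypothetical names, not part of the SANSA API, and the single-input shortcut is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

public class UnionIfNeededSketch {
    // Concatenate a collection of lists; when there is exactly one input,
    // return it directly instead of building a union (the assumed "if needed" part).
    static <T> List<T> unionIfNeeded(Collection<List<T>> lists) {
        if (lists.size() == 1) {
            return lists.iterator().next(); // single input: no union necessary
        }
        List<T> result = new ArrayList<>();
        for (List<T> l : lists) {
            result.addAll(l);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> a = Arrays.asList(1, 2);
        List<Integer> b = Arrays.asList(3);
        if (!unionIfNeeded(Arrays.asList(a, b)).equals(Arrays.asList(1, 2, 3)))
            throw new AssertionError();
        // single-input case hands back the same instance
        if (unionIfNeeded(Arrays.asList(a)) != a) throw new AssertionError();
        System.out.println("ok");
    }
}
```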
groupKeysAndReduceValues
public static <K,V> org.apache.spark.api.java.JavaPairRDD<K,V> groupKeysAndReduceValues(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, boolean distinct, boolean sortGraphsByIri, int numPartitions, org.apache.spark.api.java.function.Function2<V,V,V> reducer)
Convenience helper to group values by keys, optionally sort them, and reduce the values.
Returns:
A new RDD with grouped and/or sorted keys and merged values according to the given options
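The core group-and-reduce semantics can be sketched with plain Java collections; the `distinct`, `sortGraphsByIri`, and `numPartitions` options of the real API are deliberately omitted here, and `GroupAndReduceSketch` is a hypothetical illustration, not SANSA code.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class GroupAndReduceSketch {
    // Group (key, value) pairs by key and merge each key's values with a reducer,
    // mirroring what groupKeysAndReduceValues does per key on a JavaPairRDD.
    static <K, V> Map<K, V> groupKeysAndReduceValues(
            List<Map.Entry<K, V>> pairs, BinaryOperator<V> reducer) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : pairs) {
            result.merge(e.getKey(), e.getValue(), reducer);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs = Arrays.asList(
                new AbstractMap.SimpleEntry<>("a", 1),
                new AbstractMap.SimpleEntry<>("b", 2),
                new AbstractMap.SimpleEntry<>("a", 3));
        Map<String, Integer> merged = groupKeysAndReduceValues(pairs, Integer::sum);
        if (!Integer.valueOf(4).equals(merged.get("a"))) throw new AssertionError();
        if (!Integer.valueOf(2).equals(merged.get("b"))) throw new AssertionError();
        System.out.println("ok");
    }
}
```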
aggregateUsingJavaCollector
public static <T,A,R> R aggregateUsingJavaCollector(org.apache.spark.api.java.JavaRDD<? extends T> rdd, Collector<? super T,A,R> collector)
Aggregate a JavaRDD using a serializable Collector. Such collectors can be created e.g. using AggBuilder.
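To illustrate the Collector contract this method builds on (accumulate elements of T into an accumulator A, then finish to a result R), here is a local-stream example using a JDK collector. Note this is only an analogy: the documentation above requires a serializable Collector, which standard `java.util.stream.Collectors` instances are not guaranteed to be, hence the pointer to AggBuilder.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CollectorAggregationSketch {
    public static void main(String[] args) {
        List<String> data = Arrays.asList("a", "bb", "ccc");
        // A Collector<T,A,R> folds elements into an accumulator and finishes to R;
        // aggregateUsingJavaCollector applies this same contract across RDD partitions,
        // combining per-partition accumulators with the collector's combiner.
        long totalLength = data.stream()
                .collect(Collectors.summingLong(String::length));
        if (totalLength != 6) throw new AssertionError();
        System.out.println("ok");
    }
}
```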
mapPartitions
public static <I,O> org.apache.spark.api.java.JavaRDD<O> mapPartitions(org.apache.spark.api.java.JavaRDD<I> rdd, org.aksw.commons.util.stream.StreamFunction<I,O> fn)
Map operation based on a flowable transformer.
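Assuming `StreamFunction<I,O>` denotes a stream-to-stream transformation applied once per partition, the idea can be sketched locally with a plain `Function<Stream<I>, Stream<O>>`; `MapPartitionsSketch` is a hypothetical stand-in, not the SANSA implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class MapPartitionsSketch {
    // Apply a stream-to-stream function to a whole "partition" at once,
    // analogous to mapPartitions over a JavaRDD with a StreamFunction<I,O>.
    static <I, O> List<O> mapPartition(List<I> partition,
                                       Function<Stream<I>, Stream<O>> fn) {
        return fn.apply(partition.stream()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> partition = Arrays.asList(1, 2, 3);
        List<String> out = mapPartition(partition, s -> s.map(i -> "v" + i));
        if (!out.equals(Arrays.asList("v1", "v2", "v3"))) throw new AssertionError();
        System.out.println("ok");
    }
}
```

Operating on a whole partition rather than element by element lets the function keep per-partition state (e.g. parsers or buffers) across elements, which is the usual reason to prefer a mapPartitions-style API.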
mapPartitions
public static <K,V,O> org.apache.spark.api.java.JavaRDD<O> mapPartitions(org.apache.spark.api.java.JavaPairRDD<K,V> rdd, org.aksw.commons.util.stream.StreamFunction<scala.Tuple2<K,V>,O> fn)