Class HpccRDD

  • All Implemented Interfaces:
    Serializable, org.apache.spark.internal.Logging, scala.Serializable

    public class HpccRDD
    extends org.apache.spark.rdd.RDD<org.apache.spark.sql.Row>
    implements Serializable
    An RDD of GenericRowWithSchema rows that allows a dataset to be read from HPCC Systems.
    See Also:
    Serialized Form
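    A minimal usage sketch is shown below. The data partitions and record definition are
    assumed to have been resolved elsewhere (for example via the HPCC DFS client); the
    class name HpccRddExample and its helper method are illustrative only, and HpccRDD is
    assumed to be imported from the connector package.

        import org.apache.spark.SparkContext;
        import org.hpccsystems.commons.ecl.FieldDef;
        import org.hpccsystems.dfs.client.DataPartition;

        public class HpccRddExample
        {
            // Builds an HpccRDD over already-resolved file partitions and counts its rows.
            // Each element of the RDD is an org.apache.spark.sql.Row (GenericRowWithSchema).
            public static long countRows(SparkContext sc, DataPartition[] dataParts, FieldDef recordDef)
            {
                HpccRDD rdd = new HpccRDD(sc, dataParts, recordDef);
                return rdd.count();
            }
        }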
    • Field Summary

      Fields 
      Modifier and Type Field Description
      static int DEFAULT_CONNECTION_TIMEOUT  
      • Fields inherited from class org.apache.spark.rdd.RDD

        org$apache$spark$rdd$RDD$$evidence$1
    • Constructor Summary

      Constructors 
      Constructor Description
      HpccRDD​(org.apache.spark.SparkContext sc, org.hpccsystems.dfs.client.DataPartition[] dataParts, org.hpccsystems.commons.ecl.FieldDef originalRD)  
      HpccRDD​(org.apache.spark.SparkContext sc, org.hpccsystems.dfs.client.DataPartition[] dataParts, org.hpccsystems.commons.ecl.FieldDef originalRD, org.hpccsystems.commons.ecl.FieldDef projectedRD)  
      HpccRDD​(org.apache.spark.SparkContext sc, org.hpccsystems.dfs.client.DataPartition[] dataParts, org.hpccsystems.commons.ecl.FieldDef originalRD, org.hpccsystems.commons.ecl.FieldDef projectedRD, int connectTimeout, int limit)  
    • Method Summary

      Modifier and Type Method Description
      org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> asJavaRDD()
      Wraps this RDD as a JavaRDD so that the Java API can be used.
      org.apache.spark.InterruptibleIterator<org.apache.spark.sql.Row> compute​(org.apache.spark.Partition p_arg, org.apache.spark.TaskContext ctx)  
      org.apache.spark.Partition[] getPartitions()  
      scala.collection.Seq<String> getPreferredLocations​(org.apache.spark.Partition split)  
      • Methods inherited from class org.apache.spark.rdd.RDD

        $plus$plus, aggregate, barrier, cache, cartesian, checkpoint, checkpointData, checkpointData_$eq, clearDependencies, coalesce, coalesce$default$2, coalesce$default$3, coalesce$default$4, collect, collect, collectPartitions, computeOrReadCheckpoint, conf, context, count, countApprox, countApprox$default$2, countApproxDistinct, countApproxDistinct, countApproxDistinct$default$1, countByValue, countByValue$default$1, countByValueApprox, countByValueApprox$default$2, countByValueApprox$default$3, creationSite, dependencies, distinct, distinct, distinct$default$2, doCheckpoint, doubleRDDToDoubleRDDFunctions, elementClassTag, filter, first, firstParent, flatMap, fold, foreach, foreachPartition, getCheckpointFile, getCreationSite, getDependencies, getNarrowAncestors, getNumPartitions, getOrCompute, getOutputDeterministicLevel, getStorageLevel, glom, groupBy, groupBy, groupBy, groupBy$default$4, id, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, intersection, intersection, intersection, intersection$default$3, isBarrier, isBarrier_, isCheckpointed, isCheckpointedAndMaterialized, isEmpty, isLocallyCheckpointed, isReliablyCheckpointed, isTraceEnabled, iterator, keyBy, localCheckpoint, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, map, mapPartitions, mapPartitions$default$2, mapPartitionsInternal, mapPartitionsInternal$default$2, mapPartitionsWithIndex, mapPartitionsWithIndex, mapPartitionsWithIndex$default$2, mapPartitionsWithIndexInternal, mapPartitionsWithIndexInternal$default$2, mapPartitionsWithIndexInternal$default$3, markCheckpointed, max, min, name, name_$eq, numericRDDToDoubleRDDFunctions, org$apache$spark$internal$Logging$$log_, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$rdd$RDD$$checkpointAllMarkedAncestors, org$apache$spark$rdd$RDD$$debugString$1, org$apache$spark$rdd$RDD$$debugString$default$4$1, org$apache$spark$rdd$RDD$$dependencies_, org$apache$spark$rdd$RDD$$dependencies__$eq, org$apache$spark$rdd$RDD$$doCheckpointCalled, org$apache$spark$rdd$RDD$$doCheckpointCalled_$eq, org$apache$spark$rdd$RDD$$partitions_, org$apache$spark$rdd$RDD$$partitions__$eq, org$apache$spark$rdd$RDD$$sc, org$apache$spark$rdd$RDD$$stateLock, org$apache$spark$rdd$RDD$$visit$1, outputDeterministicLevel, parent, partitioner, partitions, persist, persist, pipe, pipe, pipe, pipe$default$2, pipe$default$3, pipe$default$4, pipe$default$5, pipe$default$6, pipe$default$7, preferredLocations, randomSampleWithRange, randomSplit, randomSplit$default$2, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToPairRDDFunctions$default$4, rddToSequenceFileRDDFunctions, reduce, repartition, repartition$default$2, retag, retag, sample, sample$default$3, saveAsObjectFile, saveAsTextFile, saveAsTextFile, scope, setName, sortBy, sortBy$default$2, sortBy$default$3, sparkContext, subtract, subtract, subtract, subtract$default$3, take, takeOrdered, takeSample, takeSample$default$3, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeAggregate$default$4, treeReduce, treeReduce$default$2, union, unpersist, unpersist$default$1, withScope, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
    • Field Detail

      • DEFAULT_CONNECTION_TIMEOUT

        public static int DEFAULT_CONNECTION_TIMEOUT
    • Constructor Detail

      • HpccRDD

        public HpccRDD​(org.apache.spark.SparkContext sc,
                       org.hpccsystems.dfs.client.DataPartition[] dataParts,
                       org.hpccsystems.commons.ecl.FieldDef originalRD)
        Parameters:
        sc - the Spark context
        dataParts - the data partitions to read
        originalRD - the original record definition
      • HpccRDD

        public HpccRDD​(org.apache.spark.SparkContext sc,
                       org.hpccsystems.dfs.client.DataPartition[] dataParts,
                       org.hpccsystems.commons.ecl.FieldDef originalRD,
                       org.hpccsystems.commons.ecl.FieldDef projectedRD)
        Parameters:
        sc - the Spark context
        dataParts - the data partitions to read
        originalRD - the original record definition
        projectedRD - the projected record definition
      • HpccRDD

        public HpccRDD​(org.apache.spark.SparkContext sc,
                       org.hpccsystems.dfs.client.DataPartition[] dataParts,
                       org.hpccsystems.commons.ecl.FieldDef originalRD,
                       org.hpccsystems.commons.ecl.FieldDef projectedRD,
                       int connectTimeout,
                       int limit)
        Parameters:
        sc - the Spark context
        dataParts - the data partitions to read
        originalRD - the original record definition
        projectedRD - the projected record definition
        connectTimeout - the connection timeout
        limit - the file limit
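        A sketch of this constructor, assuming sc, dataParts, originalRD and projectedRD have
        already been resolved; the limit value of -1 is illustrative only, not a documented
        "no limit" sentinel:

            HpccRDD rdd = new HpccRDD(sc,
                                      dataParts,
                                      originalRD,
                                      projectedRD,
                                      HpccRDD.DEFAULT_CONNECTION_TIMEOUT, // default connection timeout
                                      -1);                                // illustrative limit value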
    • Method Detail

      • asJavaRDD

        public org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> asJavaRDD()
        Wraps this RDD as a JavaRDD so that the Java API can be used.
        Returns:
        a JavaRDD wrapper of the HpccRDD.
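        A sketch, assuming an existing HpccRDD named hpccRdd:

            org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> javaRdd = hpccRdd.asJavaRDD();
            // The Java RDD API can now be used, e.g. to fetch a sample of rows.
            java.util.List<org.apache.spark.sql.Row> firstRows = javaRdd.take(10);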
      • compute

        public org.apache.spark.InterruptibleIterator<org.apache.spark.sql.Row> compute​(org.apache.spark.Partition p_arg,
                                                                                        org.apache.spark.TaskContext ctx)
        Specified by:
        compute in class org.apache.spark.rdd.RDD<org.apache.spark.sql.Row>
      • getPreferredLocations

        public scala.collection.Seq<String> getPreferredLocations​(org.apache.spark.Partition split)
        Overrides:
        getPreferredLocations in class org.apache.spark.rdd.RDD<org.apache.spark.sql.Row>
      • getPartitions

        public org.apache.spark.Partition[] getPartitions()
        Specified by:
        getPartitions in class org.apache.spark.rdd.RDD<org.apache.spark.sql.Row>
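        compute, getPreferredLocations and getPartitions implement the standard Spark RDD
        contract and are normally invoked by the Spark scheduler rather than called directly.
        The partition layout can still be inspected through the public RDD API, for example
        (a sketch, assuming an existing HpccRDD named hpccRdd):

            int numPartitions = hpccRdd.getNumPartitions();            // number of Spark partitions
            org.apache.spark.Partition[] parts = hpccRdd.partitions(); // the partitions themselves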