public class HpccRDD extends org.apache.spark.rdd.RDD<org.apache.spark.sql.Row> implements Serializable
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_CONNECTION_TIMEOUT |
Constructor and Description |
---|
HpccRDD(org.apache.spark.SparkContext sc, org.hpccsystems.dfs.client.DataPartition[] dataParts, org.hpccsystems.commons.ecl.FieldDef originalRD) |
HpccRDD(org.apache.spark.SparkContext sc, org.hpccsystems.dfs.client.DataPartition[] dataParts, org.hpccsystems.commons.ecl.FieldDef originalRD, org.hpccsystems.commons.ecl.FieldDef projectedRD) |
HpccRDD(org.apache.spark.SparkContext sc, org.hpccsystems.dfs.client.DataPartition[] dataParts, org.hpccsystems.commons.ecl.FieldDef originalRD, org.hpccsystems.commons.ecl.FieldDef projectedRD, int connectTimeout, int limit) |
Modifier and Type | Method and Description |
---|---|
org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> | asJavaRDD(): Wrap this RDD as a JavaRDD so the Java API can be used. |
org.apache.spark.InterruptibleIterator<org.apache.spark.sql.Row> | compute(org.apache.spark.Partition p_arg, org.apache.spark.TaskContext ctx) |
org.apache.spark.Partition[] | getPartitions() |
scala.collection.Seq<String> | getPreferredLocations(org.apache.spark.Partition split) |
org.apache.spark.rdd.RDD<org.apache.spark.mllib.regression.LabeledPoint> | makeMLLibLabeledPoint(String labelName, String[] dimNames): Transform to an RDD of labeled points for MLLib supervised learning. |
org.apache.spark.rdd.RDD<org.apache.spark.mllib.linalg.Vector> | makeMLLibVector(String[] dimNames): Transform to an RDD of mllib.linalg.Vectors for MLLib machine learning. |
$plus$plus, aggregate, barrier, cache, cartesian, checkpoint, checkpointData_$eq, checkpointData, clearDependencies, coalesce, coalesce$default$2, coalesce$default$3, coalesce$default$4, collect, collect, collectPartitions, computeOrReadCheckpoint, conf, context, count, countApprox, countApprox$default$2, countApproxDistinct, countApproxDistinct, countApproxDistinct$default$1, countByValue, countByValue$default$1, countByValueApprox, countByValueApprox$default$2, countByValueApprox$default$3, creationSite, dependencies, distinct, distinct, distinct$default$2, doCheckpoint, doubleRDDToDoubleRDDFunctions, elementClassTag, filter, first, firstParent, flatMap, fold, foreach, foreachPartition, getCheckpointFile, getCreationSite, getDependencies, getNarrowAncestors, getNumPartitions, getOrCompute, getOutputDeterministicLevel, getStorageLevel, glom, groupBy, groupBy, groupBy, groupBy$default$4, id, initializeLogIfNecessary, initializeLogIfNecessary, initializeLogIfNecessary$default$2, intersection, intersection, intersection, intersection$default$3, isBarrier_, isBarrier, isCheckpointed, isCheckpointedAndMaterialized, isEmpty, isLocallyCheckpointed, isReliablyCheckpointed, isTraceEnabled, iterator, keyBy, localCheckpoint, log, logDebug, logDebug, logError, logError, logInfo, logInfo, logName, logTrace, logTrace, logWarning, logWarning, map, mapPartitions, mapPartitions$default$2, mapPartitionsInternal, mapPartitionsInternal$default$2, mapPartitionsWithIndex, mapPartitionsWithIndex$default$2, mapPartitionsWithIndexInternal, mapPartitionsWithIndexInternal$default$2, mapPartitionsWithIndexInternal$default$3, markCheckpointed, max, min, name_$eq, name, numericRDDToDoubleRDDFunctions, org$apache$spark$internal$Logging$$log__$eq, org$apache$spark$internal$Logging$$log_, org$apache$spark$rdd$RDD$$checkpointAllMarkedAncestors, org$apache$spark$rdd$RDD$$debugString$1, org$apache$spark$rdd$RDD$$debugString$default$4$1, org$apache$spark$rdd$RDD$$dependencies__$eq, org$apache$spark$rdd$RDD$$dependencies_, org$apache$spark$rdd$RDD$$doCheckpointCalled_$eq, org$apache$spark$rdd$RDD$$doCheckpointCalled, org$apache$spark$rdd$RDD$$partitions__$eq, org$apache$spark$rdd$RDD$$partitions_, org$apache$spark$rdd$RDD$$sc, org$apache$spark$rdd$RDD$$visit$1, outputDeterministicLevel, parent, partitioner, partitions, persist, persist, pipe, pipe, pipe, pipe$default$2, pipe$default$3, pipe$default$4, pipe$default$5, pipe$default$6, pipe$default$7, preferredLocations, randomSampleWithRange, randomSplit, randomSplit$default$2, rddToAsyncRDDActions, rddToOrderedRDDFunctions, rddToPairRDDFunctions, rddToPairRDDFunctions$default$4, rddToSequenceFileRDDFunctions, reduce, repartition, repartition$default$2, retag, retag, sample, sample$default$3, saveAsObjectFile, saveAsTextFile, saveAsTextFile, scope, setName, sortBy, sortBy$default$2, sortBy$default$3, sparkContext, subtract, subtract, subtract, subtract$default$3, take, takeOrdered, takeSample, takeSample$default$3, toDebugString, toJavaRDD, toLocalIterator, top, toString, treeAggregate, treeAggregate$default$4, treeReduce, treeReduce$default$2, union, unpersist, unpersist$default$1, withScope, zip, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipPartitions, zipWithIndex, zipWithUniqueId
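Because HpccRDD extends org.apache.spark.rdd.RDD<Row>, all of the methods inherited from RDD listed above can be called directly. A minimal sketch, assuming an already-constructed HpccRDD (the inspect helper name is hypothetical):

```java
import org.apache.spark.storage.StorageLevel;

public final class HpccRddInspect {
    // A minimal sketch; hpccRdd is assumed to be an already-constructed
    // HpccRDD (see the constructors below).
    static void inspect(HpccRDD hpccRdd) {
        hpccRdd.persist(StorageLevel.MEMORY_ONLY()); // inherited: cache rows after the first remote read
        long rows = hpccRdd.count();                 // inherited action: triggers the read from HPCC
        System.out.println("rows=" + rows + ", partitions=" + hpccRdd.getNumPartitions());
    }
}
```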
public HpccRDD(org.apache.spark.SparkContext sc, org.hpccsystems.dfs.client.DataPartition[] dataParts, org.hpccsystems.commons.ecl.FieldDef originalRD)

Parameters:
- sc - the Spark context
- dataParts - the HPCC file partitions to read
- originalRD - the record definition of the HPCC file

public HpccRDD(org.apache.spark.SparkContext sc, org.hpccsystems.dfs.client.DataPartition[] dataParts, org.hpccsystems.commons.ecl.FieldDef originalRD, org.hpccsystems.commons.ecl.FieldDef projectedRD)

Parameters:
- sc - the Spark context
- dataParts - the HPCC file partitions to read
- originalRD - the record definition of the HPCC file
- projectedRD - the projected record definition describing the fields to read

public HpccRDD(org.apache.spark.SparkContext sc, org.hpccsystems.dfs.client.DataPartition[] dataParts, org.hpccsystems.commons.ecl.FieldDef originalRD, org.hpccsystems.commons.ecl.FieldDef projectedRD, int connectTimeout, int limit)

Parameters:
- sc - the Spark context
- dataParts - the HPCC file partitions to read
- originalRD - the record definition of the HPCC file
- projectedRD - the projected record definition describing the fields to read
- connectTimeout - the connection timeout (see DEFAULT_CONNECTION_TIMEOUT)
- limit - the maximum number of records to read
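A construction sketch follows. It assumes the DataPartition[] and FieldDef values have already been obtained from the HPCC DFS client (org.hpccsystems.dfs.client); that lookup is outside the scope of this class, and the openRdd helper name is hypothetical:

```java
import org.apache.spark.SparkContext;
import org.hpccsystems.commons.ecl.FieldDef;
import org.hpccsystems.dfs.client.DataPartition;

public final class HpccRddFactory {
    // A minimal sketch: dataParts and recordDef are assumed to come from
    // the HPCC DFS client (not shown here).
    static HpccRDD openRdd(SparkContext sc, DataPartition[] dataParts, FieldDef recordDef) {
        // Passing the same FieldDef as both the original and projected record
        // definition reads every field; a narrower projectedRD prunes columns.
        return new HpccRDD(sc, dataParts, recordDef, recordDef);
    }
}
```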
public org.apache.spark.api.java.JavaRDD<org.apache.spark.sql.Row> asJavaRDD()

Wrap this RDD as a JavaRDD so the Java API can be used.
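For illustration, a sketch of mapping over the wrapped JavaRDD; the helper name and the assumption that field 0 is printable are hypothetical:

```java
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Row;

public final class HpccRddJavaApi {
    // A minimal sketch: the Java API accepts plain lambdas instead of
    // Scala functions.
    static void printFirstField(HpccRDD hpccRdd) {
        JavaRDD<Row> rows = hpccRdd.asJavaRDD();
        // Reading field 0 as a string is an assumption about the record layout.
        JavaRDD<String> firstColumn = rows.map(r -> String.valueOf(r.get(0)));
        firstColumn.take(5).forEach(System.out::println);
    }
}
```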
public org.apache.spark.rdd.RDD<org.apache.spark.mllib.regression.LabeledPoint> makeMLLibLabeledPoint(String labelName, String[] dimNames) throws IllegalArgumentException

Transform to an RDD of labeled points for MLLib supervised learning.

Parameters:
- labelName - the field name of the label data
- dimNames - the field names for the dimensions

Throws:
- IllegalArgumentException
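A hedged usage sketch; the label and dimension field names are hypothetical placeholders for numeric fields in the file's record definition:

```java
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.rdd.RDD;

public final class HpccRddLabeledPoints {
    // A minimal sketch; the field names below are assumptions and must
    // match fields in the record definition.
    static RDD<LabeledPoint> trainingData(HpccRDD hpccRdd) {
        return hpccRdd.makeMLLibLabeledPoint("price",
                new String[] { "sqft", "bedrooms", "bathrooms" });
    }
}
```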
public org.apache.spark.rdd.RDD<org.apache.spark.mllib.linalg.Vector> makeMLLibVector(String[] dimNames) throws IllegalArgumentException

Transform to an RDD of mllib.linalg.Vectors for MLLib machine learning.

Parameters:
- dimNames - the field names for the dimensions

Throws:
- IllegalArgumentException
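A hedged sketch pairing the resulting vectors with MLLib k-means; the field names and k-means settings are illustrative assumptions:

```java
import org.apache.spark.mllib.clustering.KMeans;
import org.apache.spark.mllib.clustering.KMeansModel;
import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.rdd.RDD;

public final class HpccRddClustering {
    // A minimal sketch; the field names are hypothetical. The vectors can
    // feed any mllib algorithm that accepts RDD<Vector>.
    static KMeansModel cluster(HpccRDD hpccRdd) {
        RDD<Vector> features = hpccRdd.makeMLLibVector(new String[] { "sqft", "bedrooms" });
        features.cache();                     // k-means makes multiple passes over the data
        return KMeans.train(features, 2, 20); // k = 2 clusters, at most 20 iterations
    }
}
```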
public org.apache.spark.InterruptibleIterator<org.apache.spark.sql.Row> compute(org.apache.spark.Partition p_arg, org.apache.spark.TaskContext ctx)

Specified by:
- compute in class org.apache.spark.rdd.RDD<org.apache.spark.sql.Row>
public scala.collection.Seq<String> getPreferredLocations(org.apache.spark.Partition split)

Overrides:
- getPreferredLocations in class org.apache.spark.rdd.RDD<org.apache.spark.sql.Row>
public org.apache.spark.Partition[] getPartitions()

Specified by:
- getPartitions in class org.apache.spark.rdd.RDD<org.apache.spark.sql.Row>