package com.datastax.spark.connector.rdd

Contains the com.datastax.spark.connector.rdd.CassandraTableScanRDD class, which is the main entry point for analyzing Cassandra data from Spark.


Type Members

  1. class CassandraJoinRDD[L, R] extends CassandraRDD[(L, R)] with CassandraTableRowReaderProvider[R]

An RDD that performs a selecting join between the left RDD and the specified Cassandra table. It executes individual selects to retrieve the rows from Cassandra and takes advantage of RDDs that have been partitioned with the com.datastax.spark.connector.rdd.partitioner.ReplicaPartitioner. A usage sketch follows the type parameter descriptions below.

    L

    item type on the left side of the join (any RDD)

    R

    item type on the right side of the join (fetched from Cassandra)
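
A minimal usage sketch (the keyspace test and table customers, keyed by an int, are hypothetical placeholders; joinWithCassandraTable is provided by the connector's implicit RDD conversions):

    import com.datastax.spark.connector._
    import org.apache.spark.SparkContext

    def joinExample(sc: SparkContext) = {
      // Left side of the join: one Tuple1 per partition key to look up.
      val ids = sc.parallelize(1 to 100).map(Tuple1(_))
      // Performs an individual select per key against test.customers and
      // returns a CassandraJoinRDD[Tuple1[Int], CassandraRow].
      ids.joinWithCassandraTable("test", "customers")
    }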

  2. abstract class CassandraRDD[R] extends RDD[R]

  3. trait CassandraTableRowReaderProvider[R] extends AnyRef

Used to get a RowReader of type [R] for transforming the rows of a particular Cassandra table into Scala objects. Performs the necessary checking of the schema and the output class to make sure they are compatible.

    See also

    CassandraJoinRDD

    CassandraTableScanRDD

  4. class CassandraTableScanRDD[R] extends CassandraRDD[R] with CassandraTableRowReaderProvider[R]

RDD representing a table scan of a Cassandra table.

This class is the main entry point for analyzing data in a Cassandra database with Spark. Obtain objects of this class by calling com.datastax.spark.connector.SparkContextFunctions.cassandraTable.

Configuration properties should be passed in the SparkConf configuration of SparkContext. CassandraRDD needs to open a connection to Cassandra, therefore it requires appropriate connection property values to be present in SparkConf. For the list of required and available properties, see CassandraConnector.

    CassandraRDD divides the data set into smaller partitions, processed locally on every cluster node. A data partition consists of one or more contiguous token ranges. To reduce the number of roundtrips to Cassandra, every partition is fetched in batches.

The following properties control the number of partitions and the fetch size:

  - spark.cassandra.input.split.size: approximate number of Cassandra partitions in a Spark partition, default 100000
  - spark.cassandra.input.page.row.size: number of CQL rows fetched per roundtrip, default 1000

A CassandraRDD object gets serialized and sent to every Spark Executor, which then calls the compute method to fetch the data on every node. The getPreferredLocations method tells Spark the preferred nodes to fetch a partition from, so that the data for the partition are on the same node the task was sent to. If Cassandra nodes are collocated with Spark nodes, the queries are always sent to the Cassandra process running on the same node as the Spark Executor process, hence data are not transferred between nodes. If a Cassandra node fails or gets overloaded during a read, the queries are retried on a different node.

    By default, reads are performed at ConsistencyLevel.LOCAL_ONE in order to leverage data-locality and minimize network traffic. This read consistency level is controlled by the spark.cassandra.input.consistency.level property.
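
A minimal sketch of obtaining and using this RDD (the connection host, keyspace test, and table words are hypothetical placeholders):

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    // Connection properties are passed through SparkConf; the host is a placeholder.
    val conf = new SparkConf()
      .setMaster("local[2]")
      .setAppName("scan-example")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    // Returns a CassandraTableScanRDD[CassandraRow] over test.words.
    val rdd = sc.cassandraTable("test", "words")
    // CassandraRow provides typed access to individual columns.
    val counts = rdd.map(row => (row.getString("word"), row.getInt("count")))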

  5. sealed trait ClusteringOrder extends Serializable

  6. case class CqlWhereClause(predicates: Seq[String], values: Seq[Any]) extends Product with Serializable

    Represents a logical conjunction of CQL predicates. Each predicate can have placeholders denoted by '?' which get substituted by values from the values array. The number of placeholders must match the size of the values array.
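
For illustration, the where method below builds a CqlWhereClause with one predicate and one bound value (reusing sc from the earlier sketch; the table and column names are hypothetical):

    // The '?' placeholder is substituted by the bound value when the query runs.
    val recent = sc.cassandraTable("test", "events")
      .where("event_time > ?", "2015-01-01 00:00:00")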

  7. class EmptyCassandraRDD[R] extends CassandraRDD[R]

Represents a CassandraRDD with no rows. This RDD does not load any data from Cassandra and does not require the table to exist.

  8. case class ReadConf(splitSize: Int = ReadConf.DefaultSplitSize, fetchSize: Int = ReadConf.DefaultFetchSize, consistencyLevel: ConsistencyLevel = ReadConf.DefaultConsistencyLevel, taskMetricsEnabled: Boolean = ...) extends Product with Serializable

Read settings for RDD. A usage sketch follows the parameter descriptions below.

    splitSize

    number of Cassandra partitions to be read in a single Spark task

    fetchSize

    number of CQL rows to fetch in a single round-trip to Cassandra

    consistencyLevel

consistency level for reads, default LOCAL_ONE; a higher consistency level will disable data-locality

    taskMetricsEnabled

whether or not to enable task metrics updates (requires Spark 1.2+)
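
A sketch of overriding these settings for a single RDD (the values are arbitrary examples; the withReadConf method on CassandraRDD is assumed available here, and sc comes from the earlier sketch):

    import com.datastax.spark.connector.rdd.ReadConf

    // Defaults are splitSize = 100000 and fetchSize = 1000.
    val tuned = sc.cassandraTable("test", "words")
      .withReadConf(ReadConf(splitSize = 10000, fetchSize = 500))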

  9. trait ValidRDDType[T] extends AnyRef

    Annotations
    @implicitNotFound( ... )

Value Members

  1. object CassandraRDD extends Serializable

  2. object CassandraTableScanRDD extends Serializable

  3. object ClusteringOrder extends Serializable

  4. object CqlWhereClause extends Serializable

  5. object ReadConf extends Serializable

  6. object ValidRDDType

  7. package partitioner

    Provides components for partitioning a Cassandra table into smaller parts of appropriate size. Each partition can be processed locally on at least one cluster node.

  8. package reader

    Provides components for reading data rows from Cassandra and converting them to objects of desired type. Additionally provides a generic CassandraRow class which can represent any row.
