com.datastax.spark.connector.rdd

partitioner

package partitioner

Provides components for partitioning a Cassandra table into smaller parts of appropriate size. Each partition can be processed locally on at least one cluster node.

Linear Supertypes
AnyRef, Any

Type Members

  1. case class CassandraPartition(index: Int, endpoints: Iterable[InetAddress], tokenRanges: Iterable[CqlTokenRange], rowCount: Long) extends EndpointPartition with Product with Serializable

    Metadata describing a Cassandra table partition processed by a single Spark task. Note that the term "partition" is overloaded. Here, in the context of Spark, it means an arbitrary collection of rows that can be processed locally on a single Cassandra cluster node. A CassandraPartition typically contains multiple CQL partitions, i.e. rows identified by different values of the CQL partition key.

    index

    identifier of the partition, used internally by Spark

    endpoints

    which nodes the data partition is located on

    tokenRanges

    token ranges determining the row set to be fetched

    rowCount

    estimated total row count in a partition

  2. class CassandraPartitionedRDD[T] extends RDD[T]

    RDD created by repartitionByCassandraReplica with preferred locations mapping to the CassandraReplicas each partition was created for.

  3. class CassandraRDDPartitioner[V, T <: Token[V]] extends AnyRef

    Creates CassandraPartitions for a given Cassandra table.

  4. case class CqlTokenRange(cql: String, values: Any*) extends Product with Serializable

    Stores a CQL WHERE predicate matching a range of tokens.
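
    As a rough illustration of what such a predicate might look like, the sketch below builds a token-range condition with bind markers. TokenPredicate, TokenPredicateSketch, and the pk parameter are hypothetical names for this example only; the exact CQL text the connector generates is an assumption here.

    ```scala
    // Simplified stand-in for illustration only; not the connector's CqlTokenRange.
    final case class TokenPredicate(cql: String, values: Seq[Any])

    object TokenPredicateSketch {
      // Builds a predicate selecting rows whose partition-key token lies in (start, end].
      // `pk` is the name of the partition key column (hypothetical parameter).
      def range(pk: String, start: Long, end: Long): TokenPredicate =
        TokenPredicate(s"token($pk) > ? AND token($pk) <= ?", Seq(start, end))
    }
    ```

    The bound values are kept separate from the CQL string so they can be supplied as statement parameters rather than spliced into the query text.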

  5. trait EndpointPartition extends Partition

  6. class Murmur3PartitionerTokenRangeSplitter extends TokenRangeSplitter[Long, LongToken]

    Fast token range splitter assuming that data are spread out evenly in the whole range.

  7. class RandomPartitionerTokenRangeSplitter extends TokenRangeSplitter[BigInt, BigIntToken]

    Fast token range splitter assuming that data are spread out evenly in the whole range.
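
    Under the even-distribution assumption, splitting reduces to cutting a range into pieces of near-equal width. The sketch below shows that idea for Long tokens; EvenSplitSketch and its parameters are hypothetical, it ignores wrap-around ranges, and it is not the connector's implementation.

    ```scala
    object EvenSplitSketch {
      // Splits the token range (start, end] into contiguous sub-ranges of near-equal width.
      // Assumes start < end, i.e. the range does not wrap around the ring.
      def split(start: Long, end: Long, pieces: Int): Seq[(Long, Long)] = {
        require(start < end && pieces > 0)
        val width = BigInt(end) - BigInt(start)   // BigInt avoids Long overflow
        val n = BigInt(pieces).min(width).toInt   // never more pieces than tokens
        // n + 1 boundaries, from start to end inclusive
        val bounds = (0 to n).map(i => (BigInt(start) + width * i / n).toLong)
        bounds.zip(bounds.tail)
      }
    }
    ```

    Because rows are assumed to be spread evenly, each sub-range is expected to hold roughly the same share of the rows, so no server round trip is needed to place the boundaries.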

  8. case class ReplicaPartition(index: Int, endpoints: Set[InetAddress]) extends EndpointPartition with Product with Serializable

  9. class ReplicaPartitioner extends Partitioner

    The replica partitioner works on an RDD keyed on sets of InetAddresses representing Cassandra hosts. It groups keys that share a common IP address into partitionsPerReplicaSet partitions.
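
    One way to picture this grouping is to give each host a block of partitionsPerReplicaSet partition indices and route a key to the block of one of its replicas. The sketch below does that without any Spark dependency; ReplicaPartitionSketch, the hosts vector, and the deterministic choice of the first known host are assumptions made for the example, not the connector's actual selection logic.

    ```scala
    import java.net.InetAddress

    object ReplicaPartitionSketch {
      // Maps a replica set (the RDD key) to a partition index.
      // Host i owns indices [i * partitionsPerReplicaSet, (i + 1) * partitionsPerReplicaSet).
      def partitionFor(key: Set[InetAddress],
                       hosts: Vector[InetAddress],
                       partitionsPerReplicaSet: Int): Int = {
        // Pick the first known host contained in the replica set (sketch-only choice).
        val hostIndex = hosts.indexWhere(key.contains)
        require(hostIndex >= 0, "replica set contains no known host")
        // Spread keys across the host's block of partitions.
        val offset = java.lang.Math.floorMod(key.hashCode, partitionsPerReplicaSet)
        hostIndex * partitionsPerReplicaSet + offset
      }
    }
    ```

    With two hosts and partitionsPerReplicaSet = 2, keys routed to the first host land in partitions 0 or 1 and keys routed to the second host land in partitions 2 or 3.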

  10. class ServerSideTokenRangeSplitter[V, T <: Token[V]] extends TokenRangeSplitter[V, T] with Logging

    Delegates token range splitting to the Cassandra server.

  11. class TokenRangeClusterer[V, T <: Token[V]] extends AnyRef

    Divides a set of token ranges into groups containing no more than maxRowCountPerGroup rows and no more than maxGroupSize token ranges. Each group forms a single CassandraPartition.

    The algorithm is as follows:

      1. Sort token ranges by endpoints lexicographically.
      2. Take the highest possible number of token ranges from the beginning of the list, such that the sum of their rowCounts does not exceed maxRowCountPerGroup and they all share at least one common endpoint. If that is not possible, take at least one item. Those token ranges make up a group.
      3. Repeat the previous step until no token ranges are left.
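
    The steps above can be sketched as a greedy pass over the sorted list. The code below is a minimal illustration, not the connector's code: ClusterSketch and its Range class are hypothetical stand-ins, endpoints are plain strings, and "lexicographic" ordering is approximated by comparing the sorted, comma-joined endpoint names.

    ```scala
    object ClusterSketch {
      // Simplified stand-in for a token range: a label, its replica endpoints, estimated rows.
      final case class Range(id: String, endpoints: Set[String], rowCount: Long)

      def group(ranges: Seq[Range],
                maxRowCountPerGroup: Long,
                maxGroupSize: Int): List[List[Range]] = {
        // Step 1: sort by endpoints (approximate lexicographic order, stable sort).
        val sorted = ranges.sortBy(_.endpoints.toList.sorted.mkString(",")).toList

        def loop(rest: List[Range], acc: List[List[Range]]): List[List[Range]] = rest match {
          case Nil => acc.reverse
          case head :: tail =>
            // Step 2: always take the head, then extend the group while the limits hold
            // and all members still share at least one endpoint.
            var common = head.endpoints
            var rows = head.rowCount
            var size = 1
            val more = tail.takeWhile { r =>
              val nextCommon = common intersect r.endpoints
              val ok = size < maxGroupSize &&
                rows + r.rowCount <= maxRowCountPerGroup &&
                nextCommon.nonEmpty
              if (ok) { common = nextCommon; rows += r.rowCount; size += 1 }
              ok
            }
            // Step 3: repeat on whatever is left.
            loop(tail.drop(more.size), (head :: more) :: acc)
        }
        loop(sorted, Nil)
      }
    }
    ```

    Taking the head unconditionally covers the "take at least one item" rule, so a single range larger than maxRowCountPerGroup still ends up in a group of its own.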

  12. trait TokenRangeSplitter[V, T <: Token[V]] extends AnyRef

    Splits a token range into smaller sub-ranges, each with the desired approximate number of rows.

Value Members

  1. object CassandraRDDPartitioner

  2. package dht
