Maps each row into an object of a different type using the provided function, which takes the column value(s) as argument(s). Can be used to convert each row to a tuple or a case class object:

sc.cassandraTable("ks", "table").select("column1").as((s: String) => s)                // yields CassandraRDD[String]
sc.cassandraTable("ks", "table").select("column1", "column2").as((_: String, _: Long)) // yields CassandraRDD[(String, Long)]

case class MyRow(key: String, value: Long)
sc.cassandraTable("ks", "table").select("column1", "column2").as(MyRow)                // yields CassandraRDD[MyRow]
Adds a CQL ORDER BY clause to the query. It can be applied only when the table has clustering columns and a partition key predicate is pushed down in where. It is useful when the default direction of ordering rows within a single Cassandra partition needs to be changed.
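For example, a sketch of reversing the clustering order with the connector's withDescOrder method (the table and column names here are hypothetical, assuming a schema with PRIMARY KEY (user_id, event_time)):

```scala
import com.datastax.spark.connector._

// The partition key predicate is pushed down via where,
// so the ORDER BY clause can be applied:
val latestFirst = sc.cassandraTable("ks", "events")
  .where("user_id = ?", "u1")   // restricts the query to a single Cassandra partition
  .withDescOrder                // ORDER BY the clustering column, descending
```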
Adds a limit clause to the CQL SELECT statement. The limit is applied to each created Spark partition; in other words, unless the data is fetched from a single Cassandra partition, the number of results is unpredictable. The main purpose of passing a limit clause is to fetch the top n rows from a single Cassandra partition, when the table is designed so that it uses clustering keys and a partition key predicate is passed to the where clause.
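A sketch of that "top n rows from one partition" pattern (hypothetical table and column names, assuming descending clustering order on event_time):

```scala
import com.datastax.spark.connector._

// Because the where clause restricts the query to a single Cassandra
// partition, the limit is exact rather than per-Spark-partition:
val top10 = sc.cassandraTable("ks", "events")
  .where("user_id = ?", "u1")
  .limit(10)
```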
Saves the data from the RDD to a new table with a definition taken from the ColumnMapper for this class.
the keyspace in which to create the new table
name of the table to create; the table must not exist
Selects the columns to save data to. Only unique column names are used, and at least all primary key columns must be selected. All other fields are discarded. Non-selected property/column names are left unchanged. This parameter does not affect table creation.
an additional configuration object that allows setting the consistency level, batch size, etc.
optional, implicit connector to Cassandra
a factory for obtaining the row writer used to extract column values from items of the RDD
a column mapper determining the definition of the table
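A minimal usage sketch (the case class and keyspace/table names are hypothetical):

```scala
import com.datastax.spark.connector._

case class WordCount(word: String, count: Long)

// Creates ks.words with a definition derived from WordCount's ColumnMapper,
// then writes the rows; the table must not exist beforehand.
val rdd = sc.parallelize(Seq(WordCount("foo", 1L), WordCount("bar", 2L)))
rdd.saveAsCassandraTable("ks", "words")
```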
Saves the data from the RDD to a new table defined by the given TableDef. First it creates a new table with all columns from the TableDef, and then it saves the RDD content in the same way as saveToCassandra. The table must not exist prior to this call.
table definition used to create a new table
Selects the columns to save data to. Only unique column names are used, and at least all primary key columns must be selected. All other fields are discarded. Non-selected property/column names are left unchanged. This parameter does not affect table creation.
an additional configuration object that allows setting the consistency level, batch size, etc.
optional, implicit connector to Cassandra
a factory for obtaining the row writer used to extract column values from items of the RDD
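A sketch of building a TableDef by hand; note that the ColumnDef and TableDef constructor shapes vary between connector versions, so treat the exact signatures below as assumptions (the schema itself is hypothetical):

```scala
import com.datastax.spark.connector._
import com.datastax.spark.connector.cql.{ColumnDef, PartitionKeyColumn, RegularColumn, TableDef}
import com.datastax.spark.connector.types.{IntType, TextType}

// Assumes the ColumnDef(name, role, type) form of the constructor:
val table = TableDef(
  keyspaceName      = "ks",
  tableName         = "words",
  partitionKey      = Seq(ColumnDef("word", PartitionKeyColumn, TextType)),
  clusteringColumns = Seq.empty,
  regularColumns    = Seq(ColumnDef("count", RegularColumn, IntType)))

// Creates ks.words from the definition above, then writes the rows:
sc.parallelize(Seq(("foo", 1), ("bar", 2)))
  .saveAsCassandraTableEx(table, SomeColumns("word", "count"))
```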
Saves the data from the RDD to a Cassandra table, using the specified column names.
the name of the Keyspace to use
the name of the Table to use
an additional configuration object that allows setting the consistency level, batch size, etc.
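For example, writing a tuple RDD into an existing table (keyspace, table, and column names are illustrative):

```scala
import com.datastax.spark.connector._

// Writes only the listed columns; ks.words must already exist:
sc.parallelize(Seq(("cat", 30), ("fox", 40)))
  .saveToCassandra("ks", "words", SomeColumns("word", "count"))
```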
Narrows down the selected set of columns. Use this for better performance when you don't need all the columns in the result RDD. When called multiple times, it selects a subset of the already selected columns, so after a column has been removed by a previous select call, it is not possible to add it back.

The selected columns are NamedColumnRef instances. This type allows specifying columns for straightforward retrieval, as well as reading the TTL or write time of regular columns. Implicit conversions included in the com.datastax.spark.connector package make it possible to provide just column names (which is also backward compatible) and to optionally add a .ttl or .writeTime suffix in order to create the appropriate NamedColumnRef instance.
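A sketch of the suffix syntax described above (table and column names are illustrative):

```scala
import com.datastax.spark.connector._

// Plain strings are implicitly converted to NamedColumnRef;
// .ttl and .writeTime select metadata of a regular column:
val rdd = sc.cassandraTable("ks", "table")
  .select("column1", "column2", "column2".ttl, "column2".writeTime)
```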
Returns the names of columns to be selected from the table.
Applies a function to each item, and groups consecutive items having the same value together. Contrary to groupBy, items from the same group must already be next to each other in the original collection. Works locally on each partition, so items from different partitions will never be placed in the same group.
Groups items with the same key, assuming the items with the same key are next to each other in the collection. It does not perform a shuffle, and is therefore much faster than the much more general Spark RDD groupByKey. For this method to be useful with Cassandra tables, the key must represent a prefix of the primary key, containing at least the partition key of the Cassandra table.
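The consecutive-grouping semantics can be modeled in plain Scala, without Spark; spanByKeyLocal below is a hypothetical helper written only to illustrate the behavior, not part of the connector API:

```scala
// Groups only runs of consecutive equal keys, unlike groupByKey:
def spanByKeyLocal[K, V](items: Seq[(K, V)]): Seq[(K, Seq[V])] =
  if (items.isEmpty) Seq.empty
  else {
    val key = items.head._1
    // span splits off the leading run of items sharing the current key
    val (run, rest) = items.span(_._1 == key)
    (key, run.map(_._2)) +: spanByKeyLocal(rest)
  }

val grouped = spanByKeyLocal(Seq(("a", 1), ("a", 2), ("b", 3), ("a", 4)))
// → Seq(("a", Seq(1, 2)), ("b", Seq(3)), ("a", Seq(4)))
// "a" appears twice because only consecutive items are merged; this is why
// the key must be a primary key prefix for the result to be a true grouping.
```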
Produces an empty CassandraRDD that has the same signature and properties, but does not perform any validation and does not even try to return any rows.
Adds CQL WHERE predicate(s) to the query. Useful for leveraging secondary indexes in Cassandra. Implicitly adds an ALLOW FILTERING clause to the WHERE clause; however, beware that some predicates may be rejected by Cassandra, particularly when they filter on an unindexed, non-clustering column.
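For example, pushing a predicate down to Cassandra with placeholders (table and column names are hypothetical):

```scala
import com.datastax.spark.connector._

// Predicates on the partition key and a clustering column are
// executed by Cassandra, not filtered on the Spark side:
val rdd = sc.cassandraTable("ks", "events")
  .where("user_id = ? AND event_time > ?", "u1", "2015-01-01 00:00:00")
```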
Returns a copy of this Cassandra RDD with the specified connector.
Allows setting a custom read configuration, e.g. consistency level or fetch size.
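A sketch of overriding the read configuration; the ReadConf field names have changed across connector versions, so the named parameters below are an assumption for 1.x-style releases:

```scala
import com.datastax.driver.core.ConsistencyLevel
import com.datastax.spark.connector._
import com.datastax.spark.connector.rdd.ReadConf

// Assumed 1.x-style field names (fetchSize, consistencyLevel):
val rdd = sc.cassandraTable("ks", "table")
  .withReadConf(ReadConf(fetchSize = 1000,
                         consistencyLevel = ConsistencyLevel.LOCAL_ONE))
```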
RDD representing a Cassandra table for Spark Streaming.
com.datastax.spark.connector.rdd.CassandraRDD