datastream

Type Members

trait Aggregation extends AnyRef
trait Cancellable extends AnyRef
trait DataStream extends Logging

A DataStream is kind of like a table of data. It has fields (like columns) and rows of data. Each row has an entry for each field (this may be null depending on the field definition).
It is a lazily evaluated data structure. Each operation on a stream will create a new derived stream, but those operations will only occur when a final action is performed.
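The lazy behaviour described above can be illustrated with a small, self-contained sketch. It does not use the library itself: `LazyTable`, its `filter`, `map` and `collect` methods, and the sample rows are hypothetical stand-ins chosen only to show that derived streams are cheap descriptions and that the source is read when an action runs.

```scala
// Toy model only: LazyTable is a hypothetical stand-in, not the library's DataStream.
final case class LazyTable(fields: Vector[String], rows: () => Iterator[Vector[Any]]) {

  // Each operation returns a new derived table; nothing is evaluated here.
  def filter(p: Vector[Any] => Boolean): LazyTable =
    copy(rows = () => rows().filter(p))

  def map(f: Vector[Any] => Vector[Any]): LazyTable =
    copy(rows = () => rows().map(f))

  // The action: only now is the source read and the pipeline run.
  def collect: Vector[Vector[Any]] = rows().toVector
}

object LazyDemo {
  def main(args: Array[String]): Unit = {
    var reads = 0 // counts how many rows the source has actually produced

    val source = LazyTable(
      Vector("name", "age"),
      () => Iterator(
        Vector[Any]("alice", 31),
        Vector[Any]("bob", null), // a field may hold null
        Vector[Any]("carol", 45)
      ).map { r => reads += 1; r }
    )

    // Deriving a new table does not touch the source.
    val withAge = source.filter(r => r(1) != null)
    assert(reads == 0)

    // Performing the action pulls every row through the pipeline.
    val result = withAge.collect
    assert(reads == 3)
    assert(result.map(_(0)) == Vector("alice", "carol"))
  }
}
```

In this sketch the row source is a function that is invoked only inside `collect`, which is one simple way to defer work until an action is requested.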
You can create a DataStream from an IO source, such as a Parquet file or a Hive table, or you may create a fully evaluated one from an in memory structure. In the case of the former, the data will only be loaded on demand as an action is performed.
A DataStream is split into one or more flows. Each flow can operate independently of the others. For example, if you filter a DataStream, each flow is filtered separately, which allows the work to be parallelized. If you write out a DataStream, each flow can be written to its own file, again allowing parallelization.
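As a rough illustration of why independent flows parallelize well, the sketch below filters each flow in its own `Future`. It is not the library's implementation: `FlowDemo`, `filterFlows`, the `Row` alias, and the sample data are assumptions made for this example, and a real stream would not hold its flows as in-memory sequences.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FlowDemo {
  // Hypothetical row representation used only in this sketch.
  type Row = Vector[Any]

  // Filter every flow on its own task; flows share no state, so the tasks
  // can run concurrently. Future.sequence keeps the flows in their original order.
  def filterFlows(flows: Seq[Seq[Row]])(p: Row => Boolean): Seq[Seq[Row]] = {
    val tasks = flows.map(flow => Future(flow.filter(p)))
    Await.result(Future.sequence(tasks), 10.seconds)
  }

  def main(args: Array[String]): Unit = {
    val flows: Seq[Seq[Row]] = Seq(
      Seq(Vector[Any]("a", 1), Vector[Any]("b", 20)),
      Seq(Vector[Any]("c", 30), Vector[Any]("d", 4))
    )

    // Keep rows whose second field is at least 10.
    val kept = filterFlows(flows)(r => r(1).asInstanceOf[Int] >= 10)

    assert(kept == Seq(Seq(Vector("b", 20)), Seq(Vector("c", 30))))
  }
}
```

The same shape would apply to writing: one task per flow, each producing its own output file, with no coordination needed between tasks.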
class DataStreamSource extends DataStream with Using with Logging
abstract class DefaultAggregation extends Aggregation
class DelegateSubscriber[T] extends Subscriber[T]
class ExistsSubscriber extends Subscriber[Seq[Row]] with Logging
class FindSubscriber extends Subscriber[Seq[Row]] with Logging
trait GroupedDataStream extends AnyRef
case class IteratorAction(ds: DataStream) extends Product with Serializable
case class SinkAction(ds: DataStream, sink: Sink, parallelism: Int) extends Logging with Product with Serializable
trait Subscriber[T] extends AnyRef

Value Members

object Aggregation

object DataStream

object ExecutorInstances

object GroupedDataStream
