DataStream

Abstract Value Members

abstract def schema: StructType

Returns the Schema for this stream.
Returns the Schema for this stream. This call will not cause a full evaluation, but only the operations required to retrieve a schema will occur. For example, on a stream backed by a JDBC source, an empty resultset will be obtained in order to query the metadata for the database columns.

Concrete Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
def ++(other: DataStream): DataStream

Joins two streams together, such that the elements of the given frame are appended to the end of this streams.
Joins two streams together, such that the elements of the given frame are appended to the end of this streams. This operation is the same as a concat operation. This results in having numPartitions(a) + numPartitions(b)
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def addField(name: String, defaultValue: String): DataStream

Returns a new DataStream with the new field of type String added at the end.
Returns a new DataStream with the new field of type String added at the end. The value of this field for each Row is specified by the default value.
def addField(field: Field, defaultValue: Any): DataStream

Returns a new DataStream with the given field added at the end.
Returns a new DataStream with the given field added at the end. The value of this field for each Row is specified by the default value. The value must be compatible with the field definition. Eg, an error will occur if the field has type Int and the default value was 1.3
def addFieldIfNotExists(field: Field, defaultValue: Any): DataStream
def addFieldIfNotExists(name: String, defaultValue: Any): DataStream
def aggregated(): GroupedDataStream
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def collect: Vector[Row]

Action which results in all the rows being returned in memory as a Vector.
def count: Long
def drop(k: Int): DataStream

Returns a new DataStream where k number of rows has been dropped.
Returns a new DataStream where k number of rows has been dropped. This operation requires a reshuffle.
def dropNullRows(): DataStream
def dropWhile(fieldName: String, pred: (Any) ⇒ Boolean): DataStream
def dropWhile(p: (Row) ⇒ Boolean): DataStream
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
implicit val executor: ExecutionContextExecutor
def exists(p: (Row) ⇒ Boolean): Boolean
def explode(fn: (Row) ⇒ Seq[Row]): DataStream
def filter(fieldName: String, p: (Any) ⇒ Boolean): DataStream

Filters where the given field name matches the given predicate.
def filter(p: (Row) ⇒ Boolean): DataStream

For each row in the stream, filter drops any rows which do not match the predicate.
def filterNot(p: (Row) ⇒ Boolean): DataStream
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
def find(p: (Row) ⇒ Boolean): Option[Row]
def fold[A](initial: A)(fn: (A, Row) ⇒ A): A
def forall(p: (Row) ⇒ Boolean): Boolean
def foreach[U](fn: (Row) ⇒ U): DataStream

Execute a side effecting function for every row in the stream, returning the same row.
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def groupBy(fn: (Row) ⇒ Any): GroupedDataStream
def groupBy(fields: Iterable[String]): GroupedDataStream
def groupBy(first: String, rest: String*): GroupedDataStream
def hashCode(): Int

Definition Classes
AnyRef → Any
def head: Row
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def iterator: Iterator[Row]

Action which returns a scala.collection.CloseIterator, which will result in the lazy evaluation of the stream, element by element.
def join(other: DataStream): DataStream

Combines two frames together such that the fields from this frame are joined with the fields of the given frame.
Combines two frames together such that the fields from this frame are joined with the fields of the given frame. Eg, if this frame has A,B and the given frame has C,D then the result will be A,B,C,D
Each stream has different partitions so we'll need to re-partition it to ensure we have an even distribution.
val logger: Logger

Attributes
protected
Definition Classes
Logging
def map(f: (Row) ⇒ Row): DataStream
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def projection(fields: Seq[String]): DataStream

Returns a new DataStream which contains the given list of fields from the existing stream.
def projection(first: String, rest: String*): DataStream
def projectionExpression(expr: String): DataStream
def removeField(fieldName: String, caseSensitive: Boolean = true): DataStream
def renameField(nameFrom: String, nameTo: String): DataStream
def replace(from: String, target: Any): DataStream

Foreach row, any values that match "from" will be replaced with "target".
Foreach row, any values that match "from" will be replaced with "target". This operation applies to all values for all rows.
def replace(fieldName: String, from: String, target: Any): DataStream

Replaces any values that match "form" with the value "target".
Replaces any values that match "form" with the value "target". This operation only applies to the field name specified.
def replace(fieldName: String, fn: (Any) ⇒ Any): DataStream

For each row, the value corresponding to the given fieldName is applied to the function.
For each row, the value corresponding to the given fieldName is applied to the function. The result of the function is the new value for that cell.
def replaceFieldType(from: DataType, toType: DataType): DataStream
def replaceNullValues(defaultValue: String): DataStream
def sample(k: Int): DataStream

Returns a new DataStream where only each "k" row is retained.
Returns a new DataStream where only each "k" row is retained. Ie, if sample is 2, then on average, every other row will be returned. If sample is 10 then only 10% of rows will be returned. When running concurrently, the rows that are sampled will vary depending on the ordering that the workers pull through the rows. Each partition uses its own couter.
def size: Long
def stripCharsFromFieldNames(chars: Seq[Char]): DataStream

Returns a new DataStream with the same data as this stream, but where the field names have been sanitized by removing any occurances of the given characters.
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def take(k: Int): DataStream
def takeWhile(pred: (Row) ⇒ Boolean): DataStream
def takeWhile(fieldName: String, pred: (Any) ⇒ Boolean): DataStream
def to(sink: Sink, listener: Listener): Long
def to(sink: Sink): Long
def toSet: Set[Row]
def toString(): String

Definition Classes
AnyRef → Any
def toVector: Vector[Row]

Action which results in all the rows being returned in memory as a Vector.
Action which results in all the rows being returned in memory as a Vector. Alias for 'collect()'
def union(other: DataStream): DataStream
def updateField(name: String, field: Field): DataStream
def updateFieldType(fieldName: String, datatype: DataType): DataStream

Returns the same data but with an updated schema.
Returns the same data but with an updated schema. The field that matches the given name will have its datatype set to the given datatype.
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
def withLowerCaseSchema(): DataStream

Related Docs: object DataStream | package datastream

trait DataStream extends Logging

Abstract Value Members

abstract def schema: StructType

Concrete Value Members

final def !=(arg0: Any): Boolean

final def ##(): Int

def ++(other: DataStream): DataStream

final def ==(arg0: Any): Boolean

def addField(name: String, defaultValue: String): DataStream

def addField(field: Field, defaultValue: Any): DataStream

def addFieldIfNotExists(field: Field, defaultValue: Any): DataStream

def addFieldIfNotExists(name: String, defaultValue: Any): DataStream

def aggregated(): GroupedDataStream

final def asInstanceOf[T0]: T0

def clone(): AnyRef

def collect: Vector[Row]

def count: Long

def drop(k: Int): DataStream

def dropNullRows(): DataStream

def dropWhile(fieldName: String, pred: (Any) ⇒ Boolean): DataStream

def dropWhile(p: (Row) ⇒ Boolean): DataStream

final def eq(arg0: AnyRef): Boolean

def equals(arg0: Any): Boolean

implicit val executor: ExecutionContextExecutor

def exists(p: (Row) ⇒ Boolean): Boolean

def explode(fn: (Row) ⇒ Seq[Row]): DataStream

def filter(fieldName: String, p: (Any) ⇒ Boolean): DataStream

def filter(p: (Row) ⇒ Boolean): DataStream

def filterNot(p: (Row) ⇒ Boolean): DataStream

def finalize(): Unit

def find(p: (Row) ⇒ Boolean): Option[Row]

def fold[A](initial: A)(fn: (A, Row) ⇒ A): A

def forall(p: (Row) ⇒ Boolean): Boolean

def foreach[U](fn: (Row) ⇒ U): DataStream

final def getClass(): Class[_]

def groupBy(fn: (Row) ⇒ Any): GroupedDataStream

def groupBy(fields: Iterable[String]): GroupedDataStream

def groupBy(first: String, rest: String*): GroupedDataStream

def hashCode(): Int

def head: Row

final def isInstanceOf[T0]: Boolean

def iterator: Iterator[Row]

def join(other: DataStream): DataStream

val logger: Logger

def map(f: (Row) ⇒ Row): DataStream

final def ne(arg0: AnyRef): Boolean

final def notify(): Unit

final def notifyAll(): Unit

def projection(fields: Seq[String]): DataStream

def projection(first: String, rest: String*): DataStream

def projectionExpression(expr: String): DataStream

def removeField(fieldName: String, caseSensitive: Boolean = true): DataStream

def renameField(nameFrom: String, nameTo: String): DataStream

def replace(from: String, target: Any): DataStream

def replace(fieldName: String, from: String, target: Any): DataStream

def replace(fieldName: String, fn: (Any) ⇒ Any): DataStream

def replaceFieldType(from: DataType, toType: DataType): DataStream

def replaceNullValues(defaultValue: String): DataStream

def sample(k: Int): DataStream

def size: Long

def stripCharsFromFieldNames(chars: Seq[Char]): DataStream

final def synchronized[T0](arg0: ⇒ T0): T0

def take(k: Int): DataStream

def takeWhile(pred: (Row) ⇒ Boolean): DataStream

def takeWhile(fieldName: String, pred: (Any) ⇒ Boolean): DataStream

def to(sink: Sink, listener: Listener): Long

def to(sink: Sink): Long

def toSet: Set[Row]

def toString(): String

def toVector: Vector[Row]

def union(other: DataStream): DataStream

def updateField(name: String, field: Field): DataStream

def updateFieldType(fieldName: String, datatype: DataType): DataStream

final def wait(): Unit

final def wait(arg0: Long, arg1: Int): Unit

final def wait(arg0: Long): Unit

def withLowerCaseSchema(): DataStream

Inherited from Logging