Joins two streams together, such that the elements of the given datastream are appended to the end of this datastream.
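As a rough illustration, the ordering guarantee can be modelled with in-memory collections. The UnionSketch object, the Row alias and the union helper below are hypothetical stand-ins rather than the library's API, and the sketch assumes both inputs share one schema:

```scala
// Hypothetical model: a stream is an in-memory Seq of rows, not the library's DataStream.
object UnionSketch {
  // A row is modelled as a vector of cell values.
  type Row = Vector[Any]

  // All rows of `other` follow all rows of `self`; order within each input is kept.
  def union(self: Seq[Row], other: Seq[Row]): Seq[Row] = self ++ other
}
```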
Returns a new DataStream with the given field added at the end. The value of this field for each Row is specified by the default value. The value must be compatible with the field definition. For example, an error will occur if the field has type Int and the default value is 1.3.
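A minimal sketch of the compatibility check, assuming a toy type system with only IntType and StringType. The Frame, Field and addField names are hypothetical and are not the library's own types:

```scala
// Hypothetical, simplified model of a schema-carrying stream; not the library's types.
object AddFieldSketch {
  sealed trait DataType
  case object IntType extends DataType
  case object StringType extends DataType

  final case class Field(name: String, dataType: DataType)
  final case class Frame(fields: Vector[Field], rows: Vector[Vector[Any]])

  // Checks that a default value can be stored in a field of the given type.
  private def compatible(dt: DataType, value: Any): Boolean = (dt, value) match {
    case (IntType, _: Int)       => true
    case (StringType, _: String) => true
    case _                       => false
  }

  // Appends `field` to the schema and `default` to every row, failing fast on a type mismatch.
  def addField(frame: Frame, field: Field, default: Any): Frame = {
    require(compatible(field.dataType, default),
      s"default value $default is not compatible with ${field.dataType}")
    Frame(frame.fields :+ field, frame.rows.map(_ :+ default))
  }
}
```

With this model, adding an IntType field with default 1 succeeds, while a default of 1.3 throws an IllegalArgumentException before any row is touched.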
Returns a new DataStream with the new field of type String added at the end. The value of this field for each Row is specified by the default value.
Returns a new DataStream with a new field added at the end. The value for the field is taken from the function which is invoked for each row.
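A sketch of the per-row computation, using a hypothetical addFieldFn helper over an in-memory Seq of rows (this is not the library's signature):

```scala
// Hypothetical model: rows held in memory; addFieldFn here is illustrative only.
object AddFieldFnSketch {
  type Row = Vector[Any]

  // Appends one cell per row; the cell's value is computed from the existing row.
  def addFieldFn(rows: Seq[Row], fn: Row => Any): Seq[Row] =
    rows.map(row => row :+ fn(row))
}
```

For instance, passing a function that multiplies the first two cells turns the row [2, 3] into [2, 3, 6].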
Returns a new DataStream which is the result of joining every row in this datastream with every row in the given datastream.
The given datastream will be materialized before it is used.
For example, if this datastream has rows [a,b], [c,d] and [e,f] and the given datastream has [1,2] and [3,4] then the result will be [a,b,1,2], [a,b,3,4], [c,d,1,2], [c,d,3,4], [e,f,1,2] and [e,f,3,4].
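The example above can be reproduced with a small hypothetical model in which each stream is an Iterator of rows. The sketch also shows why the given stream is materialized once: it has to be traversed again for every row of this stream. CartesianSketch is an illustrative name, not part of the library:

```scala
// Hypothetical model: streams as iterators of string rows.
object CartesianSketch {
  type Row = Vector[String]

  def cartesian(self: Iterator[Row], other: Iterator[Row]): Iterator[Row] = {
    // `other` is materialized up front because it is re-read for every row of `self`.
    val materialized = other.toVector
    self.flatMap(left => materialized.map(right => left ++ right))
  }
}
```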
Action which results in all the rows being returned in memory as a Vector.
Combines two datastreams together such that the fields from this datastream are joined with the fields of the given datastream. For example, if this datastream has fields A, B and the given datastream has fields C, D, then the result will have fields A, B, C, D.
This operation requires an executor, as it must buffer rows to ensure an even distribution.
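A minimal sketch of the field-level result, assuming rows are paired by position (row i of this stream with row i of the given stream) and that both streams have the same number of rows. ConcatSketch and its Frame type are hypothetical, and the executor-based buffering is not modelled:

```scala
// Hypothetical in-memory model; positional pairing is an assumption of this sketch.
object ConcatSketch {
  final case class Frame(fields: Vector[String], rows: Vector[Vector[Any]])

  // Row i of `self` is joined with row i of `other`; the schemas are appended.
  def concat(self: Frame, other: Frame): Frame = {
    require(self.rows.size == other.rows.size, "both streams must have the same number of rows")
    Frame(self.fields ++ other.fields, self.rows.zip(other.rows).map { case (l, r) => l ++ r })
  }
}
```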
Filters the stream, keeping only the rows where the value of the given field matches the given predicate.
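A sketch of field-based filtering over a hypothetical in-memory Frame; the field's position is looked up once by name and the predicate is applied to that cell of each row. FilterSketch is an illustrative name only:

```scala
// Hypothetical model: field names plus rows of values.
object FilterSketch {
  final case class Frame(fields: Vector[String], rows: Vector[Vector[Any]])

  // Looks up the field's position once, then keeps rows whose value satisfies `p`.
  def filter(frame: Frame, fieldName: String)(p: Any => Boolean): Frame = {
    val idx = frame.fields.indexOf(fieldName)
    require(idx >= 0, s"unknown field $fieldName")
    frame.copy(rows = frame.rows.filter(row => p(row(idx))))
  }
}
```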
Executes a side-effecting function for every row in the stream, returning the same row.
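A sketch of the pass-through behaviour, using a hypothetical foreachRow helper over an Iterator. Because it is built on Iterator.map, the function in this model runs only as rows are pulled downstream:

```scala
// Hypothetical model; foreachRow is an illustrative name, not the library's method.
object ForeachSketch {
  type Row = Vector[Any]

  // Runs `f` for its side effect and passes each row through unchanged.
  def foreachRow(rows: Iterator[Row])(f: Row => Unit): Iterator[Row] =
    rows.map { row => f(row); row }
}
```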
Joins the given datastream to this datastream on the given key column, where key values are considered equal according to Scala's == operator. Both datastreams must contain the key column.
The given datastream is fully materialized when this datastream needs to be materialized. For that reason, always pass the smaller datastream as the parameter and use the larger datastream as the receiver.
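A hash-join sketch of that advice: the given (ideally smaller) side is loaded into a Map keyed on the join column, and each row of the receiver probes it. Treating this as an inner join and keeping both copies of the key column are assumptions of the sketch, as are the JoinSketch names:

```scala
// Hypothetical in-memory model of a stream: field names plus rows of values.
object JoinSketch {
  final case class Frame(fields: Vector[String], rows: Vector[Vector[Any]])

  // Inner equi-join on `key`; both copies of the key column are kept in the output.
  def join(self: Frame, other: Frame, key: String): Frame = {
    val li = self.fields.indexOf(key)
    val ri = other.fields.indexOf(key)
    require(li >= 0 && ri >= 0, s"both streams must contain the key column $key")
    // Build side: the given stream is fully loaded and indexed by its key value.
    val index: Map[Any, Vector[Vector[Any]]] = other.rows.groupBy(row => row(ri))
    // Probe side: each row of the receiver looks up its key value in the index.
    val joined = for {
      l <- self.rows
      r <- index.getOrElse(l(li), Vector.empty)
    } yield l ++ r
    Frame(self.fields ++ other.fields, joined)
  }
}
```

Only the given stream is held in memory here, which is why the smaller stream belongs on that side.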
Returns a new DataStream which contains the given list of fields from the existing stream.
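A projection sketch over a hypothetical in-memory Frame. Returning the fields in the order they are requested is an assumption of this sketch:

```scala
// Hypothetical model; ProjectionSketch is illustrative, not the library's API.
object ProjectionSketch {
  final case class Frame(fields: Vector[String], rows: Vector[Vector[Any]])

  // Keeps only the named fields, in the order they are requested.
  def projection(frame: Frame, names: String*): Frame = {
    val idx = names.map { n =>
      val i = frame.fields.indexOf(n)
      require(i >= 0, s"unknown field $n")
      i
    }.toVector
    // Each row is rebuilt by reading the cells at the selected positions.
    Frame(names.toVector, frame.rows.map(row => idx.map(row)))
  }
}
```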
Returns the same data but with an updated schema. The field that matches the given name will have its datatype set to the given datatype.
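A sketch showing that only the schema changes: the rows are reused as they are, so no value conversion takes place. The datatype is modelled as a plain String here, and all names are hypothetical:

```scala
// Hypothetical model: datatypes are represented by their names for brevity.
object ReplaceFieldTypeSketch {
  final case class Field(name: String, dataType: String)
  final case class Frame(schema: Vector[Field], rows: Vector[Vector[Any]])

  // Only the schema changes; the row data is reused untouched.
  def replaceFieldType(frame: Frame, name: String, dataType: String): Frame =
    frame.copy(schema = frame.schema.map(f => if (f.name == name) f.copy(dataType = dataType) else f))
}
```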
Returns a new DataStream in which only every k-th row is retained. For example, if k is 2, then on average every other row will be returned; if k is 10, only 10% of rows will be returned. When running concurrently, which rows are sampled will vary with the order in which the workers pull rows through. Each partition uses its own counter.
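A single-partition sketch of counter-based sampling. Keeping rows 0, k, 2k and so on (rather than starting from row k-1) is an assumption, and SampleSketch is a hypothetical name:

```scala
// Hypothetical model of one partition; each partition would get its own counter.
object SampleSketch {
  type Row = Vector[Any]

  // Keeps every k-th row of one partition, using a counter local to that partition.
  def sample(partition: Iterator[Row], k: Int): Iterator[Row] = {
    require(k > 0, "k must be positive")
    partition.zipWithIndex.collect { case (row, i) if i % k == 0 => row }
  }
}
```

Within one partition the selection is exact; the "on average" in the description comes from several partitions, each with its own counter, seeing rows in an order that depends on the workers.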
Returns a new DataStream with the same data as this stream, but with the field names sanitized by removing any occurrences of the given characters.
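The renaming step can be sketched as a pure function over field names; SanitizeSketch and sanitizeFieldNames are hypothetical names, and row data is not involved at all:

```scala
// Hypothetical helper operating on the schema's field names only.
object SanitizeSketch {
  // Removes every occurrence of the given characters from each field name.
  def sanitizeFieldNames(fields: Vector[String], chars: Set[Char]): Vector[String] =
    fields.map(_.filterNot(chars.contains))
}
```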
Invoking this method returns two DataStreams. The first is the original datastream which will continue as is. The second is a DataStream which is fed by rows generated from the given function. The function is invoked for each row that passes through this stream.
Cancellation requests in the tee'd datastream do not propagate back to the original stream.
Action which results in all the rows being returned in memory as a Vector. Alias for 'collect()'
For each row, any values that match "from" will be replaced with "target". This operation applies to all fields for all rows.
Replaces any values that match "from" with the value "target". This operation only applies to the specified field.
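Both replacement variants can be sketched over a hypothetical in-memory Frame. Matching with Scala's == is an assumption here, and replaceEverywhere and replaceInField are illustrative names, not the library's methods:

```scala
// Hypothetical model: field names plus rows of values.
object ReplaceSketch {
  final case class Frame(fields: Vector[String], rows: Vector[Vector[Any]])

  // Replaces matching values in every cell of every row.
  def replaceEverywhere(frame: Frame, from: Any, target: Any): Frame =
    frame.copy(rows = frame.rows.map(_.map(v => if (v == from) target else v)))

  // Replaces matching values only in the named field.
  def replaceInField(frame: Frame, fieldName: String, from: Any, target: Any): Frame = {
    val idx = frame.fields.indexOf(fieldName)
    require(idx >= 0, s"unknown field $fieldName")
    frame.copy(rows = frame.rows.map(row => if (row(idx) == from) row.updated(idx, target) else row))
  }
}
```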
For each row, the value of the given fieldName is passed to the function, and the function's result becomes the new value for that cell.
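A sketch of mapping a single field over a hypothetical in-memory Frame; mapField is an illustrative name, not necessarily the library's method:

```scala
// Hypothetical model: field names plus rows of values.
object MapFieldSketch {
  final case class Frame(fields: Vector[String], rows: Vector[Vector[Any]])

  // Passes the named field's current value to `fn` and stores the result in the same cell.
  def mapField(frame: Frame, fieldName: String)(fn: Any => Any): Frame = {
    val idx = frame.fields.indexOf(fieldName)
    require(idx >= 0, s"unknown field $fieldName")
    frame.copy(rows = frame.rows.map(row => row.updated(idx, fn(row(idx)))))
  }
}
```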
(Since version 1.3.0) Use addFieldFn for better type inference
Returns a new DataStream with a new field added at the end. The datatype for the field is assumed to be String. The value for the field is taken from the function which is invoked for each row.
(Since version 1.3.0) Use addFieldFn for better type inference
(Since version 1.3.0) use addFieldFn
(Since version 1.3.0) use addField with errorIfFieldExists = false
An implementation of DataStream for which items are emitted by calling publish. When no more items are to be published, call close() so that downstream subscribers can complete.
Subscribers to this publisher will block as normal, and so they should normally be placed into a separate thread.
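The publish/close contract can be sketched with a blocking queue and an end-of-stream marker. PublisherSketch is a hypothetical class, not the library's implementation, and it assumes a single subscriber:

```scala
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical push-based source; illustrates publish/close, not the library's class.
final class PublisherSketch[T] {
  private case object Closed
  private val queue = new LinkedBlockingQueue[Any]()

  def publish(item: T): Unit = queue.put(item)

  // Enqueues an end-of-stream marker so a blocked subscriber can complete.
  def close(): Unit = queue.put(Closed)

  // A single subscriber's view; take() blocks until an item or the marker arrives.
  def iterator: Iterator[T] = new Iterator[T] {
    private var buffered: Option[Any] = None

    private def peek(): Any = {
      if (buffered.isEmpty) buffered = Some(queue.take())
      buffered.get
    }

    def hasNext: Boolean = peek() != Closed

    def next(): T = {
      if (!hasNext) throw new NoSuchElementException("publisher closed")
      val item = buffered.get.asInstanceOf[T]
      buffered = None
      item
    }
  }
}
```

Because hasNext blocks until something is published, the consuming side should run on a different thread from the publisher, matching the advice above. Without the call to close(), the subscriber would wait forever.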