An AllWindowedStream represents a data stream where the stream of elements is split into windows based on an org.apache.flink.streaming.api.windowing.assigners.WindowAssigner. Window emission is triggered based on a Trigger.
If an Evictor is specified, it is used to evict elements from the window after evaluation has been triggered by the Trigger but before the window is actually evaluated. Using an evictor degrades window performance significantly, because window results cannot be pre-aggregated.
Note that the AllWindowedStream is purely an API construct: at runtime, the AllWindowedStream is collapsed together with the operation over the window into a single operation.
T: The type of elements in the stream.
W: The type of Window that the org.apache.flink.streaming.api.windowing.assigners.WindowAssigner assigns the elements to.
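To make the pre-aggregation point concrete, here is a minimal plain-Scala sketch with no Flink dependency; `EvictorSketch`, `PreAggregatedWindow` and `EvictingWindow` are hypothetical names, not Flink classes. It contrasts a window that folds each element into a running sum with one that has to buffer every element so that an evictor can drop some of them before evaluation.

```scala
// Conceptual model only (hypothetical names, not Flink's window operator).
object EvictorSketch {
  // No evictor: each element is folded into a running sum on arrival,
  // so the window state is a single Long however many elements arrive.
  final class PreAggregatedWindow {
    private var sum = 0L
    def add(x: Long): Unit = sum += x
    def fire(): Long = sum
  }

  // With an evictor: every element must be buffered until the trigger fires,
  // because the evictor decides which elements to drop before evaluation.
  final class EvictingWindow(evictor: Vector[Long] => Vector[Long]) {
    private var buffer = Vector.empty[Long]
    def add(x: Long): Unit = buffer = buffer :+ x
    def fire(): Long = evictor(buffer).sum
  }
}
```

With an evictor such as `_.takeRight(2)` (keep only the last two elements), the buffered window's state grows with the number of elements, while the pre-aggregated window's state stays constant.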
CoGroupedStreams represents two DataStreams that have been co-grouped.
A streaming co-group operation is evaluated over elements in a window.
To complete the co-group operation, you must also specify a KeySelector for both the first and the second input, as well as a WindowAssigner.
Note: Currently, the groups are built in memory, so you need to ensure that they do not grow too large; otherwise the JVM might crash.
Example:
val one: DataStream[(String, Int)] = ...
val two: DataStream[(String, Int)] = ...

val result = one.coGroup(two)
  .where(new MyFirstKeySelector())
  .equalTo(new MySecondKeySelector())
  .window(TumblingEventTimeWindows.of(Time.of(5, TimeUnit.SECONDS)))
  .apply(new MyCoGroupFunction())
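The following plain-Scala sketch, which has no Flink dependency, models what a co-group over tumbling windows computes. It assumes non-negative millisecond timestamps and a zero window offset; `CoGroupSketch` and its functions are hypothetical. Like the in-memory grouping mentioned in the note, it materializes each group as a collection.

```scala
// Conceptual model of a windowed co-group (hypothetical, not Flink code).
object CoGroupSketch {
  // (key, value, timestamp in milliseconds)
  type Event = (String, Int, Long)

  // Start of the tumbling window containing ts, assuming zero offset and ts >= 0.
  def windowStart(ts: Long, size: Long): Long = ts - ts % size

  // Groups one input by (key, window start), keeping only the values.
  private def groups(events: Seq[Event], size: Long): Map[(String, Long), Seq[Int]] =
    events.groupBy(e => (e._1, windowStart(e._3, size))).map { case (k, es) => k -> es.map(_._2) }

  // Calls f once per (key, window) present in either input; a missing side is empty.
  def coGroup[O](one: Seq[Event], two: Seq[Event], size: Long)(
      f: (Seq[Int], Seq[Int]) => O): Map[(String, Long), O] = {
    val left = groups(one, size)
    val right = groups(two, size)
    (left.keySet ++ right.keySet).iterator
      .map(k => k -> f(left.getOrElse(k, Seq.empty), right.getOrElse(k, Seq.empty)))
      .toMap
  }
}
```

The function sees both sides of every key and window, even when one side is empty, which is what distinguishes a co-group from a join.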
ConnectedStreams represents two connected streams of (possibly) different data types. Connected streams are useful for cases where operations on one stream directly affect the operations on the other stream, usually via shared state between the streams.
One example use of connected streams is applying rules that change over time to another stream. One of the connected streams carries the rules, and the other carries the elements to which the rules are applied. The operation on the connected streams keeps the current set of rules in its state. It receives either a rule update, in which case it updates the state, or a data element, in which case it applies the rules in the state to that element.
Conceptually, the connected streams can be viewed as a union stream of an Either type that holds either the first stream's type or the second stream's type.
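The Either view of the rules example can be sketched in plain Scala. This is a hypothetical model, not Flink's ConnectedStreams or CoFlatMapFunction API: `Left` carries rule updates, `Right` carries data elements, and a local `Map` stands in for the operator state. `ConnectedSketch` and `Rule` are illustrative names.

```scala
// Conceptual model of the rules example (hypothetical, not Flink code).
object ConnectedSketch {
  // Hypothetical rule: a data element above the threshold fires the rule.
  final case class Rule(name: String, threshold: Int)

  // Processes the "union" of both inputs in arrival order, keeping the rules as state.
  def process(union: Seq[Either[Rule, Int]]): Seq[String] = {
    var rules = Map.empty[String, Int] // state: rule name -> threshold
    val out = Seq.newBuilder[String]
    union.foreach {
      case Left(Rule(name, threshold)) =>
        rules = rules.updated(name, threshold) // rule update: replace the stored rule
      case Right(value) =>
        // data element: apply every rule currently held in the state
        for ((name, threshold) <- rules.toSeq.sortBy(_._1) if value > threshold)
          out += s"$name fired on $value"
    }
    out.result()
  }
}
```

Elements that arrive before any rule, or after a rule has been tightened, are evaluated against whatever state exists at that moment.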
This class provides simple utility methods for collecting a DataStream, effectively enriching it with the functionality encapsulated by DataStreamUtils.
This experimental class was relocated from flink-streaming-contrib.
JoinedStreams represents two DataStreams that have been joined.
A streaming join operation is evaluated over elements in a window.
To complete the join operation, you must also specify a KeySelector for both the first and the second input, as well as a WindowAssigner.
Note: Currently, the groups are built in memory, so you need to ensure that they do not grow too large; otherwise the JVM might crash.
Example:
val one: DataStream[(String, Int)] = ...
val two: DataStream[(String, Int)] = ...

val result = one.join(two)
  .where {t => ... }
  .equalTo {t => ... }
  .window(TumblingEventTimeWindows.of(Time.of(5, TimeUnit.SECONDS)))
  .apply(new MyJoinFunction())
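For comparison with a co-group, a plain-Scala model of a windowed inner join might look like this, under the same assumptions (non-negative millisecond timestamps, zero window offset); `JoinSketch` is hypothetical and not Flink API. The join function is called once per pair of elements that share a key and a window, so a key present in only one input produces no output.

```scala
// Conceptual model of a windowed inner join (hypothetical, not Flink code).
object JoinSketch {
  // (key, value, timestamp in milliseconds)
  type Event = (String, Int, Long)

  // Start of the tumbling window containing ts, assuming zero offset and ts >= 0.
  def windowStart(ts: Long, size: Long): Long = ts - ts % size

  // Emits ((key, windowStart), f(left, right)) for every matching pair.
  def join[O](one: Seq[Event], two: Seq[Event], size: Long)(
      f: (Int, Int) => O): Seq[((String, Long), O)] =
    for {
      (k1, v1, t1) <- one
      (k2, v2, t2) <- two
      if k1 == k2 && windowStart(t1, size) == windowStart(t2, size)
    } yield ((k1, windowStart(t1, size)), f(v1, v2))
}
```

The nested loop compares every pair only for clarity; an efficient implementation would group both inputs by key and window first.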
An OutputTag is a typed and named tag to use for tagging side outputs of an operator.
Example:
val outputTag = OutputTag[String]("late-data")
T: The type of elements in the side-output stream.
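A plain-Scala sketch of the idea behind a typed, named tag, assuming a simple operator that routes events older than a fixed watermark to a "late-data" side output. `Tag` is a hypothetical stand-in for OutputTag and `Context` for an operator's output context; neither is Flink API.

```scala
import scala.collection.mutable

// Conceptual model of side outputs (hypothetical, not Flink code).
object SideOutputSketch {
  // Stand-in for OutputTag[T]: the name identifies the side output,
  // the type parameter records which element type it carries.
  final case class Tag[T](id: String)

  // Collects one operator's main output plus any tagged side outputs.
  final class Context[Main] {
    private var mainOut = Vector.empty[Main]
    private val sideOut = mutable.Map.empty[String, Vector[Any]]

    def collect(value: Main): Unit = mainOut = mainOut :+ value
    def output[T](tag: Tag[T], value: T): Unit =
      sideOut(tag.id) = sideOut.getOrElse(tag.id, Vector.empty) :+ value

    def mainOutput: Vector[Main] = mainOut
    // The cast relies on every tag id being used with a single element type.
    def sideOutput[T](tag: Tag[T]): Vector[T] =
      sideOut.getOrElse(tag.id, Vector.empty).asInstanceOf[Vector[T]]
  }

  // Routes timestamps below the watermark to the side output, the rest to the main output.
  def routeLate(timestamps: Seq[Long], watermark: Long, lateTag: Tag[String]): Context[Long] = {
    val ctx = new Context[Long]
    timestamps.foreach { ts =>
      if (ts < watermark) ctx.output(lateTag, s"late:$ts") else ctx.collect(ts)
    }
    ctx
  }
}
```

The side output can carry a different element type (here `String`) from the main output (`Long`), which is why the tag needs a type as well as a name.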
The SplitStream represents an operator that has been split using an org.apache.flink.streaming.api.collector.selector.OutputSelector. Named outputs can be selected using the SplitStream#select() method. To apply a transformation to the whole output, simply call the appropriate method on this stream.
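Assuming an OutputSelector is essentially a function from an element to the names of the outputs it belongs to, select can be modeled in plain Scala as a filter. `SplitSketch` and `Selector` are hypothetical names, not Flink API.

```scala
// Conceptual model of split/select (hypothetical, not Flink code).
object SplitSketch {
  // Stand-in for an OutputSelector: the output names an element is sent to.
  type Selector[T] = T => Iterable[String]

  // Keeps the elements whose selected names include at least one requested name.
  def select[T](input: Seq[T], selector: Selector[T], names: String*): Seq[T] =
    input.filter(x => selector(x).exists(n => names.contains(n)))
}
```

In this model, selecting every name the selector can produce returns the whole input, which corresponds to applying a transformation directly on the split stream.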
A WindowedStream represents a data stream where elements are grouped by key, and for each key, the stream of elements is split into windows based on a org.apache.flink.streaming.api.windowing.assigners.WindowAssigner. Window emission is triggered based on a Trigger.
The windows are conceptually evaluated for each key individually, meaning windows can trigger at different points for each key.
If an org.apache.flink.streaming.api.windowing.evictors.Evictor is specified, it is used to evict elements from the window after evaluation has been triggered by the Trigger but before the window is actually evaluated. Using an evictor degrades window performance significantly, because window results cannot be pre-aggregated.
Note that the WindowedStream is purely an API construct: at runtime, the WindowedStream is collapsed together with the KeyedStream and the operation over the window into a single operation.
T: The type of elements in the stream.
K: The type of the key by which elements are grouped.
W: The type of Window that the org.apache.flink.streaming.api.windowing.assigners.WindowAssigner assigns the elements to.
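To illustrate that windows trigger independently for each key, here is a plain-Scala sketch of a keyed count window that sums values once a key has collected `size` elements. `KeyedCountWindowSketch` is hypothetical and models only a count trigger, with no time, evictors or Flink types.

```scala
import scala.collection.mutable

// Conceptual model of per-key windows (hypothetical, not Flink code).
object KeyedCountWindowSketch {
  // Keeps one buffer per key and fires that key's window when it holds `size` elements.
  def run(input: Seq[(String, Int)], size: Int): Seq[(String, Int)] = {
    val buffers = mutable.Map.empty[String, Vector[Int]]
    val fired = Vector.newBuilder[(String, Int)]
    for ((key, value) <- input) {
      val buf = buffers.getOrElse(key, Vector.empty) :+ value
      if (buf.size == size) {
        fired += ((key, buf.sum)) // only this key's window fires; other keys are unaffected
        buffers(key) = Vector.empty
      } else buffers(key) = buf
    }
    fired.result()
  }
}
```

With interleaved input such as `("a",1), ("b",10), ("a",2), ...`, key `a` fires before key `b` has filled its window, so emissions for different keys interleave.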
A helper class to apply AsyncFunction to a data stream.
Example:
val input: DataStream[String] = ...
val asyncFunction: (String, ResultFuture[String]) => Unit = ...

AsyncDataStream.orderedWait(input, timeout, TimeUnit.MILLISECONDS, 100)(asyncFunction)
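The ordering guarantee of orderedWait can be sketched with plain Scala futures and no Flink dependency. Every request is started immediately and completes in its own time, but `Future.sequence` yields the results in input order, which is the order that ordered mode emits. `OrderedWaitSketch` and `lookup` are hypothetical, the capacity limit is not modeled, and unlike a streaming operator this sketch blocks the caller until all results are ready.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Conceptual model of ordered async emission (hypothetical, not Flink code).
object OrderedWaitSketch {
  // Hypothetical async lookup: shorter strings take longer, so later requests may finish first.
  def lookup(s: String): Future[String] = Future {
    Thread.sleep(math.max(0, 5 - s.length) * 20L)
    s.toUpperCase
  }

  // Starts all requests concurrently, then returns their results in input order.
  def orderedWait(input: Seq[String], timeoutMillis: Long): Seq[String] =
    Await.result(Future.sequence(input.map(lookup)), timeoutMillis.millis)
}
```

In unordered mode, results would instead be emitted as soon as each request completes.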
acceptPartialFunctions extends the original DataStream with uniquely named methods that delegate to core higher-order functions (e.g. map), so that we can work around the fact that overloaded methods taking functions as parameters can't accept partial functions as well. This makes it possible to apply pattern matching directly to decompose inputs such as tuples, case classes and collections.
The following small example shows how these extensions work on a Flink data stream:
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.scala.extensions._

object Main {
  case class Point(x: Double, y: Double)

  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    val ds = env.fromElements(Point(1, 2), Point(3, 4), Point(5, 6))
    ds.filterWith {
      case Point(x, _) => x > 1
    }.mapWith {
      case Point(x, y) => (x, y)
    }.flatMapWith {
      case (x, y) => Seq('x' -> x, 'y' -> y)
    }.keyingBy {
      case (id, _) => id
    }.reduceWith {
      // reduceWith is available on keyed streams, hence it follows keyingBy
      case ((id, v1), (_, v2)) => (id, v1 + v2)
    }.print()
    env.execute()
  }
}
The extension consists of several implicit conversions over all the data stream representations that could benefit from this feature. To use this set of extension methods, the user has to explicitly opt in by importing org.apache.flink.streaming.api.scala.extensions.acceptPartialFunctions.
For more information and usage examples please consult the Apache Flink official documentation.
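The opt-in extension pattern itself can be sketched in plain Scala, assuming a toy stream type whose `map` is overloaded in a similar way (one variant takes a Scala function, another a single-method interface). `ToyStream`, `Mapper`, `OnToyStream` and `mapWith` here are hypothetical and only mirror the naming of the Flink extensions. Because `mapWith` is not overloaded, its parameter has a single expected function type, so a `{ case ... }` literal can be passed to it.

```scala
// Conceptual model of the extension pattern (hypothetical, not Flink code).
object PartialFunctionsSketch {
  // Single-method interface, standing in for a Java-style function type.
  trait Mapper[A, B] { def map(a: A): B }

  // Toy stream with an overloaded `map`, like the one the extensions work around.
  final class ToyStream[A](val elems: Seq[A]) {
    def map[B](f: A => B): ToyStream[B] = new ToyStream(elems.map(f))
    def map[B](m: Mapper[A, B]): ToyStream[B] = new ToyStream(elems.map(m.map))
  }

  // Implicit conversion adding a uniquely named, non-overloaded method that delegates to map.
  implicit class OnToyStream[A](private val s: ToyStream[A]) {
    def mapWith[B](f: A => B): ToyStream[B] = s.map(f)
  }

  // Usage: decompose each tuple with a pattern instead of using ._1 and ._2.
  def demo(): Seq[Int] =
    new ToyStream(Seq((1, 2), (3, 4))).mapWith { case (x, y) => x + y }.elems
}
```

The implicit class is only in scope where it is imported (here, inside the enclosing object), which mirrors the explicit opt-in import described above.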