final class ContiguousGroupBy[T, K] extends GraphStage[FlowShape[T, (K, Source[T, NotUsed])]]
A group-by written specifically for streams that are contiguous with respect to the computed key. A sorted stream is a
sufficient, but not a necessary, condition for this to work. Informally, what matters is that all elements that map to
the same key form a single run of consecutive elements, i.e.:
for all s, t in the input stream such that s directly precedes t: if keyFor(s) != keyFor(t), then:
- there exists no element u in the input stream such that t precedes u (at any distance) and keyFor(s) == keyFor(u); and
- there exists no element r in the input stream such that r precedes s (at any distance) and keyFor(r) == keyFor(t).
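To make the contiguity condition concrete, here is a minimal sketch of a check on a finite sequence. The `Contiguity` object and its `isContiguous` method are hypothetical helpers, not part of ContiguousGroupBy. They use an equivalent reading of the two conditions above: once the key moves away from some key k, k must never appear again. The check remembers every closed key, so its memory grows with the number of distinct keys; it is meant for tests or for validating a sample of the input, not as part of the streaming stage.

```scala
import scala.collection.mutable

// Hypothetical helper (not part of ContiguousGroupBy): checks whether every key's
// elements form exactly one run of consecutive elements in `xs`.
object Contiguity {
  def isContiguous[T, K](xs: Iterable[T])(keyFor: T => K): Boolean = {
    val it = xs.iterator
    if (!it.hasNext) true // an empty stream is trivially contiguous
    else {
      var current: K = keyFor(it.next())
      val closed = mutable.HashSet.empty[K] // keys whose run has already ended
      var ok = true
      while (ok && it.hasNext) {
        val k = keyFor(it.next())
        if (k != current) {
          closed += current // the run for `current` ends here
          if (closed.contains(k)) ok = false // k re-appears after its run ended
          current = k
        }
      }
      ok
    }
  }
}
```

For example, `Seq(3, 3, 1, 1, 2)` keyed by identity is contiguous even though it is not sorted, while `Seq(1, 1, 2, 1)` is not.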
Each group of stream elements is propagated to its own substream. What distinguishes this implementation from the
typical group-by is that a substream can be closed as soon as the stage sees an element that maps to a different key.
A traditional group-by must keep all substreams open until the superstream has been fully consumed, because only then
can it guarantee that no more elements will be propagated to any of them. Here, given the guarantee that no more
elements will arrive for a closed substream, its resources can be freed immediately.
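The early-closing behavior can be illustrated without Akka on a plain Scala `Iterator`. The sketch below uses a hypothetical `groupContiguous` function that emits one (key, group) pair per run and closes a group the moment the next element maps to a different key. It is a simplification: it buffers each group into a `Vector`, whereas ContiguousGroupBy emits each group as a `Source[T, NotUsed]` substream. `keyFor` is assumed to be pure, because it is evaluated more than once for elements at a group boundary.

```scala
// Hypothetical, collection-based analogue of a contiguous group-by (not the Akka stage).
object ContiguousGrouping {
  def groupContiguous[T, K](it: Iterator[T])(keyFor: T => K): Iterator[(K, Vector[T])] = {
    val buffered = it.buffered // lets us peek at the next element without consuming it
    new Iterator[(K, Vector[T])] {
      def hasNext: Boolean = buffered.hasNext

      def next(): (K, Vector[T]) = {
        if (!buffered.hasNext) throw new NoSuchElementException("no more groups")
        val key = keyFor(buffered.head)
        val group = Vector.newBuilder[T]
        // Take elements while they map to the same key. The first element with a
        // different key stays in `buffered`: the current group is closed right here
        // and that element starts the next group.
        while (buffered.hasNext && keyFor(buffered.head) == key)
          group += buffered.next()
        (key, group.result())
      }
    }
  }
}
```

Because only the current group is held, this also works on an infinite input: `groupContiguous(Iterator.from(0))(_ / 3).take(2).toList` yields `List((0, Vector(0, 1, 2)), (1, Vector(3, 4, 5)))`. A traditional group-by such as `Seq#groupBy` would need the whole input before producing any group.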
A contiguous approach (if the stream permits it) lets you stay closer to the business semantics: there is no need to
specify how many groups may be processed concurrently. We also have more (but not complete) freedom to wait in the
superstream for completion of a substream without introducing deadlocks. Together, these permit a declarative
group-by, where otherwise the logic would be obscured by stream management.