Interface ExchangeManager
Used by the engine to exchange data at stage boundaries.

An external exchange is responsible for accepting partitioned data from multiple upstream tasks, grouping that data based on the partitionId (see ExchangeSink.add(int, Slice)), and allowing the data to be consumed one partition at a time by a set of downstream tasks.

To support failure recovery, an external exchange implementation is also responsible for data deduplication in the event of a task retry or a speculative execution of a task (when two identical tasks are running at the same time). Deduplication must be done based on the sink identifier (see Exchange.addSink(int)). The implementation should assume that the data written for the same ExchangeSinkHandle by multiple sink instances (see Exchange.instantiateSink(ExchangeSinkHandle, int)) is identical. The data written by any one instance can therefore be chosen for delivery, while the data written by the other instances must be safely discarded.
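The deduplication rule can be illustrated with a small in-memory model. This is a hypothetical sketch, not part of the SPI: DedupSketch, write, finish, and read are invented names, sinks and attempts are plain integers, and "the first attempt to finish wins" is an assumed policy. The contract only says that any single instance may be chosen.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of attempt deduplication; not part of the ExchangeManager SPI.
class DedupSketch {
    // Records staged by one sink instance (one task attempt).
    private static final class AttemptOutput {
        final List<String> records = new ArrayList<>();
    }

    // sinkId -> (attemptId -> staged output of that attempt)
    private final Map<Integer, Map<Integer, AttemptOutput>> staged = new HashMap<>();
    // sinkId -> attemptId whose output was chosen for delivery
    private final Map<Integer, Integer> committed = new HashMap<>();

    private AttemptOutput output(int sinkId, int attemptId) {
        return staged.computeIfAbsent(sinkId, k -> new HashMap<>())
                .computeIfAbsent(attemptId, k -> new AttemptOutput());
    }

    // Stages a record written by one attempt of a sink.
    synchronized void write(int sinkId, int attemptId, String record) {
        Integer winner = committed.get(sinkId);
        if (winner != null && winner != attemptId) {
            return; // another attempt was already chosen: drop late writes
        }
        output(sinkId, attemptId).records.add(record);
    }

    // Marks an attempt as finished. Assumed policy: the first finished attempt
    // per sink is delivered. Returns true if this attempt's data is delivered.
    synchronized boolean finish(int sinkId, int attemptId) {
        Integer winner = committed.get(sinkId);
        if (winner != null) {
            return winner == attemptId; // already decided
        }
        output(sinkId, attemptId); // ensure an entry exists even if nothing was written
        committed.put(sinkId, attemptId);
        // Safely discard data staged by every other attempt of this sink.
        staged.get(sinkId).keySet().removeIf(id -> id != attemptId);
        return true;
    }

    // Returns only the data of the chosen attempt; empty until one finishes.
    synchronized List<String> read(int sinkId) {
        Integer winner = committed.get(sinkId);
        if (winner == null) {
            return List.of();
        }
        return List.copyOf(staged.get(sinkId).get(winner).records);
    }
}
```

Because both attempts are assumed to write identical data, choosing whichever finishes first is safe, and readers never observe duplicates from the discarded attempt.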
Method Summary
Modifier and Type | Method | Description
Exchange | createExchange(ExchangeContext context, int outputPartitionCount, boolean preserveOrderWithinPartition) | Called by the coordinator to initiate an external exchange between a pair of stages.
ExchangeSink | createSink(ExchangeSinkInstanceHandle handle) | Called by a worker to create an ExchangeSink for a specific sink instance.
ExchangeSource | createSource() | Called by a worker to create an ExchangeSource to read exchange data.
default void | shutdown() | Shuts down the exchange manager by releasing any held resources such as threads, sockets, etc.
boolean | supportsConcurrentReadAndWrite() | Indicates whether the Exchange implementation provided with this plugin supports concurrent reading and writing.
Method Details
createExchange
Exchange createExchange(ExchangeContext context, int outputPartitionCount, boolean preserveOrderWithinPartition)
Called by the coordinator to initiate an external exchange between a pair of stages.
Parameters:
context - contains various information about the query and stage being executed
outputPartitionCount - number of distinct partitions to be created (grouped) by the exchange. Values of the partitionId parameter of the ExchangeSink.add(int, Slice) method will be in the [0..outputPartitionCount) range
preserveOrderWithinPartition - preserve the order of records within a single partition written by a single writer. This property does not impose any specific order on the sub-partitions of a single output partition written by multiple independent writers. The order is preserved only for the records written by a single writer; the reader will read sub-partitions written by different writers in no specific order. This setting is useful when collecting ordered output from a single task that produces a single partition (for example, a task that performs a global "order by" operation). It may impact performance, as it makes certain optimizations impossible.
Returns:
Exchange object to be used by the coordinator to interact with the external exchange
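The partitionId range and the per-writer ordering guarantee can be illustrated with a hypothetical in-memory model. PartitionGroupingSketch, Chunk, add, and read are invented names, not SPI types, and a real exchange would typically persist data externally rather than hold it in memory. The sketch always keeps each writer's records in write order, which satisfies preserveOrderWithinPartition = true and is also permitted when the flag is false.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of grouping by partitionId; not part of the SPI.
class PartitionGroupingSketch {
    // One chunk = the records a single writer produced for one partition.
    record Chunk(int writerId, List<String> records) {}

    // partitions.get(p) holds the chunks written to partition p.
    private final List<List<Chunk>> partitions = new ArrayList<>();

    PartitionGroupingSketch(int outputPartitionCount) {
        for (int i = 0; i < outputPartitionCount; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Appends a record; records from the same writer keep their call order.
    void add(int writerId, int partitionId, String record) {
        // Enforce the [0..outputPartitionCount) contract.
        if (partitionId < 0 || partitionId >= partitions.size()) {
            throw new IllegalArgumentException("partitionId out of range: " + partitionId);
        }
        List<Chunk> chunks = partitions.get(partitionId);
        for (Chunk chunk : chunks) {
            if (chunk.writerId() == writerId) {
                chunk.records().add(record);
                return;
            }
        }
        List<String> records = new ArrayList<>();
        records.add(record);
        chunks.add(new Chunk(writerId, records));
    }

    // Returns the chunks of one partition. Order inside a chunk is preserved;
    // order between chunks from different writers carries no meaning.
    List<Chunk> read(int partitionId) {
        return List.copyOf(partitions.get(partitionId));
    }
}
```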
createSink
ExchangeSink createSink(ExchangeSinkInstanceHandle handle)
Called by a worker to create an ExchangeSink for a specific sink instance. A new sink instance is created by the coordinator for every task attempt (see Exchange.instantiateSink(ExchangeSinkHandle, int)).
Parameters:
handle - returned by Exchange.instantiateSink(ExchangeSinkHandle, int)
Returns:
ExchangeSink used by the engine to write data to an exchange
createSource
ExchangeSource createSource()
Called by a worker to create an ExchangeSource to read exchange data.
Returns:
ExchangeSource used by the engine to read data from an exchange
supportsConcurrentReadAndWrite
boolean supportsConcurrentReadAndWrite()
Indicates whether the Exchange implementation provided with this plugin supports concurrent reading and writing.
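How an engine might act on this flag is sketched below under an explicit assumption: that "concurrent reading and writing" means a downstream stage may begin consuming exchange data before every upstream task has finished writing. DownstreamStartPolicy and canStartDownstream are hypothetical names; the engine's actual scheduling logic is not described here.

```java
// Hypothetical scheduling policy illustrating one possible use of the flag.
final class DownstreamStartPolicy {
    private DownstreamStartPolicy() {}

    static boolean canStartDownstream(boolean supportsConcurrentReadAndWrite,
                                      int finishedUpstreamTasks,
                                      int totalUpstreamTasks) {
        if (supportsConcurrentReadAndWrite) {
            return true; // assumed: reads may overlap with ongoing writes
        }
        // Otherwise wait until every upstream task has finished writing.
        return finishedUpstreamTasks == totalUpstreamTasks;
    }
}
```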
shutdown
default void shutdown()
Shuts down the exchange manager by releasing any held resources such as threads, sockets, etc. This method will only be called when no queries are using the exchange manager. After this method is called, no methods will be called on the exchange manager or on any objects obtained from it.
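An implementation that owns threads would typically override shutdown() to release them. In the sketch below, ResourceOwningManager is a hypothetical stand-in for the real interface, with a no-op default assumed for illustration. PooledManager is an invented implementation that owns an ExecutorService. The 10-second grace period is an arbitrary assumption.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a plugin interface with a no-op default shutdown().
interface ResourceOwningManager {
    default void shutdown() {
        // Assumed default: nothing to release.
    }
}

// Hypothetical implementation that owns a thread pool and releases it on shutdown.
class PooledManager implements ResourceOwningManager {
    private final ExecutorService ioPool = Executors.newFixedThreadPool(2);

    ExecutorService ioPool() {
        return ioPool;
    }

    @Override
    public void shutdown() {
        // Safe to stop accepting work: per the contract, no queries use the manager now.
        ioPool.shutdown();
        try {
            if (!ioPool.awaitTermination(10, TimeUnit.SECONDS)) {
                ioPool.shutdownNow(); // interrupt tasks that did not finish in time
            }
        } catch (InterruptedException e) {
            ioPool.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }
}
```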