Interface ExchangeManager


@ThreadSafe public interface ExchangeManager
Service provider interface for an external exchange.

Used by the engine to exchange data at stage boundaries.

An external exchange is responsible for accepting partitioned data from multiple upstream tasks, grouping that data by partitionId (see ExchangeSink.add(int, Slice)), and allowing the data to be consumed one partition at a time by a set of downstream tasks.
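As a rough illustration of the grouping contract, here is a minimal in-memory sketch. It assumes `byte[]` in place of Trino's `Slice` so that it stays self-contained, and the class `InMemoryPartitionedExchange` and its methods are hypothetical, not part of the SPI. Upstream writers add data tagged with a partitionId, and a downstream reader consumes a single partition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical in-memory stand-in for an external exchange (not the Trino SPI).
// byte[] replaces Slice to keep the sketch self-contained.
final class InMemoryPartitionedExchange {
    private final int outputPartitionCount;
    // Thread-safe collections, since many upstream tasks may write concurrently.
    private final Map<Integer, List<byte[]>> partitions = new ConcurrentHashMap<>();

    InMemoryPartitionedExchange(int outputPartitionCount) {
        this.outputPartitionCount = outputPartitionCount;
    }

    // Mirrors the ExchangeSink.add(int, Slice) contract: partitionId is in [0, outputPartitionCount).
    void add(int partitionId, byte[] data) {
        if (partitionId < 0 || partitionId >= outputPartitionCount) {
            throw new IllegalArgumentException("partitionId out of range: " + partitionId);
        }
        // Group incoming data by partition; order across concurrent writers is not defined.
        partitions.computeIfAbsent(partitionId, id -> new CopyOnWriteArrayList<>()).add(data);
    }

    // A downstream task consumes everything grouped under one partition.
    List<byte[]> readPartition(int partitionId) {
        return new ArrayList<>(partitions.getOrDefault(partitionId, List.of()));
    }
}
```

A real implementation would typically spill partitions to durable storage rather than holding them in memory. The map-of-lists here only shows the grouping semantics.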

To support failure recovery, an external exchange implementation is also responsible for deduplicating data in the event of a task retry or a speculative execution of a task (when two identical tasks are running at the same time). Deduplication must be done based on the sink identifier (see Exchange.addSink(int)). The implementation should assume that the data written for the same ExchangeSinkHandle by multiple sink instances (see Exchange.instantiateSink(ExchangeSinkHandle, int)) is identical: the data written by an arbitrary instance can be chosen for delivery, while the data written by the other instances must be safely discarded.
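The deduplication rule can be sketched as follows. The class `DeduplicatingSinkRegistry` is hypothetical. `sinkId` stands for the identifier passed to `Exchange.addSink(int)`, and `attemptId` is assumed to distinguish sink instances of the same sink. Because any instance's output may be delivered, this sketch uses one possible policy: the first instance to finish wins, and the buffered output of every other instance is dropped.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical deduplication sketch (not the Trino SPI).
// Policy assumption: the first sink instance to finish is chosen for delivery.
final class DeduplicatingSinkRegistry {
    // Uncommitted output buffered per sinkId, then per attemptId.
    private final Map<Integer, Map<Integer, List<String>>> pending = new HashMap<>();
    // Committed output per sinkId: exactly one instance's data survives.
    private final Map<Integer, List<String>> committed = new HashMap<>();

    synchronized void write(int sinkId, int attemptId, String record) {
        if (committed.containsKey(sinkId)) {
            return; // a losing attempt may still be running; its late writes are discarded
        }
        pending.computeIfAbsent(sinkId, id -> new HashMap<>())
               .computeIfAbsent(attemptId, id -> new ArrayList<>())
               .add(record);
    }

    // Returns true if this attempt's output was chosen, false if it was discarded.
    synchronized boolean finish(int sinkId, int attemptId) {
        Map<Integer, List<String>> attempts = pending.get(sinkId);
        List<String> data = attempts == null ? null : attempts.remove(attemptId);
        if (committed.containsKey(sinkId)) {
            return false; // an identical attempt already won; drop this copy
        }
        List<String> chosen = data != null ? data : new ArrayList<>();
        committed.put(sinkId, chosen);
        pending.remove(sinkId); // discard buffered output of all other attempts
        return true;
    }

    synchronized List<String> committedOutput(int sinkId) {
        return new ArrayList<>(committed.getOrDefault(sinkId, List.of()));
    }
}
```

Because the instances are assumed to produce identical data, which one wins does not affect the result. Only the "exactly one" property matters.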

  • Method Details

    • createExchange

      Exchange createExchange(ExchangeContext context, int outputPartitionCount, boolean preserveOrderWithinPartition)
      Called by the coordinator to initiate an external exchange between a pair of stages.
      Parameters:
      context - contains information about the query and the stage being executed
      outputPartitionCount - number of distinct partitions to be created (grouped) by the exchange. Values of the partitionId parameter of the ExchangeSink.add(int, Slice) method will be in the [0..outputPartitionCount) range.
      preserveOrderWithinPartition - preserve the order of records within a single partition written by a single writer. This property does not impose any specific order on the sub-partitions of a single output partition written by multiple independent writers: order is preserved only among the records written by one writer, and the reader reads the sub-partitions written by different writers in no specific order. This setting is useful when collecting ordered output from a single task that produces a single partition (for example, a task that performs a global "order by" operation). It may impact performance, as it prevents certain optimizations.
      Returns:
      Exchange object to be used by the coordinator to interact with the external exchange
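Here is a hypothetical single-threaded illustration of the ordering guarantee behind preserveOrderWithinPartition for one output partition. The class `OrderedPartitionReader` is not part of the SPI. Each writer's records form a sub-partition. The reader visits sub-partitions in an unspecified order, simulated here with a seeded shuffle, but it never reorders records inside a sub-partition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch of preserveOrderWithinPartition semantics (not the Trino SPI).
final class OrderedPartitionReader {
    // One sub-partition (ordered list of records) per writer.
    private final Map<String, List<String>> subPartitionsByWriter = new LinkedHashMap<>();

    void write(String writerId, String record) {
        subPartitionsByWriter.computeIfAbsent(writerId, id -> new ArrayList<>()).add(record);
    }

    List<String> read(long seed) {
        List<String> writers = new ArrayList<>(subPartitionsByWriter.keySet());
        // No ordering guarantee across writers: visit sub-partitions in arbitrary order.
        Collections.shuffle(writers, new Random(seed));
        List<String> out = new ArrayList<>();
        for (String writer : writers) {
            // Order is preserved within each writer's sub-partition.
            out.addAll(subPartitionsByWriter.get(writer));
        }
        return out;
    }
}
```

With a single writer, as in the global "order by" case, the whole partition comes back in write order, which is what makes the flag useful there.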
    • createSink

      Called by a worker to create an ExchangeSink for a specific sink instance.

      A new sink instance is created by the coordinator for every task attempt (see Exchange.instantiateSink(ExchangeSinkHandle, int)).

      Parameters:
      handle - returned by Exchange.instantiateSink(ExchangeSinkHandle, int)
      Returns:
      ExchangeSink used by the engine to write data to an exchange
    • createSource

      ExchangeSource createSource()
      Called by a worker to create an ExchangeSource to read exchange data.
      Returns:
      ExchangeSource used by the engine to read data from an exchange
    • supportsConcurrentReadAndWrite

      boolean supportsConcurrentReadAndWrite()
      Returns whether the Exchange implementation provided by this plugin supports concurrent reading and writing.
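As a hypothetical illustration, an engine-side scheduler could use this flag to decide when downstream tasks may begin reading. Only the method name `supportsConcurrentReadAndWrite` comes from this interface. `ConcurrencyAwareExchangeManager`, `DownstreamScheduler`, and `canStartReading` are assumptions made for the sketch, and the real engine's scheduling logic may differ.

```java
// Hypothetical one-method slice of the SPI, so the sketch compiles on its own.
interface ConcurrencyAwareExchangeManager {
    boolean supportsConcurrentReadAndWrite();
}

// Hypothetical scheduling decision driven by the flag (not Trino's actual scheduler).
final class DownstreamScheduler {
    private final ConcurrencyAwareExchangeManager manager;

    DownstreamScheduler(ConcurrencyAwareExchangeManager manager) {
        this.manager = manager;
    }

    // Decides whether downstream tasks may start reading while upstream tasks still write.
    boolean canStartReading(int finishedUpstreamTasks, int totalUpstreamTasks) {
        if (manager.supportsConcurrentReadAndWrite()) {
            return true; // reads may overlap with ongoing writes
        }
        return finishedUpstreamTasks == totalUpstreamTasks; // otherwise wait for all writers
    }
}
```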
    • shutdown

      default void shutdown()
      Shuts down the exchange manager, releasing any held resources such as threads and sockets. This method is only called when no queries are using the exchange manager. After this method is called, no further methods will be called on the exchange manager or on any objects obtained from it.
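The class below is a minimal sketch of one way to honor the shutdown contract when a manager owns a thread pool. The class `PooledExchangeManager` and its methods are hypothetical. The pool is shut down exactly once, and any later attempt to use it fails fast. Like the SPI method, `shutdown()` here throws no checked exceptions, so interruption is handled internally.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical exchange manager owning a thread pool (not the Trino SPI).
final class PooledExchangeManager {
    private final ExecutorService ioExecutor = Executors.newFixedThreadPool(2);
    private final AtomicBoolean closed = new AtomicBoolean();

    ExecutorService executor() {
        if (closed.get()) {
            throw new IllegalStateException("exchange manager is shut down");
        }
        return ioExecutor;
    }

    // Called only when no queries use the manager, so little in-flight work is expected.
    void shutdown() {
        if (!closed.compareAndSet(false, true)) {
            return; // idempotent: a repeated call is a no-op
        }
        ioExecutor.shutdown();
        try {
            if (!ioExecutor.awaitTermination(10, TimeUnit.SECONDS)) {
                ioExecutor.shutdownNow(); // force-stop anything still running
            }
        } catch (InterruptedException e) {
            ioExecutor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
    }

    boolean isTerminated() {
        return ioExecutor.isTerminated();
    }
}
```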