Interface CoordinationService

  • All Superinterfaces:
    Service

    @DefaultServiceFactory(CoordinationServiceFactory.class)
    public interface CoordinationService
    extends Service
    Main entry point to the distributed coordination API.

    Overview

    CoordinationService provides support for implementing different coordination protocols among a set of cluster members. Such protocols can perform application-specific rebalancing of distributed data, assignment of node roles, or any other logic that requires coordinated agreement among multiple cluster members based on a consistent cluster topology (i.e. when all members have the same view of the cluster topology).

    The coordination process is triggered by the CoordinationService every time the cluster topology changes (i.e. when a new node joins or an existing node leaves the cluster). Upon such an event, one of the cluster members is selected to be the process coordinator and starts exchanging messages with all other participating nodes until the coordination process finishes or is interrupted by a concurrent cluster event.

    Service Configuration

    CoordinationService can be registered and configured in HekateBootstrap with the help of CoordinationServiceFactory as shown in the example below:

    
    // Prepare service factory.
    CoordinationServiceFactory factory = new CoordinationServiceFactory()
        // Register coordination process.
        .withProcess(new CoordinationProcessConfig()
            // Process name.
            .withName("example.process")
            // Coordination handler.
            .withHandler(new ExampleHandler())
        );
    
    // Start node.
    Hekate hekate = new HekateBootstrap()
        .withService(factory)
        .join();
    
    // Access the service.
    CoordinationService coordination = hekate.coordination();
    
    Note: The Spring XML schema-based example below requires Spring Framework integration (see HekateSpringBootstrap).
    
    <beans xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:h="http://www.hekate.io/spring/hekate-core"
        xmlns="http://www.springframework.org/schema/beans"
        xsi:schemaLocation="http://www.springframework.org/schema/beans
            http://www.springframework.org/schema/beans/spring-beans.xsd
            http://www.hekate.io/spring/hekate-core
            http://www.hekate.io/spring/hekate-core.xsd">
    
        <h:node id="hekate">
            <!-- Coordination service. -->
            <h:coordination>
                <h:process name="example.process">
                    <h:handler>
                        <bean class="foo.bar.SomeCoordinationHandler"/>
                    </h:handler>
                </h:process>
            </h:coordination>
    
            <!-- ...other services... -->
        </h:node>
    </beans>
    
    Note: The plain Spring bean definition example below also requires Spring Framework integration (see HekateSpringBootstrap).
    
    <beans xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://www.springframework.org/schema/beans"
        xsi:schemaLocation="http://www.springframework.org/schema/beans
            http://www.springframework.org/schema/beans/spring-beans.xsd">
    
        <bean id="hekate" class="io.hekate.spring.bean.HekateSpringBootstrap">
            <property name="services">
                <list>
                    <!-- Coordination service. -->
                    <bean class="io.hekate.coordinate.CoordinationServiceFactory">
                        <property name="processes">
                            <list>
                                <bean class="io.hekate.coordinate.CoordinationProcessConfig">
                                    <property name="name" value="example.process"/>
                                    <property name="handler">
                                        <bean class="foo.bar.SomeCoordinationHandler"/>
                                    </property>
                                </bean>
                            </list>
                        </property>
                    </bean>
    
                    <!-- ...other services... -->
                </list>
            </property>
        </bean>
    </beans>
    

    Coordination Handler

    The application-specific logic of a distributed coordination process must be encapsulated in an implementation of the CoordinationHandler interface.

    When CoordinationService starts a new coordination process, it selects one of the cluster nodes to be the coordinator and calls that node's CoordinationHandler.coordinate(CoordinatorContext) method, passing it a coordination context object. This object provides information about the coordination participants as well as methods for sending coordination requests to them and receiving their responses. All nodes other than the coordinator stay idle and wait for requests from the coordinator.

    Once coordination is completed and every participant has reached its final state (according to the application logic), the coordinator must explicitly notify the CoordinationService by calling the CoordinatorContext.complete() method.

    Example

    The code example below shows how CoordinationHandler can be implemented to perform distributed coordination of cluster members. For the sake of brevity, the coordination scenario is very simple: its goal is to execute some application-specific logic on each of the coordinated members while each member holds an exclusive local lock. The coordination protocol can be described as follows:

    1. Ask all members to acquire their local locks and await confirmation from each member.
    2. Ask all members to execute their application-specific logic and await confirmation from each member.
    3. Ask all members to release their local locks.

    
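    // Hekate imports assume the io.hekate.coordinate package used in the bean configuration above.
    import io.hekate.coordinate.CoordinationContext;
    import io.hekate.coordinate.CoordinationHandler;
    import io.hekate.coordinate.CoordinationRequest;
    import io.hekate.coordinate.CoordinatorContext;
    import java.util.concurrent.locks.ReentrantLock;

    public class ExampleHandler implements CoordinationHandler {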
        private final ReentrantLock dataLock = new ReentrantLock();
    
        @Override
        public void prepare(CoordinationContext ctx) {
            if (ctx.isCoordinator()) {
                System.out.println("Prepared to coordinate " + ctx.members());
            } else {
                System.out.println("Prepared to await for coordination from " + ctx.coordinator());
            }
        }
    
        @Override
        public void coordinate(CoordinatorContext ctx) {
            System.out.println("Coordinating " + ctx.members());
    
            // Ask all members to acquire their local locks.
            ctx.broadcast("lock", lockResponses -> {
                // Got lock confirmations from all members.
                // Ask all members to execute some application specific logic while lock is held.
                ctx.broadcast("execute", executeResponses -> {
                    // All members finished executing their tasks.
                    // Ask all members to release their locks.
                    ctx.broadcast("unlock", unlockResponses -> {
                        // All locks released. Coordination completed.
                        System.out.println("Done coordinating.");
    
                        ctx.complete();
                    });
                });
            });
        }
    
        @Override
        public void process(CoordinationRequest request, CoordinationContext ctx) {
            String msg = request.get(String.class);
    
            switch (msg) {
                case "lock": {
                    if (!dataLock.isHeldByCurrentThread()) {
                        dataLock.lock();
                    }
    
                    break;
                }
                case "execute": {
                    assert dataLock.isHeldByCurrentThread();
    
                    // ...perform some actions while all members are holding their locks...
    
                    break;
                }
                case "unlock": {
                    if (dataLock.isHeldByCurrentThread()) {
                        dataLock.unlock();
                    }
    
                    break;
                }
                default: {
                    throw new IllegalArgumentException("Unsupported message: " + msg);
                }
            }
    
            // Send confirmation back to the coordinator.
            request.reply("ok");
        }
    
        @Override
        public void terminate() {
            // Make sure that lock is released if local node gets stopped
            if (dataLock.isHeldByCurrentThread()) {
                dataLock.unlock();
            }
        }
    }
    

    Messaging

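    CoordinationService provides support for asynchronous message exchange among the coordination participants. In the handler example above, for instance, the coordinator sends a request to all members via the broadcast method of its context and receives their responses in a callback.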

    When a node receives a request (either from the coordinator or from another node), its CoordinationHandler.process(CoordinationRequest, CoordinationContext) method is called. Implementations of this method must perform their application-specific logic based on the request payload and send back a response via the CoordinationRequest.reply(Object) method.

    All messages of a coordination process are guaranteed to be sent and received with the same consistent cluster topology (i.e. both the sender and the receiver have exactly the same view of the cluster topology). If a topology mismatch is detected between the sender and the receiver, CoordinationService transparently sends a retry response back to the sender. The sender can then either retry later, once its topology is consistent with the receiver's, or cancel the coordination process and restart it with a more up-to-date cluster topology.
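
    The topology check described above can be pictured with a small stand-alone model. This is only a sketch under assumptions: the names TopologyCheckSketch, Outcome and receive are hypothetical, a topology is modelled as a plain set of node names, and nothing here reflects how CoordinationService actually encodes or compares topologies.

```java
import java.util.Set;

/** Illustrative model only; not the actual CoordinationService wire protocol. */
final class TopologyCheckSketch {
    /** Hypothetical result of the receiver-side check. */
    enum Outcome { ACCEPTED, RETRY }

    /** Accepts a request only if sender and receiver see exactly the same members. */
    static Outcome receive(Set<String> receiverTopology, Set<String> senderTopology) {
        return receiverTopology.equals(senderTopology) ? Outcome.ACCEPTED : Outcome.RETRY;
    }

    public static void main(String[] args) {
        Set<String> twoNodes = Set.of("node-a", "node-b");
        Set<String> threeNodes = Set.of("node-a", "node-b", "node-c");

        // The sender has not yet seen node-c join, so it is asked to retry.
        System.out.println(receive(threeNodes, twoNodes));
        // Once both views converge, the same request is accepted.
        System.out.println(receive(threeNodes, threeNodes));
    }
}
```

    In this model the retry response carries no payload: the sender simply resends once its own view has caught up, or gives up and lets a new coordination process start.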

    Topology Changes

    If a topology change happens while a coordination process is still running, CoordinationService will try to cancel the current process via the CoordinationHandler.cancel(CoordinationContext) method and will start a new coordination process. Implementations of the CoordinationHandler interface are required to stop all activities of the current coordination process as soon as possible.

    In order to perform early detection of a cancelled coordination process please consider using the CoordinationContext.isCancelled() method. If this method returns true then this context is not valid and should not be used any more.
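
    A minimal sketch of such an early check is shown below. To keep it runnable without the library, the cancellation flag is passed as a java.util.function.BooleanSupplier; in a real handler one would presumably pass ctx::isCancelled. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/** Illustrative only: polls a cancellation flag between units of work. */
final class CancellationCheckSketch {
    /**
     * Processes items one by one and stops as soon as the flag reports cancellation.
     * Returns the items that were processed before the process was cancelled.
     */
    static List<String> processUntilCancelled(List<String> items, BooleanSupplier cancelled) {
        List<String> done = new ArrayList<>();

        for (String item : items) {
            if (cancelled.getAsBoolean()) {
                // The context is no longer valid: stop all activities right away.
                break;
            }

            done.add(item); // ...application-specific work would go here...
        }

        return done;
    }
}
```

    Checking the flag before each unit of work, rather than once at the start, keeps the delay between cancellation and the handler actually stopping short.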

    In order to simplify handling of concurrent coordination processes it is recommended for implementations of the CoordinationHandler interface to minimize state that should be held in each handler instance. If coordination logic requires some transitional state to be kept during the coordination process then please consider keeping it as an attachment object of CoordinationContext instance (see CoordinationContext.setAttachment(Object)/CoordinationContext.getAttachment()).
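
    The attachment approach might look like the sketch below. FakeContext is a hypothetical stand-in that exposes only the two attachment accessors named above (its exact signatures are an assumption), and Progress is an invented state holder; neither is part of the library.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: keeps per-process state in the context instead of the handler. */
final class AttachmentSketch {
    /** Hypothetical stand-in for a coordination context with an attachment slot. */
    static final class FakeContext {
        private Object attachment;

        void setAttachment(Object attachment) {
            this.attachment = attachment;
        }

        Object getAttachment() {
            return attachment;
        }
    }

    /** Hypothetical transitional state of a single coordination process. */
    static final class Progress {
        final List<String> confirmed = new ArrayList<>();
    }

    /** Records a confirmation in the state attached to the given context and returns the count. */
    static int recordConfirmation(FakeContext ctx, String member) {
        Progress progress = (Progress) ctx.getAttachment();

        if (progress == null) {
            // First use within this process: attach fresh state to the context.
            progress = new Progress();
            ctx.setAttachment(progress);
        }

        progress.confirmed.add(member);

        return progress.confirmed.size();
    }
}
```

    Because every coordination process gets its own context, state attached this way cannot leak from a cancelled process into the one that replaces it.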

    Awaiting Initial Coordination

    Sometimes applications need to wait for the initial coordination process to complete before proceeding to their main tasks (e.g. if an application needs to know which data partitions or roles were assigned to its node by some imaginary coordination process when the node joined the cluster).

    This can be done by obtaining a future object via the futureOf(String) method. This future object is notified right after the coordination process has been executed for the first time and can be used to await its completion (see CompletableFuture.get()), as in the example below:

    
    // Get coordination process (or wait up to 3 seconds for initial coordination to be completed).
    CoordinationProcess process = hekate.coordination()
        .futureOf("example.process")
        .get(3, TimeUnit.SECONDS);
    
    System.out.println("Coordination completed for " + process.name());
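
    The call above throws if the timeout elapses. A caller that prefers to treat "not coordinated yet" as a normal outcome could wrap the wait as sketched below. This assumes the object returned by futureOf(String) behaves like a java.util.concurrent.CompletableFuture, as the reference to CompletableFuture.get() suggests; AwaitSketch and awaitUpTo are hypothetical names.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only: bounded wait that reports a timeout as an empty result. */
final class AwaitSketch {
    static <T> Optional<T> awaitUpTo(CompletableFuture<T> future, long timeout, TimeUnit unit) {
        try {
            return Optional.ofNullable(future.get(timeout, unit));
        } catch (TimeoutException e) {
            // Initial coordination has not completed within the timeout.
            return Optional.empty();
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();

            return Optional.empty();
        } catch (ExecutionException e) {
            throw new IllegalStateException("Coordination failed", e.getCause());
        }
    }
}
```

    An empty result then means only that the node should keep waiting or start in a degraded mode; which of the two is appropriate depends on the application.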
    

    Thread Management

    Each CoordinationHandler instance is bound to a single thread that is managed by the CoordinationService. All coordination and messaging callbacks get processed on that thread sequentially in order to simplify asynchronous operations handling and prevent concurrency issues.

    If a particular CoordinationHandler operation takes a long time to complete, it is recommended to offload it from the main coordination thread to a separate thread pool. Otherwise such operations will block subsequent notifications from the CoordinationService and will degrade overall coordination performance.
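
    One way to offload such work is sketched below using only java.util.concurrent. The names OffloadSketch and offload are hypothetical. In a real handler the reply callback would presumably be request::reply, but whether CoordinationRequest.reply(Object) may safely be called from a thread other than the coordination thread is not stated here and should be verified first.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Illustrative only: moves a slow task off the calling thread. */
final class OffloadSketch {
    /**
     * Runs slowTask on the worker pool and passes its result to reply once it is done.
     * The calling thread returns immediately and stays free for further callbacks.
     */
    static <T> CompletableFuture<Void> offload(ExecutorService workers, Supplier<T> slowTask, Consumer<T> reply) {
        return CompletableFuture.supplyAsync(slowTask, workers).thenAccept(reply);
    }

    public static void main(String[] args) {
        ExecutorService workers = Executors.newFixedThreadPool(2);

        try {
            // Block here only for the sake of the demo; a handler would not call join().
            offload(workers, () -> "ok", result -> System.out.println("Replying with " + result)).join();
        } finally {
            workers.shutdown();
        }
    }
}
```

    The pool should be created once and shut down together with the handler (for example in terminate()), not per request.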

    See Also:
    CoordinationServiceFactory