Interface FailureDetector

  • All Known Implementing Classes:
    DefaultFailureDetector

    public interface FailureDetector
    Cluster node failure detector.

    Implementations of this interface are responsible for providing failure detection logic to the ClusterService. Typically this logic is based on the exchange of heartbeat messages between cluster nodes; however, there is no hard restriction, and any other algorithm can be used.

    Below are the key points of how this interface is used by the cluster service:

    • When the cluster service starts, or when it detects a change in the cluster topology, it calls the update(Set) method so that the failure detector can update its internal state of monitored nodes.
    • Once per heartbeat interval, the cluster service calls the isAlive(ClusterAddress) method to check whether a particular remote node is alive. If this method returns false, the node is marked as suspected of having failed, and this information is shared with the other cluster members. If the node's failure is suspected by failureQuorum() nodes, the node is marked as failed and removed from the cluster.
    • Once per heartbeat interval, the cluster service calls the heartbeatTick() method. If this method returns a non-empty collection of cluster node addresses, a heartbeat request message is sent to each of those nodes.
    • When the cluster service receives a heartbeat request message from a remote node, it calls the onHeartbeatRequest(ClusterAddress) method. If this method returns true, a heartbeat reply is sent back to the originating node. Once the cluster service of the originating node receives the reply, it calls the onHeartbeatReply(ClusterAddress) method.
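    The interaction described above can be sketched as one synchronous heartbeat round. This is a simplified illustration, not the actual cluster service code: String stands in for ClusterAddress, the names RoundDetector, ToyRoundDetector and HeartbeatRound are hypothetical, messages are delivered by direct method calls instead of over the network, and calling heartbeatTick() before isAlive(ClusterAddress) within a round is an assumption.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified mirror of the documented callbacks; String stands in
// for ClusterAddress, and only the methods used in one round are included.
interface RoundDetector {
    Collection<String> heartbeatTick();

    boolean onHeartbeatRequest(String from);

    void onHeartbeatReply(String node);

    boolean isAlive(String node);
}

// Toy detector: pings every target on each tick; a target counts as alive only
// if it answered the most recent request sent to it.
final class ToyRoundDetector implements RoundDetector {
    private final Collection<String> targets;
    private final Set<String> awaitingReply = new HashSet<>();

    ToyRoundDetector(Collection<String> targets) {
        this.targets = targets;
    }

    @Override
    public Collection<String> heartbeatTick() {
        awaitingReply.addAll(targets);
        return targets;
    }

    @Override
    public boolean onHeartbeatRequest(String from) {
        return true; // always ask the service to send a reply
    }

    @Override
    public void onHeartbeatReply(String node) {
        awaitingReply.remove(node);
    }

    @Override
    public boolean isAlive(String node) {
        return !awaitingReply.contains(node);
    }
}

final class HeartbeatRound {
    private HeartbeatRound() {
    }

    // Plays the cluster service's role for one round on node 'self'. The 'peers'
    // map simulates the network: a target missing from it never answers.
    // Returns the nodes that would be reported as suspected.
    static List<String> run(String self, RoundDetector local,
                            Map<String, RoundDetector> peers, Collection<String> monitored) {
        for (String target : local.heartbeatTick()) {           // who gets a request?
            RoundDetector remote = peers.get(target);
            if (remote != null && remote.onHeartbeatRequest(self)) {
                local.onHeartbeatReply(target);                  // reply reaches the originator
            }
        }
        List<String> suspected = new ArrayList<>();
        for (String node : monitored) {                          // liveness check
            if (!local.isAlive(node)) {
                suspected.add(node);
            }
        }
        return suspected;
    }
}
```

    For example, if node "a" monitors "b" and "c" and only "b" answers, run returns a list containing just "c", which the real service would then share with the other members as a suspicion.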

    Implementations of this interface can be registered via the ClusterServiceFactory.setFailureDetector(FailureDetector) method.

    For the default implementation of this interface, see DefaultFailureDetector.

    See Also:
    DefaultFailureDetector, ClusterServiceFactory.setFailureDetector(FailureDetector)
    • Method Detail

      • heartbeatInterval

        long heartbeatInterval()
        Returns the time interval in milliseconds between heartbeat sending rounds (see heartbeatTick()).

        If the returned value is less than or equal to zero, health monitoring is completely disabled and the heartbeatTick() and isAlive(ClusterAddress) methods are never called.

        Returns:
        Time interval in milliseconds between heartbeat sending rounds.
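        A minimal sketch of how a caller might honor the disabling rule, assuming a ScheduledExecutorService drives the heartbeat rounds. HeartbeatScheduler is a hypothetical helper, not part of the library. Returning early for a non-positive interval also avoids the IllegalArgumentException that ScheduledExecutorService.scheduleAtFixedRate throws for a non-positive period.

```java
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: schedules one heartbeat round every 'intervalMillis' ms.
final class HeartbeatScheduler {
    private HeartbeatScheduler() {
    }

    // Returns null when monitoring is disabled (interval <= 0), in which case
    // nothing is scheduled and the round (heartbeatTick()/isAlive()) never runs.
    static ScheduledFuture<?> schedule(ScheduledExecutorService executor,
                                       long intervalMillis, Runnable round) {
        if (intervalMillis <= 0) {
            return null;
        }
        return executor.scheduleAtFixedRate(round, intervalMillis, intervalMillis,
            TimeUnit.MILLISECONDS);
    }
}
```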
      • failureQuorum

        int failureQuorum()
        Returns the number of nodes that should agree on a particular node's failure before that node is removed from the cluster.

        The value of this parameter is expected to be greater than or equal to 1. If the value is less than 1, it is automatically adjusted to 1.

        Returns:
        Number of nodes that should agree on a particular node's failure before that node is removed from the cluster.
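        The quorum rule can be illustrated with two small helpers. The QuorumCheck name is hypothetical and String stands in for ClusterAddress; treating "agree" as "at least failureQuorum() distinct nodes", and collecting suspicions in a Set so that repeated reports from one node count once, are assumptions of this sketch.

```java
import java.util.Set;

// Hypothetical helpers illustrating the failureQuorum() rule.
final class QuorumCheck {
    private QuorumCheck() {
    }

    // Applies the documented adjustment: values below 1 are treated as 1.
    static int effectiveQuorum(int configured) {
        return Math.max(1, configured);
    }

    // 'suspectedBy' holds the addresses of nodes that reported the suspicion;
    // the node is removed once enough distinct nodes agree.
    static boolean shouldRemove(Set<String> suspectedBy, int configuredQuorum) {
        return suspectedBy.size() >= effectiveQuorum(configuredQuorum);
    }
}
```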
      • terminate

        void terminate()
        Terminates this failure detector.
      • isAlive

        boolean isAlive(ClusterAddress node)
        Returns true if the cluster node at the specified address is known to be alive. Returns false if the node is considered to have failed.
        Parameters:
        node - Node address.
        Returns:
        true if the node is alive, or false if the node is considered to have failed.
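        One common way to implement isAlive(ClusterAddress) is a timeout on the time since a node was last heard from. The sketch below assumes that approach, which is not necessarily what DefaultFailureDetector does; LivenessTable and its explicit nowMillis parameter are hypothetical and exist only to keep the logic deterministic and testable.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical timeout-based liveness table; String stands in for ClusterAddress.
final class LivenessTable {
    private final long timeoutMillis;
    private final Map<String, Long> lastSeen = new HashMap<>();

    LivenessTable(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Record that a message (request or reply) was received from the node.
    void heard(String node, long nowMillis) {
        lastSeen.put(node, nowMillis);
    }

    boolean isAlive(String node, long nowMillis) {
        Long seen = lastSeen.get(node);
        // Design choice for this sketch: a node with no record yet is treated as
        // alive, so it is not suspected before it was ever monitored.
        return seen == null || nowMillis - seen < timeoutMillis;
    }
}
```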
      • update

        void update(Set<ClusterAddress> nodes)
        Updates this failure detector with the latest information about all known cluster node addresses (including the local node address).

        Note that the specified address set can include nodes that have just started joining and are not yet part of the cluster service's topology.

        Parameters:
        nodes - Cluster node addresses.
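        A minimal sketch of update(Set) handling, assuming the detector tracks a last-seen timestamp per monitored node. MonitoredNodes is hypothetical and String stands in for ClusterAddress. Because the set includes the local node, the sketch skips it; because it may include joining nodes, new entries start with a fresh timestamp so they are not suspected immediately.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical bookkeeping for update(Set).
final class MonitoredNodes {
    private final String localNode;
    private final Map<String, Long> lastSeen = new HashMap<>();

    MonitoredNodes(String localNode) {
        this.localNode = localNode;
    }

    void update(Set<String> nodes, long nowMillis) {
        lastSeen.keySet().retainAll(nodes);                // forget nodes that left
        for (String node : nodes) {
            if (!node.equals(localNode)) {                 // never monitor ourselves
                lastSeen.putIfAbsent(node, nowMillis);     // keep history of known nodes
            }
        }
    }

    Set<String> monitored() {
        return Collections.unmodifiableSet(lastSeen.keySet());
    }

    Long lastSeen(String node) {
        return lastSeen.get(node);
    }
}
```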
      • heartbeatTick

        Collection<ClusterAddress> heartbeatTick()
        Runs a heartbeat tick and returns the addresses of the cluster nodes that should receive a heartbeat request message.

        The time interval between heartbeat ticks is controlled by the heartbeatInterval() method.

        Returns:
        Addresses of the cluster nodes to which a heartbeat request message should be sent.
        See Also:
        onHeartbeatRequest(ClusterAddress)
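        heartbeatTick() leaves the choice of targets to the implementation. One possible strategy, shown here as an assumption and not necessarily what DefaultFailureDetector does, is to sort all addresses into a ring and ping only the next few successors of the local node, so each round sends a bounded number of messages regardless of cluster size. RingTargets and the fanout parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;

// Hypothetical ring-based target selection for heartbeatTick().
final class RingTargets {
    private RingTargets() {
    }

    static Collection<String> select(String localNode, Collection<String> allNodes, int fanout) {
        List<String> ring = new ArrayList<>(new TreeSet<String>(allNodes)); // sorted, de-duplicated
        List<String> targets = new ArrayList<>();
        int self = ring.indexOf(localNode);
        if (self < 0) {
            return targets; // local node not known yet: nothing to ping
        }
        int limit = Math.min(fanout, ring.size() - 1); // never wrap around to ourselves
        for (int i = 1; i <= limit; i++) {
            targets.add(ring.get((self + i) % ring.size()));
        }
        return targets;
    }
}
```

        With nodes a, b, c and d and a fanout of 2, node b pings c and d, while node d wraps around and pings a and b.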
      • onHeartbeatRequest

        boolean onHeartbeatRequest(ClusterAddress from)
        Notifies this failure detector of a heartbeat request message from a remote node. Returns a flag indicating whether a heartbeat reply should be sent (true) or heartbeat replies are not supported (false).
        Parameters:
        from - Address of the heartbeat request sender node.
        Returns:
        true if a heartbeat reply should be sent back to the requester.
        See Also:
        heartbeatTick(), onHeartbeatReply(ClusterAddress)
      • onHeartbeatReply

        void onHeartbeatReply(ClusterAddress node)
        Notifies this failure detector of a heartbeat reply message from a remote node.
        Parameters:
        node - Address of the heartbeat reply sender node.
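        A minimal sketch of the two heartbeat callbacks, assuming that both an incoming request and an incoming reply count as proof that the sender is alive. HeartbeatCallbacks is hypothetical, String stands in for ClusterAddress, and the extra nowMillis parameter (absent from the real methods) only keeps the example deterministic. The repliesEnabled flag models a detector that does not support replies, for which onHeartbeatRequest returns false.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handling of onHeartbeatRequest/onHeartbeatReply.
final class HeartbeatCallbacks {
    private final boolean repliesEnabled;
    private final Map<String, Long> lastSeen = new HashMap<>();

    HeartbeatCallbacks(boolean repliesEnabled) {
        this.repliesEnabled = repliesEnabled;
    }

    // A request proves the sender is alive; the return value tells the
    // service whether to send a reply back.
    boolean onHeartbeatRequest(String from, long nowMillis) {
        lastSeen.put(from, nowMillis);
        return repliesEnabled;
    }

    // A reply to one of our requests refreshes the sender's timestamp.
    void onHeartbeatReply(String node, long nowMillis) {
        lastSeen.put(node, nowMillis);
    }

    Long lastSeen(String node) {
        return lastSeen.get(node);
    }
}
```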
      • onConnectFailure

        void onConnectFailure(ClusterAddress node)
        Notifies this failure detector of a failure that occurred while trying to connect to a remote node.
        Parameters:
        node - Address of a failed node.
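        The interface does not prescribe how a connection failure should affect liveness. One possible policy, shown as an assumption, is to count consecutive connection failures per node, report the node as not alive once a threshold is reached, and reset the count when a heartbeat reply arrives. ConnectFailureTracker and maxFailures are hypothetical; String stands in for ClusterAddress.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consecutive-failure policy for onConnectFailure.
final class ConnectFailureTracker {
    private final int maxFailures;
    private final Map<String, Integer> failures = new HashMap<>();

    ConnectFailureTracker(int maxFailures) {
        this.maxFailures = Math.max(1, maxFailures); // at least one failure required
    }

    void onConnectFailure(String node) {
        failures.merge(node, 1, Integer::sum);       // increment consecutive failures
    }

    void onHeartbeatReply(String node) {
        failures.remove(node);                        // any reply resets the count
    }

    boolean isAlive(String node) {
        return failures.getOrDefault(node, 0) < maxFailures;
    }
}
```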