Interface FailureDetector
-
- All Known Implementing Classes:
DefaultFailureDetector
public interface FailureDetector
Cluster nodes failure detector.
Implementations of this interface are responsible for providing failure detection logic to the ClusterService. Typically this logic is based on the exchange of heartbeat messages between cluster nodes; however, there is no hard restriction and any other algorithm can be used.
Below are the key points of how this interface is used by the cluster service:
- When the cluster service starts, or when it detects changes in the cluster topology, it calls the update(Set) method so that the failure detector can update its internal state of monitored nodes.
- Once per heartbeat interval the cluster service calls the isAlive(ClusterAddress) method to check whether a particular remote node is alive. If this method returns false, the node is marked as suspected to have failed and this information is shared with other cluster members. If the failure is suspected by failureQuorum() nodes, the node is marked as failed and is removed from the cluster.
- Once per heartbeat interval the cluster service calls the heartbeatTick() method. If this method returns a non-empty collection of cluster node addresses, a heartbeat request message is sent to each of those nodes.
- When the cluster service receives a heartbeat request message from a remote node, it calls the onHeartbeatRequest(ClusterAddress) method. If this method returns true, a heartbeat reply is sent back to the originator node. Once the cluster service of the originator node receives such a reply, it calls the onHeartbeatReply(ClusterAddress) method.
Implementations of this interface can be registered via the ClusterServiceFactory.setFailureDetector(FailureDetector) method.
For the default implementation of this interface, see DefaultFailureDetector.
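The callback sequence described above can be illustrated with a small timeout-based detector. This is only a sketch under stated assumptions, not the DefaultFailureDetector algorithm: it uses String in place of ClusterAddress, does not implement the real interface, and the names TimeoutDetectorSketch, timeoutMillis and the injected clock are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.LongSupplier;

// Stand-in sketch: String replaces ClusterAddress, and this class does not
// implement the real FailureDetector interface.
final class TimeoutDetectorSketch {
    private final String localAddress;  // assumed to be known to the detector
    private final long timeoutMillis;   // hypothetical liveness timeout
    private final LongSupplier clock;   // injectable time source, in milliseconds
    // monitored node -> time of its last heartbeat reply (or of first sighting)
    private final Map<String, Long> lastSeen = new HashMap<>();

    TimeoutDetectorSketch(String localAddress, long timeoutMillis, LongSupplier clock) {
        this.localAddress = localAddress;
        this.timeoutMillis = timeoutMillis;
        this.clock = clock;
    }

    // Mirrors update(Set): forget nodes that left, start tracking new ones.
    void update(Set<String> nodes) {
        lastSeen.keySet().retainAll(nodes);
        for (String node : nodes) {
            if (!node.equals(localAddress)) {
                lastSeen.putIfAbsent(node, clock.getAsLong());
            }
        }
    }

    // Mirrors heartbeatTick(): this sketch requests a heartbeat from every monitored node.
    Collection<String> heartbeatTick() {
        return new ArrayList<>(lastSeen.keySet());
    }

    // Mirrors onHeartbeatRequest(...): this sketch always asks for a reply to be sent.
    boolean onHeartbeatRequest(String from) {
        return true;
    }

    // Mirrors onHeartbeatReply(...): remember when a monitored node last answered.
    void onHeartbeatReply(String node) {
        lastSeen.computeIfPresent(node, (key, old) -> clock.getAsLong());
    }

    // Mirrors isAlive(...): alive only if monitored and seen within the timeout.
    boolean isAlive(String node) {
        Long seen = lastSeen.get(node);
        return seen != null && clock.getAsLong() - seen <= timeoutMillis;
    }
}
```

Injecting the clock keeps the sketch deterministic in tests; a real implementation would more likely read the system time directly.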
-
-
Method Summary
Modifier and Type | Method | Description
int | failureQuorum() | Returns the number of nodes that should agree on a particular node's failure before that node is removed from the cluster.
long | heartbeatInterval() | Returns the time interval in milliseconds between heartbeat sending rounds (see heartbeatTick()).
Collection<ClusterAddress> | heartbeatTick() | Runs a heartbeat tick and returns a collection of cluster node addresses that should receive a heartbeat request message.
void | initialize(FailureDetectorContext context) | Initializes this failure detector with the runtime context.
boolean | isAlive(ClusterAddress node) | Returns true if the cluster node at the specified address is known to be alive.
void | onConnectFailure(ClusterAddress node) | Notifies this failure detector of a failure while trying to connect to a remote node.
void | onHeartbeatReply(ClusterAddress node) | Notifies this failure detector of a heartbeat reply message from a remote node.
boolean | onHeartbeatRequest(ClusterAddress from) | Notifies this failure detector of a heartbeat request message from a remote node.
void | terminate() | Terminates this failure detector.
void | update(Set<ClusterAddress> nodes) | Updates this failure detector with the latest information about all known cluster node addresses (including the local node address).
-
-
-
Method Detail
-
initialize
void initialize(FailureDetectorContext context) throws HekateException
Initializes this failure detector with the runtime context.
- Parameters:
context - Context.
- Throws:
HekateException - If this failure detector couldn't be initialized.
-
heartbeatInterval
long heartbeatInterval()
Returns the time interval in milliseconds between heartbeat sending rounds (see heartbeatTick()).
If the returned value is less than or equal to zero, health monitoring is completely disabled and the heartbeatTick() and isAlive(ClusterAddress) methods are never called.
- Returns:
- Time interval in milliseconds between heartbeat sending rounds.
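The rule that a non-positive interval disables monitoring could be applied by a caller roughly as follows. This is a sketch of one possible scheduling approach, not the cluster service's actual code; HeartbeatSchedulerSketch and its schedule method are hypothetical names, and the round argument stands in for whatever work calls heartbeatTick() and isAlive(ClusterAddress).

```java
import java.util.Optional;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not part of the documented API.
final class HeartbeatSchedulerSketch {
    private HeartbeatSchedulerSketch() {
    }

    // Schedules 'round' every intervalMillis. Schedules nothing and returns
    // an empty Optional when the interval is zero or negative.
    static Optional<ScheduledFuture<?>> schedule(ScheduledExecutorService executor,
                                                 long intervalMillis,
                                                 Runnable round) {
        if (intervalMillis <= 0) {
            return Optional.empty(); // health monitoring disabled
        }
        ScheduledFuture<?> task = executor.scheduleAtFixedRate(
            round, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        return Optional.<ScheduledFuture<?>>of(task);
    }
}
```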
-
failureQuorum
int failureQuorum()
Returns the number of nodes that should agree on a particular node's failure before that node is removed from the cluster.
The returned value is expected to be greater than or equal to 1. If the value is less than 1, it is automatically adjusted to 1.
- Returns:
- Number of nodes that should agree on a particular node's failure before that node is removed from the cluster.
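To make the quorum rule concrete, the sketch below counts how many distinct nodes suspect a given node and compares that count with the quorum, adjusting values below 1 to 1. It is an illustration only: SuspicionTracker and its methods are hypothetical, String stands in for ClusterAddress, and the way the cluster service actually tracks suspicions is not described in this document.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the documented API.
final class SuspicionTracker {
    private final int quorum;
    // suspected node -> nodes that reported the suspicion
    private final Map<String, Set<String>> suspectedBy = new HashMap<>();

    SuspicionTracker(int failureQuorum) {
        // Values below 1 are adjusted to 1.
        this.quorum = Math.max(1, failureQuorum);
    }

    // Records that 'reporter' considers 'suspect' to be not alive.
    void suspect(String reporter, String suspect) {
        suspectedBy.computeIfAbsent(suspect, key -> new HashSet<>()).add(reporter);
    }

    // Withdraws a suspicion (assumed behavior once the node is seen alive again).
    void unsuspect(String reporter, String suspect) {
        Set<String> reporters = suspectedBy.get(suspect);
        if (reporters != null) {
            reporters.remove(reporter);
            if (reporters.isEmpty()) {
                suspectedBy.remove(suspect);
            }
        }
    }

    // True once at least 'quorum' distinct nodes suspect the given node.
    boolean isFailed(String node) {
        Set<String> reporters = suspectedBy.get(node);
        return reporters != null && reporters.size() >= quorum;
    }
}
```

Storing reporters in a set means repeated reports from the same node count once, so a single node cannot reach a quorum of 2 on its own.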
-
terminate
void terminate()
Terminates this failure detector.
-
isAlive
boolean isAlive(ClusterAddress node)
Returns true if the cluster node at the specified address is known to be alive. Returns false if the node is considered to have failed.
- Parameters:
node - Node address.
- Returns:
true if the node is alive or false if the node is considered to have failed.
-
update
void update(Set<ClusterAddress> nodes)
Updates this failure detector with the latest information about all known cluster node addresses (including the local node address).
Note that the specified address set can include nodes that have just started joining and are not yet within the cluster service's topology.
- Parameters:
nodes - Cluster node addresses.
-
heartbeatTick
Collection<ClusterAddress> heartbeatTick()
Runs a heartbeat tick and returns a collection of cluster node addresses that should receive a heartbeat request message.
The time interval between heartbeat ticks is controlled by the heartbeatInterval() method.
- Returns:
- Collection of cluster node addresses for heartbeat request message sending.
- See Also:
onHeartbeatRequest(ClusterAddress)
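Which addresses a tick returns is left to the implementation. One possible strategy, shown here purely as an assumption and not as the behavior of any particular implementation, is to arrange the known addresses in a sorted ring and monitor the next few nodes after the local one. RingTargetsSketch, targets and count are hypothetical names, and String stands in for ClusterAddress.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the documented API.
final class RingTargetsSketch {
    private RingTargetsSketch() {
    }

    // Returns up to 'count' addresses that follow 'local' in sorted order,
    // wrapping around to the start. The local address is never returned.
    static List<String> targets(Set<String> nodes, String local, int count) {
        List<String> ring = new ArrayList<>(new TreeSet<>(nodes));
        List<String> result = new ArrayList<>();
        int start = ring.indexOf(local);
        if (start < 0) {
            return result; // local node is not in the set: nothing to monitor
        }
        for (int i = 1; i < ring.size() && result.size() < count; i++) {
            result.add(ring.get((start + i) % ring.size()));
        }
        return result;
    }
}
```

With this layout each node sends heartbeats to a bounded number of peers regardless of cluster size, at the cost of relying on the quorum mechanism to confirm failures.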
-
onHeartbeatRequest
boolean onHeartbeatRequest(ClusterAddress from)
Notifies this failure detector of a heartbeat request message from a remote node. Returns a boolean flag indicating whether a heartbeat reply should be sent (true) or heartbeat replies are not supported (false).
- Parameters:
from - Address of the heartbeat request sender node.
- Returns:
true if a heartbeat reply should be sent back to the requester.
- See Also:
heartbeatTick(), onHeartbeatReply(ClusterAddress)
-
onHeartbeatReply
void onHeartbeatReply(ClusterAddress node)
Notifies this failure detector of a heartbeat reply message from a remote node.
- Parameters:
node - Address of the heartbeat reply sender node.
-
onConnectFailure
void onConnectFailure(ClusterAddress node)
Notifies this failure detector of a failure while trying to connect to a remote node.
- Parameters:
node - Address of the failed node.
-
-