org.opensearch.cluster.coordination.FollowersChecker

public class FollowersChecker extends Object

The FollowersChecker allows a leader to check that its followers are still connected and healthy. When the leader decides that a follower has failed, it removes that follower from the cluster. The checks are deliberately lenient: multiple checks may be allowed to fail before a follower is considered faulty, so that a brief network partition or a long GC pause does not trigger the removal of a node and the resulting shard reallocation.

Opensearch.internal: this class is internal to OpenSearch and is not part of its public API.
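The leniency described above can be pictured as counting failed checks per follower and reporting the follower only once a limit is reached. The following is a minimal plain-Java sketch of that idea, not the OpenSearch implementation: the class LenientFollowerCheck, its methods, and the use of String node IDs instead of DiscoveryNode are hypothetical, retryCount merely plays the role of FOLLOWER_CHECK_RETRY_COUNT_SETTING, and it assumes (not stated on this page) that any successful check resets the count.

```java
import java.util.function.BiConsumer;

// Hypothetical sketch: counts consecutive failed checks for one follower and reports
// it through an onNodeFailure-style callback once retryCount failures occur in a row.
final class LenientFollowerCheck {
    private final String nodeId;
    private final int retryCount;                           // consecutive failures tolerated before removal
    private final BiConsumer<String, String> onNodeFailure; // (node, reason), like the constructor's callback
    private int consecutiveFailures = 0;
    private boolean failed = false;

    LenientFollowerCheck(String nodeId, int retryCount, BiConsumer<String, String> onNodeFailure) {
        if (retryCount < 1) {
            throw new IllegalArgumentException("retryCount must be at least 1");
        }
        this.nodeId = nodeId;
        this.retryCount = retryCount;
        this.onNodeFailure = onNodeFailure;
    }

    // Assumption: any successful check resets the count, so isolated failures are forgiven.
    void onCheckSucceeded() {
        if (!failed) {
            consecutiveFailures = 0;
        }
    }

    // A failed check increments the count; the node is reported exactly once.
    void onCheckFailed() {
        if (failed) {
            return;
        }
        consecutiveFailures++;
        if (consecutiveFailures >= retryCount) {
            failed = true;
            onNodeFailure.accept(nodeId, "failed " + consecutiveFailures + " consecutive checks");
        }
    }

    boolean isFailed() {
        return failed;
    }

    public static void main(String[] args) {
        LenientFollowerCheck check = new LenientFollowerCheck("node-2", 3,
                (node, reason) -> System.out.println("removing " + node + ": " + reason));
        check.onCheckFailed();                 // e.g. a long GC pause
        check.onCheckSucceeded();              // the follower recovered; count reset
        check.onCheckFailed();
        check.onCheckFailed();
        System.out.println(check.isFailed());  // false: only two failures in a row
        check.onCheckFailed();                 // third in a row: prints "removing node-2: failed 3 consecutive checks"
        System.out.println(check.isFailed());  // true
    }
}
```

The reset-on-success rule is what lets a single slow response pass unnoticed, while a follower that stays silent is still removed after a bounded number of attempts.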

Nested Class Summary

Nested Classes

Modifier and Type | Class | Description
static class | FollowersChecker.FollowerCheckRequest | Request to check a follower.

Field Summary

Fields

Modifier and Type | Field
static final String | FOLLOWER_CHECK_ACTION_NAME
static final Setting<org.opensearch.common.unit.TimeValue> | FOLLOWER_CHECK_INTERVAL_SETTING
static final Setting<Integer> | FOLLOWER_CHECK_RETRY_COUNT_SETTING
static final Setting<org.opensearch.common.unit.TimeValue> | FOLLOWER_CHECK_TIMEOUT_SETTING

Constructor Summary

Constructors

Constructor
FollowersChecker(Settings settings, ClusterSettings clusterSettings, TransportService transportService, Consumer<FollowersChecker.FollowerCheckRequest> handleRequestAndUpdateState, BiConsumer<DiscoveryNode,String> onNodeFailure, NodeHealthService nodeHealthService, ClusterManagerMetrics clusterManagerMetrics)

Method Summary

Modifier and Type | Method | Description
void | clearCurrentNodes() | Clear the set of known nodes, stopping all checks.
Set<DiscoveryNode> | getFaultyNodes() | Returns the nodes in the current cluster state that have failed their follower checks.
void | setCurrentNodes(DiscoveryNodes discoveryNodes) | Update the set of known nodes, starting to check any new ones and stopping checking any previously-known-but-now-unknown ones.
String | toString() |
void | updateFastResponseState(long term, Coordinator.Mode mode) | The system is normally in a state in which every follower remains a follower of a stable leader in a single term for an extended period of time, and therefore our response to every follower check is the same.
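setCurrentNodes and clearCurrentNodes keep the set of checked nodes in step with cluster membership, and getFaultyNodes reports only failures of nodes still in the current cluster state. The following is a hypothetical plain-Java sketch of that bookkeeping, using String node IDs in place of DiscoveryNode and DiscoveryNodes; KnownNodesTracker and markFaulty are illustrative names. Two behaviours are assumptions not stated on this page: the local (leader) node is never checked, and a node already marked faulty is not re-checked. clearCurrentNodes is modelled as setting an empty node set.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the node bookkeeping behind setCurrentNodes / clearCurrentNodes /
// getFaultyNodes, using String node IDs in place of DiscoveryNode.
final class KnownNodesTracker {
    private final String localNodeId;
    private final Set<String> checked = new HashSet<>(); // nodes with an active check
    private final Set<String> faulty = new HashSet<>();  // nodes that failed their checks

    KnownNodesTracker(String localNodeId) {
        this.localNodeId = localNodeId;
    }

    // Start checking new nodes; drop all state for nodes that left the cluster.
    void setCurrentNodes(Set<String> currentNodes) {
        checked.removeIf(node -> !currentNodes.contains(node)); // stop checking departed nodes
        faulty.removeIf(node -> !currentNodes.contains(node));  // faulty set only covers current nodes
        for (String node : currentNodes) {
            // Assumptions: the leader never checks itself, and faulty nodes are not re-checked.
            if (!node.equals(localNodeId) && !faulty.contains(node)) {
                checked.add(node); // a real checker would schedule the first check if newly added
            }
        }
    }

    // Modelled as an empty membership: every check stops and no faulty nodes remain.
    void clearCurrentNodes() {
        setCurrentNodes(Set.of());
    }

    // Hypothetical hook, called when a follower exhausts its retries.
    void markFaulty(String node) {
        if (checked.remove(node)) {
            faulty.add(node);
        }
    }

    Set<String> getCheckedNodes() {
        return Set.copyOf(checked);
    }

    Set<String> getFaultyNodes() {
        return Set.copyOf(faulty);
    }

    public static void main(String[] args) {
        KnownNodesTracker tracker = new KnownNodesTracker("leader");
        tracker.setCurrentNodes(Set.of("leader", "a", "b"));      // checks a and b
        tracker.markFaulty("b");
        tracker.setCurrentNodes(Set.of("leader", "a", "b", "c")); // adds c; b stays faulty
        System.out.println(tracker.getFaultyNodes());             // [b]
        tracker.setCurrentNodes(Set.of("leader", "a", "c"));      // b left the cluster
        System.out.println(tracker.getFaultyNodes());             // []
        tracker.clearCurrentNodes();
        System.out.println(tracker.getCheckedNodes());            // []
    }
}
```

Pruning the faulty set on every membership update is what keeps getFaultyNodes limited to nodes in the current cluster state.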

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait

Field Details
- FOLLOWER_CHECK_ACTION_NAME
  
  public static final String FOLLOWER_CHECK_ACTION_NAME
  See Also:
  
  Constant Field Values
- FOLLOWER_CHECK_INTERVAL_SETTING
  
  public static final Setting<org.opensearch.common.unit.TimeValue> FOLLOWER_CHECK_INTERVAL_SETTING
- FOLLOWER_CHECK_TIMEOUT_SETTING
  
  public static final Setting<org.opensearch.common.unit.TimeValue> FOLLOWER_CHECK_TIMEOUT_SETTING
- FOLLOWER_CHECK_RETRY_COUNT_SETTING
  
  public static final Setting<Integer> FOLLOWER_CHECK_RETRY_COUNT_SETTING
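Together, the interval, timeout, and retry-count settings bound how long a leader may keep an unresponsive follower. The sketch below is a back-of-the-envelope estimate under stated assumptions, not a description of the actual scheduling: each failed check waits the full FOLLOWER_CHECK_TIMEOUT_SETTING, the next check starts one FOLLOWER_CHECK_INTERVAL_SETTING after a failure, and the follower is reported once FOLLOWER_CHECK_RETRY_COUNT_SETTING failures accumulate. FollowerCheckTiming is a hypothetical helper, and the values 1 s, 10 s, and 3 are illustrative inputs, not defaults quoted from this page.

```java
import java.time.Duration;

// Hypothetical helper: rough worst-case time from the first unanswered check until the
// follower is reported, under the assumptions stated in the lead-in.
final class FollowerCheckTiming {
    static Duration worstCaseDetection(Duration interval, Duration timeout, int retryCount) {
        if (retryCount < 1) {
            throw new IllegalArgumentException("retryCount must be at least 1");
        }
        // retryCount checks each wait the full timeout, with (retryCount - 1) intervals between them.
        return timeout.multipliedBy(retryCount).plus(interval.multipliedBy(retryCount - 1));
    }

    public static void main(String[] args) {
        Duration estimate = worstCaseDetection(Duration.ofSeconds(1), Duration.ofSeconds(10), 3);
        System.out.println(estimate.toMillis() + " ms"); // 32000 ms
    }
}
```

Raising the retry count or the timeout makes the leader more tolerant of GC pauses and brief partitions, at the cost of slower removal of a follower that has genuinely failed.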
Constructor Details
- FollowersChecker
  
  public FollowersChecker(Settings settings, ClusterSettings clusterSettings, TransportService transportService, Consumer<FollowersChecker.FollowerCheckRequest> handleRequestAndUpdateState, BiConsumer<DiscoveryNode,String> onNodeFailure, NodeHealthService nodeHealthService, ClusterManagerMetrics clusterManagerMetrics)
Method Details
- setCurrentNodes
  
  public void setCurrentNodes(DiscoveryNodes discoveryNodes)
  
  Update the set of known nodes, starting to check any new ones and stopping checking any previously-known-but-now-unknown ones.
- clearCurrentNodes
  
  public void clearCurrentNodes()
  
  Clear the set of known nodes, stopping all checks.
- updateFastResponseState
  
  public void updateFastResponseState(long term, Coordinator.Mode mode)
  
  The system is normally in a state in which every follower remains a follower of a stable leader in a single term for an extended period of time, so the response to every follower check is the same. This common case is handled with a single volatile read entirely on the network thread; only if this fast path fails is any work performed in the background. Calling this method whenever the local term or mode changes keeps the fast-path state up to date.
- getFaultyNodes
  
  public Set<DiscoveryNode> getFaultyNodes()
  
  Returns:
  
  nodes in the current cluster state which have failed their follower checks.
- toString
  
  public String toString()
  
  Overrides:
  
  toString in class Object
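updateFastResponseState supports a fast path: a single volatile read decides whether a follower can answer a check immediately on the network thread, and anything else is handed to background work. The following is a minimal sketch, assuming the fast path applies when the local node is a FOLLOWER in the same term as the incoming check. FastPathResponder, handleCheck, the Outcome enum, the local Mode enum (standing in for Coordinator.Mode), and the LongConsumer slow path are all hypothetical.

```java
import java.util.concurrent.Executor;
import java.util.function.LongConsumer;

// Hypothetical sketch of a fast-path responder: one volatile read on the calling
// (network) thread decides whether a follower check can be answered immediately.
final class FastPathResponder {
    enum Mode { CANDIDATE, LEADER, FOLLOWER } // local stand-in for Coordinator.Mode

    enum Outcome { FAST_PATH, SLOW_PATH }

    // Immutable snapshot, so a single volatile read sees a consistent (term, mode) pair.
    private record State(long term, Mode mode) {}

    private volatile State state = new State(0L, Mode.CANDIDATE);
    private final Executor background;   // where slow-path work runs
    private final LongConsumer slowPath; // hypothetical full handler, given the request's term

    FastPathResponder(Executor background, LongConsumer slowPath) {
        this.background = background;
        this.slowPath = slowPath;
    }

    // Called whenever the local term or mode changes.
    void updateFastResponseState(long term, Mode mode) {
        state = new State(term, mode);
    }

    // Handles a follower check carrying the leader's term.
    Outcome handleCheck(long requestTerm) {
        State current = state; // the single volatile read
        if (current.mode() == Mode.FOLLOWER && current.term() == requestTerm) {
            return Outcome.FAST_PATH; // stable leader, same term: same answer as before
        }
        background.execute(() -> slowPath.accept(requestTerm)); // everything else is deferred
        return Outcome.SLOW_PATH;
    }

    public static void main(String[] args) {
        // Runnable::run executes the "background" work inline, keeping the demo deterministic.
        FastPathResponder responder = new FastPathResponder(Runnable::run,
                term -> System.out.println("slow path for term " + term));
        System.out.println(responder.handleCheck(5)); // prints "slow path for term 5", then SLOW_PATH
        responder.updateFastResponseState(5, Mode.FOLLOWER);
        System.out.println(responder.handleCheck(5)); // FAST_PATH
    }
}
```

Publishing one immutable record, rather than keeping term and mode in two separate volatile fields, prevents a reader from pairing a new term with a stale mode.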