Package org.apache.nifi.remote.client
Class PeerSelector
java.lang.Object
org.apache.nifi.remote.client.PeerSelector
Service which maintains state about peers (the NiFi nodes in a remote instance, whether a
cluster or standalone). An internal cache stores identifying information about each node
and its current workload, measured as the number of flowfiles being processed. Individual
nodes can be penalized for an amount of time (see penalize(Peer, long)) to avoid sending
data to or receiving data from them. Attempts are made to balance communications: this
instance sends fewer flowfiles to "busier" nodes (TransferDirection.SEND) and receives
more flowfiles from them (TransferDirection.RECEIVE).
-
Field Summary
Modifier and Type | Field
private EventReporter | eventReporter
private static final org.slf4j.Logger | logger
private static final long | PEER_CACHE_MILLIS
private final ConcurrentMap<PeerDescription, Long> | peerPenaltyExpirations
private final PeerPersistence | peerPersistence
private PeerStatusCache | peerStatusCache
private final PeerStatusProvider | peerStatusProvider
-
Constructor Summary
Constructor | Description
PeerSelector(PeerStatusProvider peerStatusProvider, PeerPersistence peerPersistence) | Creates a peer selector with the provided collaborators.
-
Method Summary
Modifier and Type | Method | Description
(package private) LinkedHashMap<PeerStatus, Double> | buildWeightedPeerMap(Set<PeerStatus> statuses, TransferDirection direction) | Returns a map of peers prepared for flowfile transfer in the specified direction.
private static double | calculateNormalizedWeight(TransferDirection direction, long totalFlowFileCount, int flowFileCount, int peerCount) | Returns the normalized weight for this ratio of peer flowfiles to total flowfiles and the given direction.
void | clear() | Resets all penalization states for the peers.
private Map<PeerStatus, Double> | createDestinationMap(Set<PeerStatus> peerStatuses, TransferDirection direction) | Returns a map indexed by a peer to the normalized weight (number of flowfiles currently being processed by the peer as a percentage of the total).
private Set<PeerStatus> | fetchRemotePeerStatuses(Set<PeerDescription> peersToRequestClusterInfoFrom) | Returns a set of PeerStatus objects representing all remote peers for the provided PeerDescriptions.
private PeerStatus | getAvailablePeerStatus(Map<PeerStatus, Double> orderedPeerStatuses) | Returns the PeerStatus identifying the next peer to send/receive data.
private long | getCacheAge() | Returns the cache age in milliseconds.
private Set<PeerStatus> | getLastFetchedQueryablePeers() | Returns the set of queryable peers (PeerStatus.isQueryForPeers()) most recently fetched.
PeerStatus | getNextPeerStatus(TransferDirection direction) | Returns the status of the peer that will be used for the next communication.
private Set<PeerStatus> | getPeerStatuses() | Returns the set of peer statuses.
private Set<PeerDescription> | getPeersToQuery() | Returns the set of PeerDescription objects uniquely identifying each NiFi node which should be queried for PeerStatus.
private boolean | isCacheExpired(PeerStatusCache cache) | Returns true if this cache has expired.
private boolean | isPeerRefreshNeeded() | Returns true if the internal collection of peers is empty or the refresh time has passed.
boolean | isPenalized(PeerStatus peerStatus) | Returns true if this peer is currently penalized and should not send/receive flowfiles.
void | penalize(PeerDescription peerDescription, long penalizationMillis) | Updates internal state map to penalize a PeerStatus that points to the specified peer.
void | penalize(Peer peer, long penalizationMillis) | Updates internal state map to penalize a PeerStatus that points to the specified peer.
private void | persistPeerStatuses(PeerStatusCache peerStatusCache) | Persists the provided cache instance (in memory and via the PeerPersistence, e.g. in cluster state or a local file) for future retrieval.
private static void | printDistributionStatistics(Map<PeerStatus, Double> sortedPeerWorkloads, TransferDirection direction) | Prints the distribution of the peers to the logger.
void | refresh() | Allows for external callers to trigger a refresh of the internal peer status cache.
private void | refreshPeerStatusCache() | Refreshes the list of S2S peers that flowfiles can be sent to or received from.
private void | restoreInitialPeerStatusCache() | Populates the peer status cache from the peer persistence provider (e.g. the file system or persisted cluster state).
void | setEventReporter(EventReporter eventReporter) | Sets the event reporter instance.
private static LinkedHashMap<PeerStatus, Double> | sortMapByWeight(Map<PeerStatus, Double> unsortedMap) | Returns an ordered map of peers sorted in descending order by value (relative weight).
private static double | sumMapValues(Map<PeerStatus, Double> peerWeightMap) | Returns the total of all values in the map.
-
Field Details
-
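logger
private static final org.slf4j.Logger logger
-
PEER_CACHE_MILLIS
private static final long PEER_CACHE_MILLIS
-
peerPersistence
private final PeerPersistence peerPersistence
-
peerStatusProvider
private final PeerStatusProvider peerStatusProvider
-
peerPenaltyExpirations
private final ConcurrentMap<PeerDescription,Long> peerPenaltyExpirations
-
peerStatusCache
private PeerStatusCache peerStatusCache
-
eventReporter
private EventReporter eventReporter
-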
-
Constructor Details
-
PeerSelector
public PeerSelector(PeerStatusProvider peerStatusProvider, PeerPersistence peerPersistence)
Creates a peer selector with the provided collaborators.
Parameters:
peerStatusProvider - the service which retrieves peer state
peerPersistence - the service which persists peer state
-
-
Method Details
-
restoreInitialPeerStatusCache
private void restoreInitialPeerStatusCache()
Populates the peer status cache from the peer persistence provider (e.g. the file system or persisted cluster state). If this fails, it logs a warning and continues, as restoring the cache is not required for startup. If the cached protocol differs from the currently configured protocol, the cache is cleared.
-
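The restore-on-startup behaviour described for restoreInitialPeerStatusCache can be sketched as follows. This is a minimal illustration rather than NiFi's implementation: the Persistence interface, the Snapshot class, and the restoreOrNull helper are hypothetical names, and the sketch assumes that "clearing" the cache amounts to discarding the restored snapshot.

```java
import java.io.IOException;

final class RestoreCacheSketch {

    /** Hypothetical stand-in for a persistence provider. */
    interface Persistence {
        Snapshot restore() throws IOException;
    }

    /** Hypothetical cached state: only the protocol name matters here. */
    static final class Snapshot {
        final String protocol;

        Snapshot(String protocol) {
            this.protocol = protocol;
        }
    }

    /** Returns the restored snapshot, or null if none is usable. */
    static Snapshot restoreOrNull(Persistence persistence, String configuredProtocol) {
        try {
            Snapshot restored = persistence.restore();
            if (restored == null) {
                return null; // nothing was persisted
            }
            if (!configuredProtocol.equals(restored.protocol)) {
                // Assumption: a protocol mismatch makes the cached peers unusable.
                return null;
            }
            return restored;
        } catch (IOException e) {
            // Restoring is optional, so warn and continue without a cache.
            System.err.println("WARN: unable to restore peer status cache: " + e.getMessage());
            return null;
        }
    }
}
```

Returning null on every failure path keeps startup independent of the persisted state; the caller would then fall back to a fresh fetch.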
calculateNormalizedWeight
private static double calculateNormalizedWeight(TransferDirection direction, long totalFlowFileCount, int flowFileCount, int peerCount)
Returns the normalized weight for this ratio of peer flowfiles to total flowfiles and the given direction. The number will be a double between 0 and 100 indicating the percentage of all flowfiles the peer should send/receive. The transfer direction is from the perspective of this node to the peer (i.e. how many flowfiles should this node send to the peer, or how many flowfiles should this node receive from the peer).
Parameters:
direction - the transfer direction (SEND weights the destinations higher if they have fewer flowfiles, RECEIVE weights them higher if they have more)
totalFlowFileCount - the total flowfile count in the remote instance (standalone or cluster)
flowFileCount - the flowfile count for the given peer
peerCount - the number of peers in the remote instance
Returns:
the normalized weight of this peer
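A minimal sketch of such a weight calculation is shown below. It is inferred from the parameter descriptions and from the worked example given for buildWeightedPeerMap, not copied from the NiFi source. The Direction enum is a hypothetical stand-in for TransferDirection, and the handling of a single peer and of a zero total are assumptions made so the sketch never divides by zero.

```java
final class NormalizedWeightSketch {

    /** Hypothetical stand-in for TransferDirection. */
    enum Direction { SEND, RECEIVE }

    /** Returns a weight between 0 and 100 for one peer. */
    static double normalizedWeight(Direction direction, long totalFlowFileCount,
                                   int flowFileCount, int peerCount) {
        // Assumption: a lone peer takes all of the traffic.
        if (peerCount == 1) {
            return 100.0;
        }
        double share;
        if (totalFlowFileCount == 0) {
            // Assumption: with no flowfiles anywhere, weight the peers evenly.
            share = 1.0 / peerCount;
        } else {
            double fraction = (double) flowFileCount / totalFlowFileCount;
            // SEND favours peers holding fewer flowfiles; RECEIVE favours peers holding more.
            share = (direction == Direction.SEND)
                    ? (1.0 - fraction) / (peerCount - 1)
                    : fraction;
        }
        return share * 100.0;
    }
}
```

Dividing the SEND share by peerCount - 1 makes the weights across all peers sum to 100, which is what allows them to be read as percentages.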
-
sortMapByWeight
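private static LinkedHashMap<PeerStatus,Double> sortMapByWeight(Map<PeerStatus,Double> unsortedMap)
Returns an ordered map of peers sorted in descending order by value (relative weight).
Parameters:
unsortedMap - the unordered map of peers to weights
Returns:
the map sorted by value in descending order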
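Sorting a map by value in descending order can be done with standard collections, as in the sketch below. The class and method names are illustrative, and the key type is left generic so the example does not depend on PeerStatus.

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

final class SortByWeightSketch {

    /** Copies the entries into a LinkedHashMap ordered from highest to lowest value. */
    static <K> LinkedHashMap<K, Double> sortByValueDescending(Map<K, Double> unsorted) {
        LinkedHashMap<K, Double> sorted = new LinkedHashMap<>();
        unsorted.entrySet().stream()
                .sorted(Map.Entry.<K, Double>comparingByValue(Comparator.reverseOrder()))
                .forEachOrdered(entry -> sorted.put(entry.getKey(), entry.getValue()));
        return sorted;
    }
}
```

A LinkedHashMap is used because it preserves insertion order, so iterating the result visits peers from the highest weight to the lowest.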
-
printDistributionStatistics
private static void printDistributionStatistics(Map<PeerStatus,Double> sortedPeerWorkloads, TransferDirection direction)
Prints the distribution of the peers to the logger.
Parameters:
sortedPeerWorkloads - the peers and relative weights
-
sumMapValues
private static double sumMapValues(Map<PeerStatus,Double> peerWeightMap)
Returns the total of all values in the map. This method is frequently used to calculate the total number of flowfiles in the instance from the respective peer flowfile counts, or the total percentage from the relative weights.
Parameters:
peerWeightMap - the map of peers to flowfile counts or relative weights
Returns:
the total of the map values
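For illustration, totalling the values of a weight map takes one stream expression. The helper name below is hypothetical and the key type is generic.

```java
import java.util.Map;

final class SumValuesSketch {

    /** Adds up every value in the map; an empty map totals 0.0. */
    static <K> double sumValues(Map<K, Double> map) {
        return map.values().stream().mapToDouble(Double::doubleValue).sum();
    }
}
```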
-
clear
public void clear()
Resets all penalization states for the peers.
-
getNextPeerStatus
public PeerStatus getNextPeerStatus(TransferDirection direction)
Returns the status of the peer that will be used for the next communication. Peers are selected with a probability that depends on their workload and the transfer direction.
Parameters:
direction - the transfer direction, which determines how workload is weighted: for SEND, a peer with fewer flowfiles is preferred; for RECEIVE, a peer with more flowfiles is preferred
Returns:
the selected peer, or null if no peer is available or all peers are penalized
-
isPenalized
public boolean isPenalized(PeerStatus peerStatus)
Returns true if this peer is currently penalized and should not send/receive flowfiles.
Parameters:
peerStatus - the peer status identifying the peer
Returns:
true if this peer is penalized
-
penalize
public void penalize(Peer peer, long penalizationMillis)
Updates the internal state map to penalize the PeerStatus that points to the specified peer.
Parameters:
peer - the peer
penalizationMillis - period of time to penalize the given peer (relative time, not absolute)
-
penalize
public void penalize(PeerDescription peerDescription, long penalizationMillis)
Updates the internal state map to penalize the PeerStatus that points to the specified peer.
Parameters:
peerDescription - the peer description (identifies the peer)
penalizationMillis - period of time to penalize the given peer (relative time, not absolute)
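The penalization bookkeeping behind penalize, isPenalized, and clear can be pictured as a map from peer to expiration time. The sketch below is an assumption-laden illustration and not the class's actual code: peers are identified by plain strings instead of PeerDescription, and the current time is passed in explicitly (a hypothetical nowMillis parameter) so the behaviour is deterministic.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PenaltySketch {

    // Maps a peer identifier to the absolute time (millis) at which its penalty ends.
    private final ConcurrentMap<String, Long> penaltyExpirations = new ConcurrentHashMap<>();

    /** Converts the relative penalty duration into an absolute expiration time. */
    void penalize(String peerId, long penalizationMillis, long nowMillis) {
        penaltyExpirations.put(peerId, nowMillis + penalizationMillis);
    }

    /** A peer is penalized only while its recorded expiration lies in the future. */
    boolean isPenalized(String peerId, long nowMillis) {
        Long expiration = penaltyExpirations.get(peerId);
        return expiration != null && expiration > nowMillis;
    }

    /** Forgets every recorded penalty. */
    void clear() {
        penaltyExpirations.clear();
    }
}
```

Storing an absolute expiration means no background cleanup is needed: an expired entry simply stops matching the comparison in isPenalized.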
-
refresh
public void refresh()
Allows external callers to trigger a refresh of the internal peer status cache. The refresh is performed only if the cache has expired; if the cache is still valid, the refresh is skipped.
-
setEventReporter
public void setEventReporter(EventReporter eventReporter)
Sets the event reporter instance.
Parameters:
eventReporter - the event reporter
-
buildWeightedPeerMap
LinkedHashMap<PeerStatus,Double> buildWeightedPeerMap(Set<PeerStatus> statuses, TransferDirection direction)
Returns a map of peers prepared for flowfile transfer in the specified direction. Each peer is a key and the value is a weighted percentage of the total flowfiles in the remote instance. For example, in a cluster where the total number of flowfiles is 100, distributed across three nodes (20 in A, 30 in B, and 50 in C), the resulting map for SEND will be [A:40.0, B:35.0, C:25.0]. For A, the weight is (1 - 0.2) * 100 / (3 - 1) = 40.0.
Parameters:
statuses - the set of all peers
direction - the direction of transfer (SEND weights the destinations higher if they have fewer flowfiles, RECEIVE weights them higher if they have more)
Returns:
the ordered map of each peer to its relative weight
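The arithmetic in the example can be written out for all three peers. The general formula below is inferred from the calculation shown for peer A. The symbols are notation introduced here, not part of the API: c_i is a peer's flowfile count, T the total count, and N the number of peers.

```latex
w_i^{\mathrm{SEND}} = \left(1 - \frac{c_i}{T}\right) \cdot \frac{100}{N - 1}

\begin{aligned}
w_A &= \left(1 - \tfrac{20}{100}\right) \cdot \tfrac{100}{3 - 1} = 0.8 \cdot 50 = 40.0 \\
w_B &= \left(1 - \tfrac{30}{100}\right) \cdot \tfrac{100}{3 - 1} = 0.7 \cdot 50 = 35.0 \\
w_C &= \left(1 - \tfrac{50}{100}\right) \cdot \tfrac{100}{3 - 1} = 0.5 \cdot 50 = 25.0 \\
w_A + w_B + w_C &= 100.0
\end{aligned}
```

The division by N - 1 is what makes the SEND weights sum to 100, since the terms (1 - c_i/T) add up to N - 1 across all peers.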
-
createDestinationMap
private Map<PeerStatus,Double> createDestinationMap(Set<PeerStatus> peerStatuses, TransferDirection direction)
Returns a map indexed by a peer to the normalized weight (number of flowfiles currently being processed by the peer as a percentage of the total). This is used to allocate flowfiles to the various peers as destinations.
Parameters:
peerStatuses - the set of peers, along with their current workload (number of flowfiles)
direction - whether sending flowfiles to these peers or receiving them
Returns:
the map of weighted peers
-
fetchRemotePeerStatuses
private Set<PeerStatus> fetchRemotePeerStatuses(Set<PeerDescription> peersToRequestClusterInfoFrom) throws IOException
Returns a set of PeerStatus objects representing all remote peers for the provided PeerDescriptions. If a queried peer returns updated state on a peer which has already been captured, the new state is used.
Example: a 3 node cluster with nodes A, B, C. Node A knows about Node B and Node C, B about A and C, etc.
Action | Statuses
query(A) -> B.status, C.status | Bs1, Cs1
query(B) -> A.status, C.status | As1, Bs1, Cs2
query(C) -> A.status, B.status | As2, Bs2, Cs2
Parameters:
peersToRequestClusterInfoFrom - the set of peers to query
Returns:
the complete set of statuses for each collection of peers
Throws:
IOException - if there is a problem fetching peer statuses
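The "newest state wins" merge in the example table can be sketched as below. This is a simplified illustration under stated assumptions: the Status class is hypothetical (a peer name plus a version number standing in for the real status fields), and responses are assumed to arrive in query order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class StatusMergeSketch {

    /** Hypothetical status snapshot: a peer name plus the version of its reported state. */
    static final class Status {
        final String peer;
        final int version;

        Status(String peer, int version) {
            this.peer = peer;
            this.version = version;
        }
    }

    /** Merges the responses in query order; a later status replaces an earlier one for the same peer. */
    static Map<String, Status> merge(List<List<Status>> responsesInQueryOrder) {
        Map<String, Status> latest = new LinkedHashMap<>();
        for (List<Status> response : responsesInQueryOrder) {
            for (Status status : response) {
                latest.put(status.peer, status);
            }
        }
        return latest;
    }
}
```

Feeding in the three responses from the table (Bs1 and Cs1, then As1 and Cs2, then As2 and Bs2) leaves the second version of each peer in the map, matching the last row.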
-
getAvailablePeerStatus
private PeerStatus getAvailablePeerStatus(Map<PeerStatus,Double> orderedPeerStatuses)
Returns the PeerStatus identifying the next peer to send/receive data. This uses random selection of peers, weighted by the relative desirability (i.e. for SEND, peers with fewer flowfiles are more likely to be selected, and for RECEIVE, peers with more flowfiles are more likely).
Parameters:
orderedPeerStatuses - the map of peers to relative weights, sorted in descending order by weight
Returns:
the peer to send/receive data
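Weighted random selection of this kind is commonly implemented by drawing a random point along the cumulative weights. The sketch below shows that general technique. It is not taken from the NiFi source, the class and method names are illustrative, and returning null for an empty or all-zero map is an assumption.

```java
import java.util.Map;
import java.util.Random;

final class WeightedPickSketch {

    /** Picks a key with probability proportional to its weight. */
    static <K> K pick(Map<K, Double> weights, Random random) {
        double total = 0.0;
        for (double weight : weights.values()) {
            total += weight;
        }
        if (weights.isEmpty() || total <= 0.0) {
            return null; // Assumption: nothing selectable.
        }
        // Draw a point in [0, total) and walk the entries until it is covered.
        double target = random.nextDouble() * total;
        double cumulative = 0.0;
        K last = null;
        for (Map.Entry<K, Double> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            last = entry.getKey();
            if (target < cumulative) {
                return last;
            }
        }
        return last; // Guards against floating-point rounding at the upper edge.
    }
}
```

Passing the Random in makes the selection reproducible in tests. Iterating a map that is already sorted by descending weight means the most likely peers are checked first.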
-
getCacheAge
private long getCacheAge()
Returns the cache age in milliseconds. If the cache is null or not set, returns -1.
Returns:
the cache age in millis
-
getLastFetchedQueryablePeers
private Set<PeerStatus> getLastFetchedQueryablePeers()
Returns the set of queryable peers (PeerStatus.isQueryForPeers()) most recently fetched.
Returns:
the set of queryable peers (empty set if the cache is null)
-
getPeerStatuses
private Set<PeerStatus> getPeerStatuses()
Returns the set of peer statuses. If the cache is null or empty, refreshes the cache first and then returns the new peer status set.
Returns:
the most recent peer statuses (empty set if the cache is null)
-
getPeersToQuery
private Set<PeerDescription> getPeersToQuery() throws IOException
Returns the set of PeerDescription objects uniquely identifying each NiFi node which should be queried for PeerStatus.
Returns:
the set of recently retrieved peers and the bootstrap peer
Throws:
IOException - if there is a problem retrieving the list of peers to query
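Combining the recently retrieved peers with the bootstrap peer is a set union, sketched below. Peers are represented as plain strings for brevity, and both the class and the method signature are hypothetical.

```java
import java.util.LinkedHashSet;
import java.util.Set;

final class PeersToQuerySketch {

    /** Returns the previously fetched queryable peers plus the bootstrap peer, without duplicates. */
    static Set<String> peersToQuery(Set<String> lastFetchedQueryablePeers, String bootstrapPeer) {
        Set<String> toQuery = new LinkedHashSet<>(lastFetchedQueryablePeers);
        toQuery.add(bootstrapPeer);
        return toQuery;
    }
}
```

Always including the bootstrap peer gives the selector somewhere to start when nothing has been fetched yet.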
-
isCacheExpired
private boolean isCacheExpired(PeerStatusCache cache)
Returns true if this cache has expired.
Parameters:
cache - the peer status cache
Returns:
true if the cache is expired
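The cache age and expiry checks can be illustrated together. The sketch below makes several assumptions: the Cache class is a hypothetical holder for the fetch timestamp, the time-to-live is passed as a parameter because the value of PEER_CACHE_MILLIS is not stated here, and a missing cache is treated as expired.

```java
final class CacheAgeSketch {

    /** Hypothetical cache holder: only the time of the last fetch matters here. */
    static final class Cache {
        final long timestampMillis;

        Cache(long timestampMillis) {
            this.timestampMillis = timestampMillis;
        }
    }

    /** Returns the age in milliseconds, or -1 if there is no cache. */
    static long cacheAge(Cache cache, long nowMillis) {
        return cache == null ? -1 : nowMillis - cache.timestampMillis;
    }

    /** Assumption: a missing cache counts as expired, as does one older than the time-to-live. */
    static boolean isExpired(Cache cache, long nowMillis, long ttlMillis) {
        return cache == null || cacheAge(cache, nowMillis) > ttlMillis;
    }
}
```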
-
isPeerRefreshNeeded
private boolean isPeerRefreshNeeded()
Returns true if the internal collection of peers is empty or the refresh time has passed.
Returns:
true if the peer statuses should be refreshed
-
persistPeerStatuses
private void persistPeerStatuses(PeerStatusCache peerStatusCache)
Persists the provided cache instance (in memory and via the PeerPersistence, e.g. in cluster state or a local file) for future retrieval.
Parameters:
peerStatusCache - the cache of current peer statuses to persist
-
refreshPeerStatusCache
private void refreshPeerStatusCache()
Refreshes the list of S2S peers that flowfiles can be sent to or received from. Uses the stateful cache to reduce network overhead.
-