Class PeerSelector

java.lang.Object
org.apache.nifi.remote.client.PeerSelector

public class PeerSelector extends Object
Service which maintains state about peers (the NiFi nodes in a remote instance, which may be a cluster or standalone). An internal cache stores identifying information about each node and its current workload, measured in the number of flowfiles being processed. Individual nodes can be penalized for a period of time (see penalize(Peer, long)) so that no data is sent to or received from them. Attempts are made to balance communications: this instance will TransferDirection.SEND fewer flowfiles to "busier" nodes and TransferDirection.RECEIVE more flowfiles from them.
  • Field Details

    • logger

      private static final org.slf4j.Logger logger
    • PEER_CACHE_MILLIS

      private static final long PEER_CACHE_MILLIS
    • peerPersistence

      private final PeerPersistence peerPersistence
    • peerStatusProvider

      private final PeerStatusProvider peerStatusProvider
    • peerPenaltyExpirations

      private final ConcurrentMap<PeerDescription,Long> peerPenaltyExpirations
    • peerStatusCache

      private volatile PeerStatusCache peerStatusCache
    • eventReporter

      private EventReporter eventReporter
  • Constructor Details

    • PeerSelector

      public PeerSelector(PeerStatusProvider peerStatusProvider, PeerPersistence peerPersistence)
      Creates a peer selector with the provided collaborators.
      Parameters:
      peerStatusProvider - the service which retrieves peer state
      peerPersistence - the service which persists peer state
  • Method Details

    • restoreInitialPeerStatusCache

      private void restoreInitialPeerStatusCache()
      Populates the peer status cache from the peer persistence provider (e.g. the file system or persisted cluster state). If this fails, it will log a warning and continue, as it is not required for startup. If the cached protocol differs from the currently configured protocol, the cache will be cleared.
    • calculateNormalizedWeight

      private static double calculateNormalizedWeight(TransferDirection direction, long totalFlowFileCount, int flowFileCount, int peerCount)
      Returns the normalized weight for the given direction, derived from the ratio of this peer's flowfiles to the total flowfiles. The result is a double between 0 and 100 indicating the percentage of all flowfiles the peer should send/receive. The transfer direction is from the perspective of this node to the peer (i.e. how many flowfiles this node should send to the peer, or how many flowfiles this node should receive from the peer).
      Parameters:
      direction - the transfer direction (SEND weights the destinations higher if they have fewer flowfiles, RECEIVE weights them higher if they have more)
      totalFlowFileCount - the total flowfile count in the remote instance (standalone or cluster)
      flowFileCount - the flowfile count for the given peer
      peerCount - the number of peers in the remote instance
      Returns:
      the normalized weight of this peer
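      The weighting described above can be sketched as follows. This is a minimal illustration, not the NiFi implementation: the SEND branch follows the worked example given for buildWeightedPeerMap, while the RECEIVE branch, the single-peer case, and the zero-total case are assumptions. The class name and the Direction enum are hypothetical stand-ins.

```java
final class NormalizedWeightSketch {
    // Hypothetical stand-in for TransferDirection.
    enum Direction { SEND, RECEIVE }

    static double normalizedWeight(Direction direction, long totalFlowFileCount,
                                   int flowFileCount, int peerCount) {
        // Assumption: a single remote peer takes all of the traffic.
        if (peerCount == 1) {
            return 100.0;
        }
        // Assumption: with no flowfiles anywhere, every peer is weighted equally.
        if (totalFlowFileCount == 0) {
            return 100.0 / peerCount;
        }
        double ratio = (double) flowFileCount / totalFlowFileCount;
        if (direction == Direction.SEND) {
            // Favor peers holding fewer flowfiles; dividing by (peerCount - 1)
            // makes the weights of all peers add up to 100.
            return (1.0 - ratio) * 100.0 / (peerCount - 1);
        }
        // Assumption: for RECEIVE, the weight is the peer's share of all flowfiles.
        return ratio * 100.0;
    }

    public static void main(String[] args) {
        // Three peers holding 20, 30 and 50 of 100 flowfiles.
        for (int count : new int[] {20, 30, 50}) {
            System.out.println(count + " flowfiles -> SEND weight "
                    + normalizedWeight(Direction.SEND, 100, count, 3));
        }
    }
}
```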
    • sortMapByWeight

      private static LinkedHashMap<PeerStatus,Double> sortMapByWeight(Map<PeerStatus,Double> unsortedMap)
      Returns an ordered map of peers sorted in descending order by value (relative weight).
      Parameters:
      unsortedMap - the unordered map of peers to weights
      Returns:
      the sorted (desc) map (by value)
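      One common way to produce such an ordered map is to sort the entries by value and collect them into a LinkedHashMap, which preserves insertion order. The sketch below is illustrative only; it uses a generic key type in place of PeerStatus, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class SortByWeightSketch {
    // Returns a new map whose iteration order is descending by value.
    static <K> LinkedHashMap<K, Double> sortByWeightDescending(Map<K, Double> unsorted) {
        return unsorted.entrySet().stream()
                .sorted(Map.Entry.<K, Double>comparingByValue().reversed())
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (first, second) -> first, // keys are unique, so this is never used
                        LinkedHashMap::new));
    }

    public static void main(String[] args) {
        Map<String, Double> weights = Map.of("A", 25.0, "B", 40.0, "C", 35.0);
        System.out.println(sortByWeightDescending(weights));
    }
}
```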
    • printDistributionStatistics

      private static void printDistributionStatistics(Map<PeerStatus,Double> sortedPeerWorkloads, TransferDirection direction)
      Prints the distribution of the peers to the logger.
      Parameters:
      sortedPeerWorkloads - the peers and relative weights
      direction - the transfer direction
    • sumMapValues

      private static double sumMapValues(Map<PeerStatus,Double> peerWeightMap)
      Returns the total of all values in the map. This method is frequently used to calculate the total number of flowfiles in the instance from the respective peer flowfile counts or the total percentage from the relative weights.
      Parameters:
      peerWeightMap - the map of peers to flowfile counts or relative weights
      Returns:
      the total of the map values
    • clear

      public void clear()
      Resets all penalization states for the peers.
    • getNextPeerStatus

      public PeerStatus getNextPeerStatus(TransferDirection direction)
      Returns the status of the peer that will be used for the next communication. Peers with lower workloads are selected with higher probability.
      Parameters:
      direction - the transfer direction used to calculate the workload: for SEND, a peer with fewer flowfiles is preferred; for RECEIVE, a peer with more flowfiles is preferred
      Returns:
      the selected peer, or null if no peer is available or all peers are penalized
    • isPenalized

      public boolean isPenalized(PeerStatus peerStatus)
      Returns true if this peer is currently penalized and should not send/receive flowfiles.
      Parameters:
      peerStatus - the peer status identifying the peer
      Returns:
      true if this peer is penalized
    • penalize

      public void penalize(Peer peer, long penalizationMillis)
      Updates internal state map to penalize a PeerStatus that points to the specified peer.
      Parameters:
      peer - the peer
      penalizationMillis - period of time to penalize a given peer (relative time, not absolute)
    • penalize

      public void penalize(PeerDescription peerDescription, long penalizationMillis)
      Updates internal state map to penalize a PeerStatus that points to the specified peer.
      Parameters:
      peerDescription - the peer description (identifies the peer)
      penalizationMillis - period of time to penalize a given peer (relative time, not absolute)
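      The penalization bookkeeping described by isPenalized, penalize, and clear can be pictured as a map from peer to an absolute expiration time. The following is a minimal sketch, not the NiFi implementation: it keys the map by a String rather than PeerDescription, takes the current time as a parameter for testability, and assumes that a repeated penalty keeps the later expiration. All names are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class PenaltySketch {
    // Hypothetical stand-in for peerPenaltyExpirations: peer id -> absolute expiration (millis).
    private final ConcurrentMap<String, Long> penaltyExpirations = new ConcurrentHashMap<>();

    // penalizationMillis is relative; it is converted to an absolute expiration here.
    void penalize(String peerId, long penalizationMillis, long nowMillis) {
        long expiration = nowMillis + penalizationMillis;
        // Assumption: if the peer is already penalized, keep the later expiration.
        penaltyExpirations.merge(peerId, expiration, Long::max);
    }

    // A peer is penalized while its expiration lies in the future.
    boolean isPenalized(String peerId, long nowMillis) {
        Long expiration = penaltyExpirations.get(peerId);
        return expiration != null && expiration > nowMillis;
    }

    // Resets all penalization state.
    void clear() {
        penaltyExpirations.clear();
    }

    public static void main(String[] args) {
        PenaltySketch sketch = new PenaltySketch();
        long now = System.currentTimeMillis();
        sketch.penalize("node-1", 5_000L, now);
        System.out.println(sketch.isPenalized("node-1", now + 1_000L));
    }
}
```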
    • refresh

      public void refresh()
      Allows for external callers to trigger a refresh of the internal peer status cache. Performs the refresh if the cache has expired. If the cache is still valid, skips the refresh.
    • setEventReporter

      public void setEventReporter(EventReporter eventReporter)
      Sets the event reporter instance.
      Parameters:
      eventReporter - the event reporter
    • buildWeightedPeerMap

      LinkedHashMap<PeerStatus,Double> buildWeightedPeerMap(Set<PeerStatus> statuses, TransferDirection direction)
      Returns a map of peers prepared for flowfile transfer in the specified direction. Each peer is a key and the value is a weighted percentage of the total flowfiles in the remote instance. For example, in a cluster where the total number of flowfiles is 100, distributed across three nodes (20 in A, 30 in B, and 50 in C), the resulting map for SEND will be [A:40.0, B:35.0, C:25.0]. For A, (1 - 0.2) * 100 / (3 - 1) = 40.0.
      Parameters:
      statuses - the set of all peers
      direction - the direction of transfer (SEND weights the destinations higher if they have fewer flowfiles, RECEIVE weights them higher if they have more)
      Returns:
      the ordered map of each peer to its relative weight
    • createDestinationMap

      private Map<PeerStatus,Double> createDestinationMap(Set<PeerStatus> peerStatuses, TransferDirection direction)
      Returns a map from each peer to its normalized weight, which is derived from the number of flowfiles currently being processed by the peer as a percentage of the total. This is used to allocate flowfiles to the various peers as destinations.
      Parameters:
      peerStatuses - the set of peers, along with their current workload (number of flowfiles)
      direction - whether sending flowfiles to these peers or receiving them
      Returns:
      the map of weighted peers
    • fetchRemotePeerStatuses

      private Set<PeerStatus> fetchRemotePeerStatuses(Set<PeerDescription> peersToRequestClusterInfoFrom) throws IOException
      Returns a set of PeerStatus objects representing all remote peers for the provided PeerDescriptions. If a queried peer returns updated state on a peer which has already been captured, the new state is used.

      Example:

      3 node cluster with nodes A, B, C

      Node A knows about Node B and Node C, B about A and C, etc.

           Action                           |   Statuses
           query(A) -> B.status, C.status   |   Bs1, Cs1
           query(B) -> A.status, C.status   |   As1, Bs1, Cs2
           query(C) -> A.status, B.status   |   As2, Bs2, Cs2
       
      Parameters:
      peersToRequestClusterInfoFrom - the set of peers to query
      Returns:
      the complete set of statuses for each collection of peers
      Throws:
      IOException - if there is a problem fetching peer statuses
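      The "newest state wins" behaviour in the example table can be sketched as a simple map merge. This is an illustration under simplifying assumptions, not the NiFi implementation: each query result is modelled as a map from node name to a status label such as "Bs1", and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StatusMergeSketch {
    // Merges query results in order; a later status for a node replaces an earlier one.
    static Map<String, String> mergeStatuses(List<Map<String, String>> queryResults) {
        Map<String, String> latest = new TreeMap<>();
        for (Map<String, String> result : queryResults) {
            latest.putAll(result);
        }
        return latest;
    }

    public static void main(String[] args) {
        Map<String, String> fromA = new LinkedHashMap<>();
        fromA.put("B", "Bs1");
        fromA.put("C", "Cs1");
        Map<String, String> fromB = new LinkedHashMap<>();
        fromB.put("A", "As1");
        fromB.put("C", "Cs2");
        Map<String, String> fromC = new LinkedHashMap<>();
        fromC.put("A", "As2");
        fromC.put("B", "Bs2");
        // Mirrors the three queries in the table above.
        System.out.println(mergeStatuses(List.of(fromA, fromB, fromC)));
    }
}
```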
    • getAvailablePeerStatus

      private PeerStatus getAvailablePeerStatus(Map<PeerStatus,Double> orderedPeerStatuses)
      Returns the PeerStatus identifying the next peer to send/receive data. This uses random selection of peers, weighted by the relative desirability (i.e. for SEND, peers with fewer flowfiles are more likely to be selected, and for RECEIVE, peers with more flowfiles are more likely).
      Parameters:
      orderedPeerStatuses - the map of peers to relative weights, sorted in descending order by weight
      Returns:
      the peer to send/receive data
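      Weighted random selection of this kind is often implemented by drawing a number between zero and the total weight and walking the map until the running total passes it. The sketch below shows that general technique; it is an assumption that the actual method works this way. It uses a generic key in place of PeerStatus, omits the penalization check, and all names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

final class WeightedSelectionSketch {
    // Picks the entry whose cumulative weight range contains roll (0 <= roll < total weight).
    static <K> K select(Map<K, Double> weights, double roll) {
        double cumulative = 0.0;
        K last = null;
        for (Map.Entry<K, Double> entry : weights.entrySet()) {
            cumulative += entry.getValue();
            last = entry.getKey();
            if (roll < cumulative) {
                return last;
            }
        }
        // Guards against floating-point rounding; null when the map is empty.
        return last;
    }

    // Draws a roll proportional to the total weight and delegates to select.
    static <K> K selectRandom(Map<K, Double> weights, Random random) {
        double total = weights.values().stream().mapToDouble(Double::doubleValue).sum();
        return select(weights, random.nextDouble() * total);
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new LinkedHashMap<>();
        weights.put("A", 40.0);
        weights.put("B", 35.0);
        weights.put("C", 25.0);
        System.out.println(selectRandom(weights, new Random()));
    }
}
```

      Separating the deterministic select step from the random draw keeps the selection logic easy to test with fixed roll values.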
    • getCacheAge

      private long getCacheAge()
      Returns the cache age in milliseconds. If the cache is null or not set, returns -1.
      Returns:
      the cache age in millis
    • getLastFetchedQueryablePeers

      private Set<PeerStatus> getLastFetchedQueryablePeers()
      Returns the set of queryable peers (PeerStatus.isQueryForPeers()) most recently fetched.
      Returns:
      the set of queryable peers (empty set if the cache is null)
    • getPeerStatuses

      private Set<PeerStatus> getPeerStatuses()
      Returns the set of peer statuses. If the cache is null or empty, refreshes the cache first and then returns the new peer status set.
      Returns:
      the most recent peer statuses (empty set if the cache is null)
    • getPeersToQuery

      private Set<PeerDescription> getPeersToQuery() throws IOException
      Returns the set of PeerDescription objects uniquely identifying each NiFi node which should be queried for PeerStatus.
      Returns:
      the set of recently retrieved peers and the bootstrap peer
      Throws:
      IOException - if there is a problem retrieving the list of peers to query
    • isCacheExpired

      private boolean isCacheExpired(PeerStatusCache cache)
      Returns true if this cache has expired.
      Parameters:
      cache - the peer status cache
      Returns:
      true if the cache is expired
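      The cache age and expiry checks can be sketched with plain timestamps. This is a minimal illustration, not the NiFi implementation: the value of PEER_CACHE_MILLIS is not shown on this page, so the TTL below is a made-up placeholder, the cache is reduced to a nullable timestamp, and treating a missing cache as expired is an assumption.

```java
final class CacheExpirySketch {
    // Hypothetical TTL standing in for PEER_CACHE_MILLIS; the real value is not documented here.
    static final long CACHE_TTL_MILLIS = 60_000L;

    // Returns the cache age in milliseconds, or -1 if there is no cache.
    static long cacheAge(Long cacheTimestampMillis, long nowMillis) {
        return cacheTimestampMillis == null ? -1L : nowMillis - cacheTimestampMillis;
    }

    // Assumption: a missing cache is treated as expired so that callers refresh it.
    static boolean isExpired(Long cacheTimestampMillis, long nowMillis) {
        long age = cacheAge(cacheTimestampMillis, nowMillis);
        return age < 0 || age > CACHE_TTL_MILLIS;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(isExpired(now - 120_000L, now));
        System.out.println(isExpired(now - 1_000L, now));
    }
}
```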
    • isPeerRefreshNeeded

      private boolean isPeerRefreshNeeded()
      Returns true if the internal collection of peers is empty or the refresh time has passed.
      Returns:
      true if the peer statuses should be refreshed
    • persistPeerStatuses

      private void persistPeerStatuses(PeerStatusCache peerStatusCache)
      Persists the provided cache instance for future retrieval, both in memory and via the PeerPersistence (e.g. in cluster state or a local file).
      Parameters:
      peerStatusCache - the cache of current peer statuses to persist
    • refreshPeerStatusCache

      private void refreshPeerStatusCache()
      Refreshes the list of S2S peers that flowfiles can be sent to or received from. Uses the stateful cache to reduce network overhead.