Class AbstractHaServices
- java.lang.Object
-
- org.apache.flink.runtime.highavailability.AbstractHaServices
-
- All Implemented Interfaces:
AutoCloseable, GloballyCleanableResource, ClientHighAvailabilityServices, HighAvailabilityServices
- Direct Known Subclasses:
ZooKeeperLeaderElectionHaServices
public abstract class AbstractHaServices extends Object implements HighAvailabilityServices
Abstract high availability services based on a distributed system (e.g. ZooKeeper, Kubernetes). It helps with creating all the leader election/retrieval services and with the cleanup. Implementations must return a proper leader name from getLeaderPathForResourceManager(), getLeaderPathForDispatcher(), getLeaderPathForJobManager(org.apache.flink.api.common.JobID), and getLeaderPathForRestServer(). The returned leader name is the ConfigMap name in Kubernetes and the child path in ZooKeeper. close() and cleanupAllData() should be implemented to destroy the resources.
The abstract class is also responsible for determining which component service should be reused. For example, jobResultStore is created once and could be reused many times.
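As a rough illustration of the leader-name contract described above, the following self-contained sketch mimics the pattern with a simplified stand-in. `SketchHaServices`, `ChildPathHaServices`, `leaderNamesFor`, and the path strings are hypothetical and not part of Flink. The real class also takes a Configuration, a LeaderElectionDriverFactory, and other collaborators that are omitted here, and its job-specific method takes a JobID rather than a String.

```java
import java.util.ArrayList;
import java.util.List;

public class LeaderPathSketch {

    /** Simplified stand-in for the abstract base class; NOT the Flink class. */
    abstract static class SketchHaServices {

        // The base class decides where each leader name is used;
        // subclasses only supply the names.
        final List<String> leaderNamesFor(String jobId) {
            List<String> names = new ArrayList<>();
            names.add(getLeaderPathForResourceManager());
            names.add(getLeaderPathForDispatcher());
            names.add(getLeaderPathForJobManager(jobId));
            names.add(getLeaderPathForRestServer());
            return names;
        }

        protected abstract String getLeaderPathForResourceManager();

        protected abstract String getLeaderPathForDispatcher();

        // The real method takes a JobID; a String stands in for it here.
        protected abstract String getLeaderPathForJobManager(String jobId);

        protected abstract String getLeaderPathForRestServer();
    }

    /** Hypothetical subclass returning child-path style names (assumed layout). */
    static final class ChildPathHaServices extends SketchHaServices {
        @Override
        protected String getLeaderPathForResourceManager() {
            return "/resource_manager";
        }

        @Override
        protected String getLeaderPathForDispatcher() {
            return "/dispatcher";
        }

        @Override
        protected String getLeaderPathForJobManager(String jobId) {
            return "/job-" + jobId;
        }

        @Override
        protected String getLeaderPathForRestServer() {
            return "/rest_server";
        }
    }

    public static void main(String[] args) {
        SketchHaServices services = new ChildPathHaServices();
        for (String name : services.leaderNamesFor("0001")) {
            System.out.println(name);
        }
    }
}
```

The point of the design is that the subclass only answers "what is this component called in my backend", while the base class owns the wiring that consumes those names.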
-
-
Field Summary
Fields
Modifier and Type | Field | Description
protected org.apache.flink.configuration.Configuration | configuration | The runtime configuration.
protected Executor | ioExecutor | The executor to run external IO operations on.
protected org.slf4j.Logger | logger |
Fields inherited from interface org.apache.flink.runtime.highavailability.HighAvailabilityServices
DEFAULT_JOB_ID, DEFAULT_LEADER_ID
-
-
Constructor Summary
Constructors
Modifier | Constructor | Description
protected | AbstractHaServices(org.apache.flink.configuration.Configuration config, LeaderElectionDriverFactory driverFactory, Executor ioExecutor, BlobStoreService blobStoreService, JobResultStore jobResultStore) |
-
Method Summary
Modifier and Type | Method | Description
void | cleanupAllData() | Deletes all data stored by high availability services in external stores.
void | close() | Closes the high availability services, releasing all resources.
BlobStore | createBlobStore() | Creates the BLOB store in which BLOBs are stored in a highly-available fashion.
protected abstract CheckpointRecoveryFactory | createCheckpointRecoveryFactory() | Create the checkpoint recovery factory for the job manager.
protected abstract ExecutionPlanStore | createExecutionPlanStore() | Create the submitted execution plan store for the job manager.
protected abstract LeaderRetrievalService | createLeaderRetrievalService(String leaderName) | Create leader retrieval service with specified leaderName.
CheckpointRecoveryFactory | getCheckpointRecoveryFactory() | Gets the checkpoint recovery factory for the job manager.
LeaderElection | getClusterRestEndpointLeaderElection() | Gets the LeaderElection for the cluster's rest endpoint.
LeaderRetrievalService | getClusterRestEndpointLeaderRetriever() | Get the leader retriever for the cluster's rest endpoint.
LeaderElection | getDispatcherLeaderElection() | Gets the LeaderElection for the cluster's dispatcher.
LeaderRetrievalService | getDispatcherLeaderRetriever() | Gets the leader retriever for the dispatcher.
ExecutionPlanStore | getExecutionPlanStore() | Gets the submitted execution plan store for the job manager.
LeaderElection | getJobManagerLeaderElection(org.apache.flink.api.common.JobID jobID) | Gets the LeaderElection for the job with the given JobID.
LeaderRetrievalService | getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID) | Gets the leader retriever for the job JobMaster which is responsible for the given job.
LeaderRetrievalService | getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID, String defaultJobManagerAddress) | Gets the leader retriever for the job JobMaster which is responsible for the given job.
JobResultStore | getJobResultStore() | Gets the store that holds information about the state of finished jobs.
protected abstract String | getLeaderPathForDispatcher() | Get the leader path for Dispatcher.
protected abstract String | getLeaderPathForJobManager(org.apache.flink.api.common.JobID jobID) | Get the leader path for specific JobManager.
protected abstract String | getLeaderPathForResourceManager() | Get the leader path for ResourceManager.
protected abstract String | getLeaderPathForRestServer() | Get the leader path for RestServer.
LeaderElection | getResourceManagerLeaderElection() | Gets the LeaderElection for the cluster's resource manager.
LeaderRetrievalService | getResourceManagerLeaderRetriever() | Gets the leader retriever for the cluster's resource manager.
CompletableFuture<Void> | globalCleanupAsync(org.apache.flink.api.common.JobID jobID, Executor executor) | globalCleanupAsync is expected to be called from the main thread.
protected abstract void | internalCleanup() | Clean up the meta data in the distributed system (e.g. ZooKeeper, Kubernetes ConfigMap).
protected abstract void | internalCleanupJobData(org.apache.flink.api.common.JobID jobID) | Clean up the meta data in the distributed system (e.g. ZooKeeper, Kubernetes ConfigMap) for the specified Job.
protected abstract void | internalClose() | Closes the components that are used for external operations (e.g. ZooKeeper Client, Kubernetes Client).
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.flink.runtime.highavailability.HighAvailabilityServices
closeWithOptionalClean, getWebMonitorLeaderElection, getWebMonitorLeaderRetriever
-
-
-
-
Field Detail
-
logger
protected final org.slf4j.Logger logger
-
ioExecutor
protected final Executor ioExecutor
The executor to run external IO operations on.
-
configuration
protected final org.apache.flink.configuration.Configuration configuration
The runtime configuration.
-
-
Constructor Detail
-
AbstractHaServices
protected AbstractHaServices(org.apache.flink.configuration.Configuration config, LeaderElectionDriverFactory driverFactory, Executor ioExecutor, BlobStoreService blobStoreService, JobResultStore jobResultStore)
-
-
Method Detail
-
getResourceManagerLeaderRetriever
public LeaderRetrievalService getResourceManagerLeaderRetriever()
Description copied from interface: HighAvailabilityServices
Gets the leader retriever for the cluster's resource manager.
- Specified by:
getResourceManagerLeaderRetriever in interface HighAvailabilityServices
-
getDispatcherLeaderRetriever
public LeaderRetrievalService getDispatcherLeaderRetriever()
Description copied from interface: HighAvailabilityServices
Gets the leader retriever for the dispatcher. This leader retrieval service is not always accessible.
- Specified by:
getDispatcherLeaderRetriever in interface HighAvailabilityServices
-
getJobManagerLeaderRetriever
public LeaderRetrievalService getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID)
Description copied from interface: HighAvailabilityServices
Gets the leader retriever for the job JobMaster which is responsible for the given job.
- Specified by:
getJobManagerLeaderRetriever in interface HighAvailabilityServices
- Parameters:
jobID - The identifier of the job.
- Returns:
Leader retrieval service to retrieve the job manager for the given job
-
getJobManagerLeaderRetriever
public LeaderRetrievalService getJobManagerLeaderRetriever(org.apache.flink.api.common.JobID jobID, String defaultJobManagerAddress)
Description copied from interface: HighAvailabilityServices
Gets the leader retriever for the job JobMaster which is responsible for the given job.
- Specified by:
getJobManagerLeaderRetriever in interface HighAvailabilityServices
- Parameters:
jobID - The identifier of the job.
defaultJobManagerAddress - JobManager address which will be returned by a static leader retrieval service.
- Returns:
Leader retrieval service to retrieve the job manager for the given job
-
getClusterRestEndpointLeaderRetriever
public LeaderRetrievalService getClusterRestEndpointLeaderRetriever()
Description copied from interface: ClientHighAvailabilityServices
Get the leader retriever for the cluster's rest endpoint.
- Specified by:
getClusterRestEndpointLeaderRetriever in interface ClientHighAvailabilityServices
- Specified by:
getClusterRestEndpointLeaderRetriever in interface HighAvailabilityServices
- Returns:
the leader retriever for cluster's rest endpoint.
-
getResourceManagerLeaderElection
public LeaderElection getResourceManagerLeaderElection()
Description copied from interface: HighAvailabilityServices
Gets the LeaderElection for the cluster's resource manager.
- Specified by:
getResourceManagerLeaderElection in interface HighAvailabilityServices
-
getDispatcherLeaderElection
public LeaderElection getDispatcherLeaderElection()
Description copied from interface: HighAvailabilityServices
Gets the LeaderElection for the cluster's dispatcher.
- Specified by:
getDispatcherLeaderElection in interface HighAvailabilityServices
-
getJobManagerLeaderElection
public LeaderElection getJobManagerLeaderElection(org.apache.flink.api.common.JobID jobID)
Description copied from interface: HighAvailabilityServices
Gets the LeaderElection for the job with the given JobID.
- Specified by:
getJobManagerLeaderElection in interface HighAvailabilityServices
-
getClusterRestEndpointLeaderElection
public LeaderElection getClusterRestEndpointLeaderElection()
Description copied from interface: HighAvailabilityServices
Gets the LeaderElection for the cluster's rest endpoint.
- Specified by:
getClusterRestEndpointLeaderElection in interface HighAvailabilityServices
-
getCheckpointRecoveryFactory
public CheckpointRecoveryFactory getCheckpointRecoveryFactory() throws Exception
Description copied from interface: HighAvailabilityServices
Gets the checkpoint recovery factory for the job manager.
- Specified by:
getCheckpointRecoveryFactory in interface HighAvailabilityServices
- Returns:
Checkpoint recovery factory
- Throws:
Exception
-
getExecutionPlanStore
public ExecutionPlanStore getExecutionPlanStore() throws Exception
Description copied from interface: HighAvailabilityServices
Gets the submitted execution plan store for the job manager.
- Specified by:
getExecutionPlanStore in interface HighAvailabilityServices
- Returns:
Submitted execution plan store
- Throws:
Exception - if the submitted execution plan store could not be created
-
getJobResultStore
public JobResultStore getJobResultStore() throws Exception
Description copied from interface: HighAvailabilityServices
Gets the store that holds information about the state of finished jobs.
- Specified by:
getJobResultStore in interface HighAvailabilityServices
- Returns:
Store of finished job results
- Throws:
Exception - if job result store could not be created
-
createBlobStore
public BlobStore createBlobStore()
Description copied from interface: HighAvailabilityServices
Creates the BLOB store in which BLOBs are stored in a highly-available fashion.
- Specified by:
createBlobStore in interface HighAvailabilityServices
- Returns:
Blob store
-
close
public void close() throws Exception
Description copied from interface: HighAvailabilityServices
Closes the high availability services, releasing all resources.
This method does not delete or clean up any data stored in external stores (file systems, ZooKeeper, etc). Another instance of the high availability services will be able to recover the job.
If an exception occurs during closing services, this method will attempt to continue closing other services and report exceptions only after all services have been attempted to be closed.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface HighAvailabilityServices
- Throws:
Exception - Thrown, if an exception occurred while closing these services.
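The "keep closing, report afterwards" behavior described for close() can be sketched with plain JDK types. This is one common way to get that behavior (remember the first failure and attach later ones via Throwable.addSuppressed), not necessarily how this class implements it; `CloseAllSketch` and `closeAll` are hypothetical names introduced for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseAllSketch {

    /**
     * Attempts to close every resource. If any close() fails, the first
     * failure is thrown after the loop, with later failures attached as
     * suppressed exceptions.
     */
    static void closeAll(List<AutoCloseable> resources) throws Exception {
        Exception firstFailure = null;
        for (AutoCloseable resource : resources) {
            try {
                resource.close();
            } catch (Exception e) {
                if (firstFailure == null) {
                    firstFailure = e;
                } else {
                    firstFailure.addSuppressed(e);
                }
            }
        }
        if (firstFailure != null) {
            throw firstFailure;
        }
    }

    public static void main(String[] args) {
        int[] closedCount = {0};
        List<AutoCloseable> resources = new ArrayList<>();
        resources.add(() -> { throw new IllegalStateException("first failed"); });
        resources.add(() -> closedCount[0]++);
        resources.add(() -> { throw new IllegalStateException("third failed"); });

        try {
            closeAll(resources);
        } catch (Exception e) {
            // The middle resource was still closed despite the earlier failure.
            System.out.println(e.getMessage()
                    + ", suppressed=" + e.getSuppressed().length
                    + ", closed=" + closedCount[0]);
        }
    }
}
```

The same shape applies to a multi-step cleanup: each step runs inside its own try block so that one failing step does not prevent the remaining ones from being attempted.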
-
cleanupAllData
public void cleanupAllData() throws Exception
Description copied from interface: HighAvailabilityServices
Deletes all data stored by high availability services in external stores.
After this method was called, any job or session that was managed by these high availability services will be unrecoverable.
If an exception occurs during cleanup, this method will attempt to continue the cleanup and report exceptions only after all cleanup steps have been attempted.
- Specified by:
cleanupAllData in interface HighAvailabilityServices
- Throws:
Exception - if an error occurred while cleaning up data stored by them.
-
globalCleanupAsync
public CompletableFuture<Void> globalCleanupAsync(org.apache.flink.api.common.JobID jobID, Executor executor)
Description copied from interface: GloballyCleanableResource
globalCleanupAsync is expected to be called from the main thread. Heavy IO tasks should be outsourced into the passed cleanupExecutor. Thread-safety must be ensured.
- Specified by:
globalCleanupAsync in interface GloballyCleanableResource
- Specified by:
globalCleanupAsync in interface HighAvailabilityServices
- Parameters:
jobID - The JobID of the job for which the local data should be cleaned up.
executor - The fallback executor for IO-heavy operations.
- Returns:
The cleanup result future.
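The calling convention described above (invoke from the main thread, push heavy IO onto the passed executor, keep shared state thread-safe) might look roughly like the following JDK-only sketch. `AsyncCleanupSketch`, `register`, and `hasDataFor` are hypothetical, a String stands in for JobID, and a concurrent set stands in for per-job metadata in an external store. It is not the implementation used by this class.

```java
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class AsyncCleanupSketch {

    // Thread-safe set standing in for per-job metadata in an external store (hypothetical).
    private final Set<String> storedJobs = ConcurrentHashMap.newKeySet();

    void register(String jobId) {
        storedJobs.add(jobId);
    }

    boolean hasDataFor(String jobId) {
        return storedJobs.contains(jobId);
    }

    // Returns immediately; the (pretend) IO-heavy removal runs on the given executor.
    CompletableFuture<Void> globalCleanupAsync(String jobId, Executor executor) {
        return CompletableFuture.runAsync(() -> storedJobs.remove(jobId), executor);
    }

    public static void main(String[] args) {
        AsyncCleanupSketch services = new AsyncCleanupSketch();
        services.register("job-1");

        ExecutorService ioExecutor = Executors.newSingleThreadExecutor();
        try {
            // join() is only used here to make the demo deterministic.
            services.globalCleanupAsync("job-1", ioExecutor).join();
            System.out.println("data left for job-1: " + services.hasDataFor("job-1"));
        } finally {
            ioExecutor.shutdown();
        }
    }
}
```

A caller on a main thread would normally chain further work onto the returned future instead of blocking on it.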
-
createLeaderRetrievalService
protected abstract LeaderRetrievalService createLeaderRetrievalService(String leaderName)
Create leader retrieval service with specified leaderName.
- Parameters:
leaderName - ConfigMap name in Kubernetes or child node path in ZooKeeper.
- Returns:
A LeaderRetrievalService backed by ZooKeeper or Kubernetes.
-
createCheckpointRecoveryFactory
protected abstract CheckpointRecoveryFactory createCheckpointRecoveryFactory() throws Exception
Create the checkpoint recovery factory for the job manager.
- Returns:
Checkpoint recovery factory
- Throws:
Exception
-
createExecutionPlanStore
protected abstract ExecutionPlanStore createExecutionPlanStore() throws Exception
Create the submitted execution plan store for the job manager.
- Returns:
Submitted execution plan store
- Throws:
Exception - if the submitted execution plan store could not be created
-
internalClose
protected abstract void internalClose() throws Exception
Closes the components that are used for external operations (e.g. ZooKeeper Client, Kubernetes Client).
- Throws:
Exception - if the close operation failed
-
internalCleanup
protected abstract void internalCleanup() throws Exception
Clean up the meta data in the distributed system (e.g. ZooKeeper, Kubernetes ConfigMap).
If an exception occurs during internal cleanup, we will continue the cleanup in cleanupAllData() and report exceptions only after all cleanup steps have been attempted.
- Throws:
Exception - when the cleanup operation on external storage fails.
-
internalCleanupJobData
protected abstract void internalCleanupJobData(org.apache.flink.api.common.JobID jobID) throws Exception
Clean up the meta data in the distributed system (e.g. ZooKeeper, Kubernetes ConfigMap) for the specified Job. Method implementations need to be thread-safe.
- Parameters:
jobID - The identifier of the job to clean up.
- Throws:
Exception - when the cleanup operation on external storage fails.
-
getLeaderPathForResourceManager
protected abstract String getLeaderPathForResourceManager()
Get the leader path for ResourceManager.
- Returns:
The ResourceManager leader name, which is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.
-
getLeaderPathForDispatcher
protected abstract String getLeaderPathForDispatcher()
Get the leader path for Dispatcher.
- Returns:
The Dispatcher leader name, which is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.
-
getLeaderPathForJobManager
protected abstract String getLeaderPathForJobManager(org.apache.flink.api.common.JobID jobID)
Get the leader path for specific JobManager.
- Parameters:
jobID - job id
- Returns:
The JobManager leader name for the specified job id, which is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.
-
getLeaderPathForRestServer
protected abstract String getLeaderPathForRestServer()
Get the leader path for RestServer.
- Returns:
The RestServer leader name, which is the ConfigMap name in Kubernetes or the child node path in ZooKeeper.
-
-