Class SchedulerBase
- java.lang.Object
-
- org.apache.flink.runtime.scheduler.SchedulerBase
-
- All Implemented Interfaces:
AutoCloseable, CheckpointScheduling, GlobalFailureHandler, SchedulerNG, org.apache.flink.util.AutoCloseableAsync
- Direct Known Subclasses:
DefaultScheduler
public abstract class SchedulerBase extends Object implements SchedulerNG, CheckpointScheduling
Base class which can be used to implement SchedulerNG.
-
-
Field Summary
Fields
protected ExecutionVertexVersioner executionVertexVersioner
protected InputsLocationsRetriever inputsLocationsRetriever
protected org.apache.flink.api.common.JobInfo jobInfo
protected JobManagerJobMetricGroup jobManagerJobMetricGroup
protected OperatorCoordinatorHandler operatorCoordinatorHandler
protected StateLocationRetriever stateLocationRetriever
-
Constructor Summary
Constructors
SchedulerBase(org.slf4j.Logger log, JobGraph jobGraph, Executor ioExecutor, org.apache.flink.configuration.Configuration jobMasterConfiguration, CheckpointsCleaner checkpointsCleaner, CheckpointRecoveryFactory checkpointRecoveryFactory, JobManagerJobMetricGroup jobManagerJobMetricGroup, ExecutionVertexVersioner executionVertexVersioner, long initializationTimestamp, org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor mainThreadExecutor, JobStatusListener jobStatusListener, ExecutionGraphFactory executionGraphFactory, VertexParallelismStore vertexParallelismStore, ExecutionPlanSchedulingContext executionPlanSchedulingContext)
-
Method Summary
Methods
void acknowledgeCheckpoint(org.apache.flink.api.common.JobID jobID, ExecutionAttemptID executionAttemptID, long checkpointId, CheckpointMetrics checkpointMetrics, TaskStateSnapshot checkpointState)
protected void archiveFromFailureHandlingResult(FailureHandlingResultSnapshot failureHandlingResult)
protected void archiveGlobalFailure(Throwable failure, CompletableFuture<Map<String,String>> failureLabels)
void cancel()
protected abstract void cancelAllPendingSlotRequestsInternal()
CompletableFuture<Void> closeAsync()
static VertexParallelismStore computeVertexParallelismStore(Iterable<JobVertex> vertices)
    Compute the VertexParallelismStore for all given vertices, which will set defaults and ensure that the returned store contains valid parallelisms.
static VertexParallelismStore computeVertexParallelismStore(Iterable<JobVertex> vertices, Function<JobVertex,Integer> defaultMaxParallelismFunc)
static VertexParallelismStore computeVertexParallelismStore(Iterable<JobVertex> vertices, Function<JobVertex,Integer> defaultMaxParallelismFunc, Function<Integer,Integer> normalizeParallelismFunc)
    Compute the VertexParallelismStore for all given vertices, which will set defaults and ensure that the returned store contains valid parallelisms, with a custom function for default max parallelism calculation and a custom function for normalizing vertex parallelism.
static VertexParallelismStore computeVertexParallelismStore(JobGraph jobGraph)
    Compute the VertexParallelismStore for all vertices of a given job graph, which will set defaults and ensure that the returned store contains valid parallelisms.
void declineCheckpoint(DeclineCheckpoint decline)
CompletableFuture<CoordinationResponse> deliverCoordinationRequestToCoordinator(OperatorID operator, CoordinationRequest request)
    Delivers a coordination request to the OperatorCoordinator with the given OperatorID and returns the coordinator's response.
void deliverOperatorEventToCoordinator(ExecutionAttemptID taskExecutionId, OperatorID operatorId, OperatorEvent evt)
    Delivers the given OperatorEvent to the OperatorCoordinator with the given OperatorID.
protected void failJob(Throwable cause, long timestamp, CompletableFuture<Map<String,String>> failureLabels)
static int getDefaultMaxParallelism(int parallelism)
static int getDefaultMaxParallelism(JobVertex vertex)
    Get a default value to use for a given vertex's max parallelism if none was specified.
Iterable<RootExceptionHistoryEntry> getExceptionHistory()
ExecutionGraph getExecutionGraph()
    ExecutionGraph is exposed to make it easier to rework tests to be based on the new scheduler.
ExecutionJobVertex getExecutionJobVertex(JobVertexID jobVertexId)
ExecutionVertex getExecutionVertex(ExecutionVertexID executionVertexId)
protected JobGraph getJobGraph()
protected org.apache.flink.api.common.JobID getJobId()
CompletableFuture<org.apache.flink.api.common.JobStatus> getJobTerminationFuture()
protected org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor getMainThreadExecutor()
protected MarkPartitionFinishedStrategy getMarkPartitionFinishedStrategy()
protected abstract long getNumberOfRescales()
protected abstract long getNumberOfRestarts()
protected ResultPartitionAvailabilityChecker getResultPartitionAvailabilityChecker()
protected SchedulingTopology getSchedulingTopology()
void notifyEndOfData(ExecutionAttemptID executionAttemptID)
    Notifies that the task has reached the end of data.
void notifyKvStateRegistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName, org.apache.flink.queryablestate.KvStateID kvStateId, InetSocketAddress kvStateServerAddress)
void notifyKvStateUnregistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName)
protected abstract void onTaskFailed(Execution execution)
protected abstract void onTaskFinished(Execution execution, IOMetrics ioMetrics)
static void registerJobMetrics(org.apache.flink.metrics.MetricGroup metrics, JobStatusProvider jobStatusProvider, org.apache.flink.metrics.Gauge<Long> numberOfRestarts, org.apache.flink.metrics.Gauge<Long> numberOfRescales, DeploymentStateTimeMetrics deploymentTimeMetrics, Consumer<JobStatusListener> jobStatusListenerRegistrar, long initializationTimestamp, org.apache.flink.configuration.MetricOptions.JobStatusMetricsSettings jobStatusMetricsSettings)
void reportCheckpointMetrics(org.apache.flink.api.common.JobID jobID, ExecutionAttemptID attemptId, long id, CheckpointMetrics metrics)
void reportInitializationMetrics(org.apache.flink.api.common.JobID jobId, ExecutionAttemptID executionAttemptId, SubTaskInitializationMetrics initializationMetrics)
CheckpointStatsSnapshot requestCheckpointStats()
    Returns the checkpoint statistics for a given job.
ExecutionGraphInfo requestJob()
org.apache.flink.api.common.JobStatus requestJobStatus()
KvStateLocation requestKvStateLocation(org.apache.flink.api.common.JobID jobId, String registrationName)
SerializedInputSplit requestNextInputSplit(JobVertexID vertexID, ExecutionAttemptID executionAttempt)
ExecutionState requestPartitionState(IntermediateDataSetID intermediateResultId, ResultPartitionID resultPartitionId)
protected void resetForNewExecution(ExecutionVertexID executionVertexId)
protected void resetForNewExecutions(Collection<ExecutionVertexID> vertices)
protected void restoreState(Set<ExecutionVertexID> vertices, boolean isGlobalRecovery)
protected void setGlobalFailureCause(Throwable cause, long timestamp)
void startCheckpointScheduler()
    Starts the periodic scheduling if possible.
void startScheduling()
protected abstract void startSchedulingInternal()
void stopCheckpointScheduler()
    Stops the periodic scheduling if possible.
CompletableFuture<String> stopWithSavepoint(String targetDirectory, boolean terminate, org.apache.flink.core.execution.SavepointFormatType formatType)
protected void transitionExecutionGraphState(org.apache.flink.api.common.JobStatus current, org.apache.flink.api.common.JobStatus newState)
protected void transitionToRunning()
CompletableFuture<CompletedCheckpoint> triggerCheckpoint(org.apache.flink.core.execution.CheckpointType checkpointType)
CompletableFuture<String> triggerSavepoint(String targetDirectory, boolean cancelJob, org.apache.flink.core.execution.SavepointFormatType formatType)
void updateAccumulators(AccumulatorSnapshot accumulatorSnapshot)
boolean updateTaskExecutionState(TaskExecutionStateTransition taskExecutionState)
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.flink.runtime.scheduler.GlobalFailureHandler
handleGlobalFailure
-
Methods inherited from interface org.apache.flink.runtime.scheduler.SchedulerNG
requestJobResourceRequirements, updateJobResourceRequirements, updateTaskExecutionState
-
-
-
-
Field Detail
-
jobInfo
protected final org.apache.flink.api.common.JobInfo jobInfo
-
stateLocationRetriever
protected final StateLocationRetriever stateLocationRetriever
-
inputsLocationsRetriever
protected final InputsLocationsRetriever inputsLocationsRetriever
-
jobManagerJobMetricGroup
protected final JobManagerJobMetricGroup jobManagerJobMetricGroup
-
executionVertexVersioner
protected final ExecutionVertexVersioner executionVertexVersioner
-
operatorCoordinatorHandler
protected final OperatorCoordinatorHandler operatorCoordinatorHandler
-
-
Constructor Detail
-
SchedulerBase
public SchedulerBase(org.slf4j.Logger log, JobGraph jobGraph, Executor ioExecutor, org.apache.flink.configuration.Configuration jobMasterConfiguration, CheckpointsCleaner checkpointsCleaner, CheckpointRecoveryFactory checkpointRecoveryFactory, JobManagerJobMetricGroup jobManagerJobMetricGroup, ExecutionVertexVersioner executionVertexVersioner, long initializationTimestamp, org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor mainThreadExecutor, JobStatusListener jobStatusListener, ExecutionGraphFactory executionGraphFactory, VertexParallelismStore vertexParallelismStore, ExecutionPlanSchedulingContext executionPlanSchedulingContext) throws Exception
- Throws:
Exception
-
-
Method Detail
-
getDefaultMaxParallelism
public static int getDefaultMaxParallelism(JobVertex vertex)
Get a default value to use for a given vertex's max parallelism if none was specified.
- Parameters:
vertex - the vertex to compute a default max parallelism for
- Returns:
the computed max parallelism
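The description above does not say how the default is derived. As a rough illustration only, the sketch below assumes the rule matches the one in Flink's `KeyGroupRangeAssignment.computeDefaultMaxParallelism`: take 1.5 times the parallelism, round up to the next power of two, and clamp the result to the range [128, 32768]. The class name `DefaultMaxParallelismSketch`, its constants, and the rejection of non-positive inputs are stand-ins introduced for this example; check the Flink version you use before relying on the exact numbers.

```java
final class DefaultMaxParallelismSketch {
    // Assumed bounds, mirroring KeyGroupRangeAssignment (not taken from this page).
    static final int LOWER_BOUND = 1 << 7;  // 128
    static final int UPPER_BOUND = 1 << 15; // 32768

    // Smallest power of two that is >= x, for x >= 1.
    static int roundUpToPowerOfTwo(int x) {
        if (x <= 1) {
            return 1;
        }
        return Integer.highestOneBit(x - 1) << 1;
    }

    // Hypothetical stand-in for getDefaultMaxParallelism(int parallelism).
    static int compute(int parallelism) {
        if (parallelism <= 0) {
            throw new IllegalArgumentException("parallelism must be positive");
        }
        int scaled = parallelism + parallelism / 2; // 1.5x, integer arithmetic
        return Math.min(Math.max(roundUpToPowerOfTwo(scaled), LOWER_BOUND), UPPER_BOUND);
    }

    public static void main(String[] args) {
        for (int p : new int[] {1, 100, 200, 30000}) {
            System.out.println(p + " -> " + compute(p));
        }
    }
}
```

Under these assumptions a parallelism of 100 maps to 256, and anything small falls back to the lower bound of 128.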
-
getDefaultMaxParallelism
public static int getDefaultMaxParallelism(int parallelism)
-
computeVertexParallelismStore
public static VertexParallelismStore computeVertexParallelismStore(Iterable<JobVertex> vertices, Function<JobVertex,Integer> defaultMaxParallelismFunc)
-
computeVertexParallelismStore
public static VertexParallelismStore computeVertexParallelismStore(Iterable<JobVertex> vertices, Function<JobVertex,Integer> defaultMaxParallelismFunc, Function<Integer,Integer> normalizeParallelismFunc)
Compute the VertexParallelismStore for all given vertices, which will set defaults and ensure that the returned store contains valid parallelisms, with a custom function for default max parallelism calculation and a custom function for normalizing vertex parallelism.
- Parameters:
vertices - the vertices to compute parallelism for
defaultMaxParallelismFunc - a function for computing a default max parallelism if none is specified on a given vertex
normalizeParallelismFunc - a function for normalizing vertex parallelism
- Returns:
the computed parallelism store
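To make the roles of the two functions concrete, here is a simplified, self-contained model of the computation described above. It does not use Flink types: `Vertex`, `Info`, the `UNSET` marker value of -1, and the validity check are all hypothetical stand-ins chosen for this sketch, so treat it as one plausible reading of the description, not the actual implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class ParallelismStoreSketch {
    static final int UNSET = -1; // assumed marker for "not configured"

    // Hypothetical stand-in for JobVertex.
    static final class Vertex {
        final String id;
        final int parallelism;
        final int maxParallelism;

        Vertex(String id, int parallelism, int maxParallelism) {
            this.id = id;
            this.parallelism = parallelism;
            this.maxParallelism = maxParallelism;
        }
    }

    // Hypothetical stand-in for one entry of a VertexParallelismStore.
    static final class Info {
        final int parallelism;
        final int maxParallelism;

        Info(int parallelism, int maxParallelism) {
            this.parallelism = parallelism;
            this.maxParallelism = maxParallelism;
        }
    }

    static Map<String, Info> compute(
            Iterable<Vertex> vertices,
            Function<Vertex, Integer> defaultMaxParallelismFunc,
            Function<Integer, Integer> normalizeParallelismFunc) {
        Map<String, Info> store = new LinkedHashMap<>();
        for (Vertex v : vertices) {
            // Normalize first, e.g. to turn an unset parallelism into 1.
            int parallelism = normalizeParallelismFunc.apply(v.parallelism);
            // Fall back to the default only when no max parallelism was configured.
            int max = v.maxParallelism == UNSET
                    ? defaultMaxParallelismFunc.apply(v)
                    : v.maxParallelism;
            // This check is the sketch's own interpretation of "valid parallelisms".
            if (parallelism < 1 || parallelism > max) {
                throw new IllegalArgumentException("invalid parallelism for vertex " + v.id);
            }
            store.put(v.id, new Info(parallelism, max));
        }
        return store;
    }

    public static void main(String[] args) {
        Map<String, Info> store = compute(
                Arrays.asList(new Vertex("source", UNSET, UNSET), new Vertex("map", 4, 16)),
                v -> 128,
                p -> p == UNSET ? 1 : p);
        for (Map.Entry<String, Info> e : store.entrySet()) {
            System.out.println(e.getKey() + ": parallelism=" + e.getValue().parallelism
                    + ", maxParallelism=" + e.getValue().maxParallelism);
        }
    }
}
```

Passing the two functions as parameters lets a caller swap in a different default or normalization rule without touching the loop, which appears to be the reason the three-argument overload exists.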
-
computeVertexParallelismStore
public static VertexParallelismStore computeVertexParallelismStore(Iterable<JobVertex> vertices)
Compute the VertexParallelismStore for all given vertices, which will set defaults and ensure that the returned store contains valid parallelisms.
- Parameters:
vertices - the vertices to compute parallelism for
- Returns:
the computed parallelism store
-
computeVertexParallelismStore
public static VertexParallelismStore computeVertexParallelismStore(JobGraph jobGraph)
Compute the VertexParallelismStore for all vertices of a given job graph, which will set defaults and ensure that the returned store contains valid parallelisms.
- Parameters:
jobGraph - the job graph to retrieve vertices from
- Returns:
the computed parallelism store
-
resetForNewExecutions
protected void resetForNewExecutions(Collection<ExecutionVertexID> vertices)
-
resetForNewExecution
protected void resetForNewExecution(ExecutionVertexID executionVertexId)
-
restoreState
protected void restoreState(Set<ExecutionVertexID> vertices, boolean isGlobalRecovery) throws Exception
- Throws:
Exception
-
setGlobalFailureCause
protected void setGlobalFailureCause(@Nullable Throwable cause, long timestamp)
-
getMainThreadExecutor
protected org.apache.flink.runtime.concurrent.ComponentMainThreadExecutor getMainThreadExecutor()
-
failJob
protected void failJob(Throwable cause, long timestamp, CompletableFuture<Map<String,String>> failureLabels)
-
getSchedulingTopology
protected final SchedulingTopology getSchedulingTopology()
-
getResultPartitionAvailabilityChecker
protected final ResultPartitionAvailabilityChecker getResultPartitionAvailabilityChecker()
-
transitionToRunning
protected final void transitionToRunning()
-
getExecutionVertex
public ExecutionVertex getExecutionVertex(ExecutionVertexID executionVertexId)
-
getExecutionJobVertex
public ExecutionJobVertex getExecutionJobVertex(JobVertexID jobVertexId)
-
getJobGraph
protected JobGraph getJobGraph()
-
getNumberOfRestarts
protected abstract long getNumberOfRestarts()
-
getNumberOfRescales
protected abstract long getNumberOfRescales()
-
getMarkPartitionFinishedStrategy
protected MarkPartitionFinishedStrategy getMarkPartitionFinishedStrategy()
-
cancelAllPendingSlotRequestsInternal
protected abstract void cancelAllPendingSlotRequestsInternal()
-
transitionExecutionGraphState
protected void transitionExecutionGraphState(org.apache.flink.api.common.JobStatus current, org.apache.flink.api.common.JobStatus newState)
-
getExecutionGraph
@VisibleForTesting public ExecutionGraph getExecutionGraph()
ExecutionGraph is exposed to make it easier to rework tests to be based on the new scheduler. It is expected to be used only for state checks. However, until all actions are factored out of ExecutionGraph and its sub-components, some actions may still be performed directly on it.
-
startScheduling
public final void startScheduling()
- Specified by:
startScheduling in interface SchedulerNG
-
registerJobMetrics
public static void registerJobMetrics(org.apache.flink.metrics.MetricGroup metrics, JobStatusProvider jobStatusProvider, org.apache.flink.metrics.Gauge<Long> numberOfRestarts, org.apache.flink.metrics.Gauge<Long> numberOfRescales, DeploymentStateTimeMetrics deploymentTimeMetrics, Consumer<JobStatusListener> jobStatusListenerRegistrar, long initializationTimestamp, org.apache.flink.configuration.MetricOptions.JobStatusMetricsSettings jobStatusMetricsSettings)
-
startSchedulingInternal
protected abstract void startSchedulingInternal()
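The pairing of a final `startScheduling()` with an abstract `startSchedulingInternal()` suggests a template-method split: the base class owns the entry point and subclasses supply the scheduling strategy. The sketch below illustrates that pattern in isolation. The class names and the "common setup" step are hypothetical; what the real base class does before delegating is not described on this page.

```java
import java.util.ArrayList;
import java.util.List;

abstract class SchedulerSkeletonSketch {
    final List<String> log = new ArrayList<>();

    // final: subclasses cannot bypass the shared steps of the entry point.
    public final void startScheduling() {
        log.add("common setup"); // placeholder for whatever the base class does first
        startSchedulingInternal();
    }

    // Hook that each concrete scheduler implements.
    protected abstract void startSchedulingInternal();
}

final class EagerSchedulerSketch extends SchedulerSkeletonSketch {
    @Override
    protected void startSchedulingInternal() {
        log.add("deploy all vertices"); // hypothetical strategy
    }

    public static void main(String[] args) {
        EagerSchedulerSketch scheduler = new EagerSchedulerSketch();
        scheduler.startScheduling();
        System.out.println(scheduler.log);
    }
}
```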
-
closeAsync
public CompletableFuture<Void> closeAsync()
- Specified by:
closeAsync in interface org.apache.flink.util.AutoCloseableAsync
-
cancel
public void cancel()
- Specified by:
cancel in interface SchedulerNG
-
getJobTerminationFuture
public CompletableFuture<org.apache.flink.api.common.JobStatus> getJobTerminationFuture()
- Specified by:
getJobTerminationFuture in interface SchedulerNG
-
archiveGlobalFailure
protected final void archiveGlobalFailure(Throwable failure, CompletableFuture<Map<String,String>> failureLabels)
-
archiveFromFailureHandlingResult
protected final void archiveFromFailureHandlingResult(FailureHandlingResultSnapshot failureHandlingResult)
-
updateTaskExecutionState
public boolean updateTaskExecutionState(TaskExecutionStateTransition taskExecutionState)
- Specified by:
updateTaskExecutionState in interface SchedulerNG
-
onTaskFailed
protected abstract void onTaskFailed(Execution execution)
-
requestNextInputSplit
public SerializedInputSplit requestNextInputSplit(JobVertexID vertexID, ExecutionAttemptID executionAttempt) throws IOException
- Specified by:
requestNextInputSplit in interface SchedulerNG
- Throws:
IOException
-
requestPartitionState
public ExecutionState requestPartitionState(IntermediateDataSetID intermediateResultId, ResultPartitionID resultPartitionId) throws PartitionProducerDisposedException
- Specified by:
requestPartitionState in interface SchedulerNG
- Throws:
PartitionProducerDisposedException
-
getExceptionHistory
@VisibleForTesting public Iterable<RootExceptionHistoryEntry> getExceptionHistory()
-
requestJob
public ExecutionGraphInfo requestJob()
- Specified by:
requestJob in interface SchedulerNG
-
requestCheckpointStats
public CheckpointStatsSnapshot requestCheckpointStats()
Description copied from interface: SchedulerNG
Returns the checkpoint statistics for a given job. Although the CheckpointStatsSnapshot is included in the ExecutionGraphInfo, this method is preferred to SchedulerNG.requestJob() because it is less expensive.
- Specified by:
requestCheckpointStats in interface SchedulerNG
- Returns:
checkpoint statistics snapshot for job graph
-
requestJobStatus
public org.apache.flink.api.common.JobStatus requestJobStatus()
- Specified by:
requestJobStatus in interface SchedulerNG
-
requestKvStateLocation
public KvStateLocation requestKvStateLocation(org.apache.flink.api.common.JobID jobId, String registrationName) throws UnknownKvStateLocation, FlinkJobNotFoundException
- Specified by:
requestKvStateLocation in interface SchedulerNG
- Throws:
UnknownKvStateLocation
FlinkJobNotFoundException
-
notifyKvStateRegistered
public void notifyKvStateRegistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName, org.apache.flink.queryablestate.KvStateID kvStateId, InetSocketAddress kvStateServerAddress) throws FlinkJobNotFoundException
- Specified by:
notifyKvStateRegistered in interface SchedulerNG
- Throws:
FlinkJobNotFoundException
-
notifyKvStateUnregistered
public void notifyKvStateUnregistered(org.apache.flink.api.common.JobID jobId, JobVertexID jobVertexId, KeyGroupRange keyGroupRange, String registrationName) throws FlinkJobNotFoundException
- Specified by:
notifyKvStateUnregistered in interface SchedulerNG
- Throws:
FlinkJobNotFoundException
-
updateAccumulators
public void updateAccumulators(AccumulatorSnapshot accumulatorSnapshot)
- Specified by:
updateAccumulators in interface SchedulerNG
-
triggerSavepoint
public CompletableFuture<String> triggerSavepoint(String targetDirectory, boolean cancelJob, org.apache.flink.core.execution.SavepointFormatType formatType)
- Specified by:
triggerSavepoint in interface SchedulerNG
-
triggerCheckpoint
public CompletableFuture<CompletedCheckpoint> triggerCheckpoint(org.apache.flink.core.execution.CheckpointType checkpointType)
- Specified by:
triggerCheckpoint in interface SchedulerNG
-
stopCheckpointScheduler
public void stopCheckpointScheduler()
Description copied from interface: CheckpointScheduling
Stops the periodic scheduling if possible.
- Specified by:
stopCheckpointScheduler in interface CheckpointScheduling
-
startCheckpointScheduler
public void startCheckpointScheduler()
Description copied from interface: CheckpointScheduling
Starts the periodic scheduling if possible.
- Specified by:
startCheckpointScheduler in interface CheckpointScheduling
-
acknowledgeCheckpoint
public void acknowledgeCheckpoint(org.apache.flink.api.common.JobID jobID, ExecutionAttemptID executionAttemptID, long checkpointId, CheckpointMetrics checkpointMetrics, TaskStateSnapshot checkpointState)
- Specified by:
acknowledgeCheckpoint in interface SchedulerNG
-
declineCheckpoint
public void declineCheckpoint(DeclineCheckpoint decline)
- Specified by:
declineCheckpoint in interface SchedulerNG
-
reportCheckpointMetrics
public void reportCheckpointMetrics(org.apache.flink.api.common.JobID jobID, ExecutionAttemptID attemptId, long id, CheckpointMetrics metrics)
- Specified by:
reportCheckpointMetrics in interface SchedulerNG
-
reportInitializationMetrics
public void reportInitializationMetrics(org.apache.flink.api.common.JobID jobId, ExecutionAttemptID executionAttemptId, SubTaskInitializationMetrics initializationMetrics)
- Specified by:
reportInitializationMetrics in interface SchedulerNG
-
stopWithSavepoint
public CompletableFuture<String> stopWithSavepoint(@Nullable String targetDirectory, boolean terminate, org.apache.flink.core.execution.SavepointFormatType formatType)
- Specified by:
stopWithSavepoint in interface SchedulerNG
-
deliverOperatorEventToCoordinator
public void deliverOperatorEventToCoordinator(ExecutionAttemptID taskExecutionId, OperatorID operatorId, OperatorEvent evt) throws org.apache.flink.util.FlinkException
Description copied from interface: SchedulerNG
Delivers the given OperatorEvent to the OperatorCoordinator with the given OperatorID.
Failure semantics: If the task manager sends an event for a non-running task or a non-existing operator coordinator, then respond with an exception to the call. If task and coordinator exist, then we assume that the call from the TaskManager was valid, and any bubbling exception needs to cause a job failure.
- Specified by:
deliverOperatorEventToCoordinator in interface SchedulerNG
- Throws:
org.apache.flink.util.FlinkException - Thrown, if the task is not running or no operator/coordinator exists for the given ID.
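The failure semantics above distinguish two cases: an invalid call is rejected with an exception to the caller, while an exception raised by a valid call fails the job. The following self-contained sketch models that split without Flink types. `DeliveryException` stands in for `FlinkException`, string IDs stand in for `ExecutionAttemptID` and `OperatorID`, and recording a `jobFailureCause` stands in for whatever global failure handling the scheduler performs; all of these are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

final class OperatorEventDeliverySketch {
    // Hypothetical stand-in for org.apache.flink.util.FlinkException.
    static final class DeliveryException extends Exception {
        DeliveryException(String message) {
            super(message);
        }
    }

    final Set<String> runningTasks = new HashSet<>();
    final Map<String, Consumer<String>> coordinators = new HashMap<>();
    Throwable jobFailureCause; // stays null while the job is healthy

    void deliver(String taskId, String operatorId, String event) throws DeliveryException {
        // Case 1: the call itself is invalid, so the caller gets an exception.
        if (!runningTasks.contains(taskId)) {
            throw new DeliveryException("Task " + taskId + " is not running");
        }
        Consumer<String> coordinator = coordinators.get(operatorId);
        if (coordinator == null) {
            throw new DeliveryException("No coordinator for operator " + operatorId);
        }
        // Case 2: the call was valid, so an exception from the coordinator
        // is treated as a job failure and is not thrown back to the caller.
        try {
            coordinator.accept(event);
        } catch (RuntimeException e) {
            jobFailureCause = e;
        }
    }

    public static void main(String[] args) throws Exception {
        OperatorEventDeliverySketch scheduler = new OperatorEventDeliverySketch();
        scheduler.runningTasks.add("task-1");
        scheduler.coordinators.put("op-1", evt -> {
            throw new IllegalStateException("coordinator bug");
        });
        try {
            scheduler.deliver("task-2", "op-1", "evt");
        } catch (DeliveryException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        scheduler.deliver("task-1", "op-1", "evt");
        System.out.println("job failed: " + (scheduler.jobFailureCause != null));
    }
}
```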
-
deliverCoordinationRequestToCoordinator
public CompletableFuture<CoordinationResponse> deliverCoordinationRequestToCoordinator(OperatorID operator, CoordinationRequest request) throws org.apache.flink.util.FlinkException
Description copied from interface: SchedulerNG
Delivers a coordination request to the OperatorCoordinator with the given OperatorID and returns the coordinator's response.
- Specified by:
deliverCoordinationRequestToCoordinator in interface SchedulerNG
- Returns:
A future containing the response.
- Throws:
org.apache.flink.util.FlinkException - Thrown, if the task is not running, or no operator/coordinator exists for the given ID, or the coordinator cannot handle client events.
-
notifyEndOfData
public void notifyEndOfData(ExecutionAttemptID executionAttemptID)
Description copied from interface: SchedulerNG
Notifies that the task has reached the end of data.
- Specified by:
notifyEndOfData in interface SchedulerNG
- Parameters:
executionAttemptID - The execution attempt id.
-
getJobId
@VisibleForTesting protected org.apache.flink.api.common.JobID getJobId()
-
-