Class CheckpointFailureManager
- java.lang.Object
-
- org.apache.flink.runtime.checkpoint.CheckpointFailureManager
-
public class CheckpointFailureManager extends Object
The checkpoint failure manager which centralized manage checkpoint failure processing logic.
-
-
Nested Class Summary
Nested Classes Modifier and Type Class Description static interfaceCheckpointFailureManager.FailJobCallbackA callback interface about how to fail a job.
-
Field Summary
Fields Modifier and Type Field Description static StringEXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGEstatic intUNLIMITED_TOLERABLE_FAILURE_NUMBER
-
Constructor Summary
Constructors Constructor Description CheckpointFailureManager(int tolerableCpFailureNumber, CheckpointFailureManager.FailJobCallback failureCallback)
-
Method Summary
All Methods Instance Methods Concrete Methods Modifier and Type Method Description voidcheckFailureCounter(CheckpointException exception, long checkpointId)voidhandleCheckpointException(PendingCheckpoint pendingCheckpoint, CheckpointProperties checkpointProperties, CheckpointException exception, ExecutionAttemptID executionAttemptID, org.apache.flink.api.common.JobID job, PendingCheckpointStats pendingCheckpointStats, CheckpointStatsTracker statsTracker)Failures on JM: all checkpoints - go against failure counter.voidhandleCheckpointSuccess(long checkpointId)Handle checkpoint success.
-
-
-
Field Detail
-
UNLIMITED_TOLERABLE_FAILURE_NUMBER
public static final int UNLIMITED_TOLERABLE_FAILURE_NUMBER
- See Also:
- Constant Field Values
-
EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE
public static final String EXCEEDED_CHECKPOINT_TOLERABLE_FAILURE_MESSAGE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
CheckpointFailureManager
public CheckpointFailureManager(int tolerableCpFailureNumber, CheckpointFailureManager.FailJobCallback failureCallback)
-
-
Method Detail
-
handleCheckpointException
public void handleCheckpointException(@Nullable PendingCheckpoint pendingCheckpoint, CheckpointProperties checkpointProperties, CheckpointException exception, @Nullable ExecutionAttemptID executionAttemptID, org.apache.flink.api.common.JobID job, @Nullable PendingCheckpointStats pendingCheckpointStats, CheckpointStatsTracker statsTracker)Failures on JM:- all checkpoints - go against failure counter.
- any savepoints - don’t do anything, manual action, the failover will not help anyway.
Failures on TM:
- all checkpoints - go against failure counter (failover might help and we want to notify users).
- sync savepoints - we must always fail, otherwise we risk deadlock when the job cancelation waiting for finishing savepoint which never happens.
- non sync savepoints - go against failure counter (failover might help solve the problem).
- Parameters:
pendingCheckpoint- the failed checkpoint if it was initialized already.checkpointProperties- the checkpoint properties in order to determinate which handle strategy can be used.exception- the checkpoint exception.executionAttemptID- the execution attempt id, as a safe guard.job- the JobID.pendingCheckpointStats- the pending checkpoint statistics.statsTracker- the tracker for checkpoint statistics.
-
checkFailureCounter
public void checkFailureCounter(CheckpointException exception, long checkpointId)
-
handleCheckpointSuccess
public void handleCheckpointSuccess(long checkpointId)
Handle checkpoint success.- Parameters:
checkpointId- the failed checkpoint id used to count the continuous failure number based on checkpoint id sequence.
-
-