public abstract class AbstractS3ACommitter
extends org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
| Modifier | Constructor and Description |
|---|---|
| protected | AbstractS3ACommitter(org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.mapreduce.TaskAttemptContext context) Create a committer. |
| Modifier and Type | Method and Description |
|---|---|
| void | abortJob(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.mapreduce.JobStatus.State state) |
| protected void | abortJobInternal(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) The internal job abort operation; can be overridden in tests. |
| protected void | abortPendingUploads(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending, boolean suppressExceptions) Abort all pending uploads in the list. |
| protected void | abortPendingUploadsInCleanup(boolean suppressExceptions) Abort all pending uploads to the destination directory during job cleanup operations. |
| protected ExecutorService | buildThreadPool(org.apache.hadoop.mapreduce.JobContext context) Returns an ExecutorService for parallel tasks. |
| protected void | cleanup(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) Cleanup the job context, including aborting anything pending. |
| void | cleanupJob(org.apache.hadoop.mapreduce.JobContext context) |
| abstract void | cleanupStagingDirs() Clean up any staging directories. |
| void | commitJob(org.apache.hadoop.mapreduce.JobContext context) Commit work. |
| protected void | commitJobInternal(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) Internal job commit operation: where the S3 requests are made (potentially in parallel). |
| protected void | commitPendingUploads(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) Commit a list of pending uploads. |
| protected void | deleteTaskAttemptPathQuietly(org.apache.hadoop.mapreduce.TaskAttemptContext context) Delete the task attempt path without raising any errors. |
| protected abstract org.apache.hadoop.fs.Path | getBaseTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context) Compute the base path where the output of a task attempt is written. |
| protected CommitOperations | getCommitOperations() Get the commit actions instance. |
| org.apache.hadoop.conf.Configuration | getConf() |
| org.apache.hadoop.fs.FileSystem | getDestFS() Get the destination FS, creating it on demand if needed. |
| protected org.apache.hadoop.fs.FileSystem | getDestinationFS(org.apache.hadoop.fs.Path out, org.apache.hadoop.conf.Configuration config) Get the destination filesystem from the output path and the configuration. |
| S3AFileSystem | getDestS3AFS() Get the destination as an S3A Filesystem, casting it. |
| protected abstract org.apache.hadoop.fs.Path | getJobAttemptPath(int appAttemptId) Compute the path where the output of a given job attempt will be placed. |
| org.apache.hadoop.fs.Path | getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context) Compute the path where the output of a given job attempt will be placed. |
| org.apache.hadoop.mapreduce.JobContext | getJobContext() Get the job/task context this committer was instantiated with. |
| abstract String | getName() Get the name of this committer. |
| org.apache.hadoop.fs.Path | getOutputPath() Final path of output, in the destination FS. |
| protected String | getRole() Used in logging and reporting to help disentangle messages. |
| protected org.apache.hadoop.fs.FileSystem | getTaskAttemptFilesystem(org.apache.hadoop.mapreduce.TaskAttemptContext context) Get the task attempt path filesystem. |
| org.apache.hadoop.fs.Path | getTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context) Compute the path where the output of a task attempt is stored until that task is committed. |
| abstract org.apache.hadoop.fs.Path | getTempTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context) Get a temporary directory for data. |
| org.apache.hadoop.fs.Path | getWorkPath() This is the critical method for FileOutputFormat; it declares the path for work. |
| protected void | initOutput(org.apache.hadoop.fs.Path out) Init the output filesystem and path. |
| protected void | jobCompleted(boolean success) Job completion outcome; this may be subclassed in tests. |
| protected abstract List<SinglePendingCommit> | listPendingUploadsToCommit(org.apache.hadoop.mapreduce.JobContext context) Get the list of pending uploads for this job attempt. |
| protected List<SinglePendingCommit> | loadPendingsetFiles(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions, org.apache.hadoop.fs.FileSystem fs, Iterable<? extends org.apache.hadoop.fs.FileStatus> pendingCommitFiles) Try to read every pendingset file and build a list of them. In the case of a failure to read a file, exceptions are held until all reads have been attempted. |
| protected void | maybeCreateSuccessMarker(org.apache.hadoop.mapreduce.JobContext context, List<String> filenames) If the job requires a success marker on a successful job, create the file CommitConstants._SUCCESS. |
| protected void | maybeCreateSuccessMarkerFromCommits(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) If the job requires a success marker on a successful job, create the file CommitConstants._SUCCESS. |
| protected void | maybeIgnore(boolean suppress, String action, Invoker.VoidOperation operation) Execute an operation; maybe suppress any raised IOException. |
| protected void | maybeIgnore(boolean suppress, String action, IOException ex) Log or rethrow a caught IOException. |
| protected void | preCommitJob(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) Subclass-specific pre-commit actions. |
| void | recoverTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext) Task recovery considered unsupported: warn and fail. |
| protected boolean | requiresDelayedCommitOutputInFileSystem() Flag to indicate whether or not the destination filesystem needs to be configured to support magic paths where the output isn't immediately visible. |
| protected void | setConf(org.apache.hadoop.conf.Configuration conf) |
| protected void | setDestFS(org.apache.hadoop.fs.FileSystem destFS) Set the destination FS: the FS of the final output. |
| protected void | setOutputPath(org.apache.hadoop.fs.Path outputPath) Set the output path. |
| void | setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) Base job setup deletes the success marker. |
| protected void | setWorkPath(org.apache.hadoop.fs.Path workPath) Set the work path for this committer. |
| String | toString() |

Methods inherited from class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter: hasOutputPath
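The abstract methods above define what a concrete committer must supply. The following is a minimal, illustrative sketch of such a subclass; the class name and the directory layout under the output path are assumptions, not part of the Hadoop API, and depending on the Hadoop version further abstract methods inherited from OutputCommitter (for example commitTask/abortTask) may also require implementations.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
// Imports of AbstractS3ACommitter and SinglePendingCommit are omitted here,
// as their package names are not shown on this page.

public class SketchCommitter extends AbstractS3ACommitter {

  protected SketchCommitter(Path outputPath, TaskAttemptContext context)
      throws IOException {
    super(outputPath, context);
  }

  @Override
  public String getName() {
    return "sketch";
  }

  @Override
  public void cleanupStagingDirs() {
    // This sketch keeps no staging directories, so there is nothing to clean up.
  }

  @Override
  protected Path getJobAttemptPath(int appAttemptId) {
    // Hypothetical layout: one temporary directory per job attempt.
    return new Path(getOutputPath(), "_temporary/" + appAttemptId);
  }

  @Override
  protected Path getBaseTaskAttemptPath(TaskAttemptContext context) {
    // Task attempt output goes under the job attempt directory.
    return new Path(getJobAttemptPath(context),
        context.getTaskAttemptID().toString());
  }

  @Override
  public Path getTempTaskAttemptPath(TaskAttemptContext context) {
    return new Path(getBaseTaskAttemptPath(context), "_temp");
  }

  @Override
  protected List<SinglePendingCommit> listPendingUploadsToCommit(
      JobContext context) throws IOException {
    // A real committer would scan the job attempt directory for pendingset files.
    return Collections.emptyList();
  }
}
```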
protected AbstractS3ACommitter(org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Create a committer.
Parameters: outputPath - the job's output path: MUST NOT be null; context - the task's context
Throws: IOException - on a failure

protected void initOutput(org.apache.hadoop.fs.Path out) throws IOException
Init the output filesystem and path.
Parameters: out - output path
Throws: IOException - failure to create the FS

public final org.apache.hadoop.mapreduce.JobContext getJobContext()
Get the job/task context this committer was instantiated with.

public final org.apache.hadoop.fs.Path getOutputPath()
Final path of output, in the destination FS.
Overrides: getOutputPath in class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter

protected final void setOutputPath(org.apache.hadoop.fs.Path outputPath)
Set the output path.
Parameters: outputPath - new value

public org.apache.hadoop.fs.Path getWorkPath()
This is the critical method for FileOutputFormat; it declares the path for work.
Overrides: getWorkPath in class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter

protected void setWorkPath(org.apache.hadoop.fs.Path workPath)
Set the work path for this committer.
Parameters: workPath - the work path to use

public org.apache.hadoop.conf.Configuration getConf()
protected void setConf(org.apache.hadoop.conf.Configuration conf)

public org.apache.hadoop.fs.FileSystem getDestFS() throws IOException
Get the destination FS, creating it on demand if needed.
Throws: IOException - if the FS cannot be instantiated

public S3AFileSystem getDestS3AFS() throws IOException
Get the destination as an S3A Filesystem, casting it.
Throws: IOException - if the FS cannot be instantiated

protected void setDestFS(org.apache.hadoop.fs.FileSystem destFS)
Set the destination FS: the FS of the final output.
Parameters: destFS - destination FS

public org.apache.hadoop.fs.Path getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context)
Compute the path where the output of a given job attempt will be placed.
Parameters: context - the context of the job; this is used to get the application attempt ID

protected abstract org.apache.hadoop.fs.Path getJobAttemptPath(int appAttemptId)
Compute the path where the output of a given job attempt will be placed.
Parameters: appAttemptId - the ID of the application attempt for this job

public org.apache.hadoop.fs.Path getTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Compute the path where the output of a task attempt is stored until that task is committed. By default this is the value of getBaseTaskAttemptPath(TaskAttemptContext); subclasses may return different values.
Parameters: context - the context of the task attempt

protected abstract org.apache.hadoop.fs.Path getBaseTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Compute the base path where the output of a task attempt is written.
Parameters: context - the context of the task attempt

public abstract org.apache.hadoop.fs.Path getTempTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Get a temporary directory for data.
Parameters: context - task context

public abstract String getName()
Get the name of this committer.

public String toString()
Overrides: toString in class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter

protected org.apache.hadoop.fs.FileSystem getDestinationFS(org.apache.hadoop.fs.Path out, org.apache.hadoop.conf.Configuration config) throws IOException
Get the destination filesystem from the output path and the configuration.
Parameters: out - output path; config - job/task config
Throws: PathCommitException - output path isn't to an S3A FS instance; IOException - failure to instantiate the FS

protected boolean requiresDelayedCommitOutputInFileSystem()
Flag to indicate whether or not the destination filesystem needs to be configured to support magic paths where the output isn't immediately visible.
public void recoverTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext) throws IOException
Task recovery considered unsupported: warn and fail.
Overrides: recoverTask in class org.apache.hadoop.mapreduce.OutputCommitter
Parameters: taskContext - context of the task whose output is being recovered
Throws: IOException - always

protected void maybeCreateSuccessMarkerFromCommits(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) throws IOException
If the job requires a success marker on a successful job, create the file CommitConstants._SUCCESS. While the classic committers create a 0-byte file, the S3Guard committers PUT up the contents of a SuccessData file.
Parameters: context - job context; pending - the pending commits
Throws: IOException - IO failure

protected void maybeCreateSuccessMarker(org.apache.hadoop.mapreduce.JobContext context, List<String> filenames) throws IOException
If the job requires a success marker on a successful job, create the file CommitConstants._SUCCESS. While the classic committers create a 0-byte file, the S3Guard committers PUT up the contents of a SuccessData file.
Parameters: context - job context; filenames - list of filenames
Throws: IOException - IO failure
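As an illustration of the marker behaviour described above, the sketch below checks for the _SUCCESS file after a job and uses its length to distinguish a classic 0-byte marker from one written by an S3A committer. The bucket and output path are placeholders; only standard FileSystem calls are used.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SuccessMarkerCheck {
  public static void main(String[] args) throws Exception {
    // Assumed output path for illustration only.
    Path output = new Path("s3a://example-bucket/job-output");
    FileSystem fs = output.getFileSystem(new Configuration());

    // The marker name comes from CommitConstants._SUCCESS.
    Path marker = new Path(output, "_SUCCESS");
    if (fs.exists(marker)) {
      FileStatus st = fs.getFileStatus(marker);
      // Classic committers write a 0-byte marker; the S3A committers PUT the
      // contents of a SuccessData file, so the length is non-zero.
      System.out.println("Job succeeded; marker length = " + st.getLen());
    }
  }
}
```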
public void setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Overrides: setupTask in class org.apache.hadoop.mapreduce.OutputCommitter
Parameters: context - context
Throws: IOException - IO failure

protected org.apache.hadoop.fs.FileSystem getTaskAttemptFilesystem(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Get the task attempt path filesystem.
Parameters: context - task attempt
Throws: IOException - failure to instantiate

protected void commitPendingUploads(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) throws IOException
Commit a list of pending uploads.
Parameters: context - job context; pending - list of pending uploads
Throws: IOException - on any failure

protected List<SinglePendingCommit> loadPendingsetFiles(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions, org.apache.hadoop.fs.FileSystem fs, Iterable<? extends org.apache.hadoop.fs.FileStatus> pendingCommitFiles) throws IOException
Try to read every pendingset file and build a list of them. In the case of a failure to read a file, exceptions are held until all reads have been attempted.
Parameters: context - job context; suppressExceptions - whether to suppress exceptions; fs - job attempt fs; pendingCommitFiles - list of files found in the listing scan
Throws: IOException - on a failure when suppressExceptions is false

protected void commitJobInternal(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) throws IOException
Internal job commit operation: where the S3 requests are made (potentially in parallel).
Parameters: context - job context; pending - pending requests
Throws: IOException - any failure

public void abortJob(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.mapreduce.JobStatus.State state) throws IOException
Overrides: abortJob in class org.apache.hadoop.mapreduce.OutputCommitter
Throws: IOException

protected void abortJobInternal(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) throws IOException
The internal job abort operation; can be overridden in tests. It is invoked from the abortJob(JobContext, JobStatus.State) call; the base implementation calls cleanup(JobContext, boolean).
Parameters: context - job context; suppressExceptions - should exceptions be suppressed?
Throws: IOException - any IO problem raised when suppressExceptions is false

protected void abortPendingUploadsInCleanup(boolean suppressExceptions) throws IOException
Abort all pending uploads to the destination directory during job cleanup operations.
Parameters: suppressExceptions - should exceptions be suppressed
Throws: IOException - IO problem

protected void preCommitJob(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) throws IOException
Subclass-specific pre-commit actions.
Parameters: context - job context; pending - the pending operations
Throws: IOException - any failure
public void commitJob(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Commit work. The sequence is:
- Precommit: identify pending uploads, then allow subclasses to validate the state of the destination and the pending uploads. Any failure here triggers an abort of all pending uploads.
- Commit internal: do the final commit sequence.
- The final commit action is to build the _SUCCESS file entry.
Overrides: commitJob in class org.apache.hadoop.mapreduce.OutputCommitter
Parameters: context - job context
Throws: IOException - any failure
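Since the precommit stage above explicitly allows subclasses to validate the pending uploads, a subclass such as the SketchCommitter shown after the method summary might override preCommitJob as sketched below. The fragment is illustrative only: the validation it performs is an assumption, and it relies on the imports shown in that earlier sketch.

```java
// Illustrative fragment, assumed to sit inside an AbstractS3ACommitter subclass.
@Override
protected void preCommitJob(JobContext context,
    List<SinglePendingCommit> pending) throws IOException {
  super.preCommitJob(context, pending);
  if (pending.isEmpty()) {
    // Failing here triggers the abort of all pending uploads described above.
    throw new IOException("No pending uploads to commit under " + getOutputPath());
  }
}
```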
protected void jobCompleted(boolean success)
Job completion outcome; this may be subclassed in tests.
Parameters: success - did the job succeed?

public abstract void cleanupStagingDirs()
Clean up any staging directories.

protected abstract List<SinglePendingCommit> listPendingUploadsToCommit(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Get the list of pending uploads for this job attempt.
Parameters: context - job context
Throws: IOException - any IO failure

protected void cleanup(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) throws IOException
Cleanup the job context, including aborting anything pending.
Parameters: context - job context; suppressExceptions - should exceptions be suppressed?
Throws: IOException - any failure if exceptions were not suppressed

public void cleanupJob(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Overrides: cleanupJob in class org.apache.hadoop.mapreduce.OutputCommitter
Throws: IOException

protected void maybeIgnore(boolean suppress, String action, Invoker.VoidOperation operation) throws IOException
Execute an operation; maybe suppress any raised IOException.
Parameters: suppress - should raised IOEs be suppressed?; action - action (for logging when the IOE is suppressed); operation - operation
Throws: IOException - if the operation raised an IOE and suppress == false

protected void maybeIgnore(boolean suppress, String action, IOException ex) throws IOException
Log or rethrow a caught IOException.
Parameters: suppress - should raised IOEs be suppressed?; action - action (for logging when the IOE is suppressed); ex - exception
Throws: IOException - if suppress == false
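The two maybeIgnore overloads implement a suppress-or-rethrow pattern. The standalone helper below is a hypothetical sketch of that pattern, not part of the Hadoop API; it only mirrors the documented semantics of the suppress, action, and exception parameters.

```java
import java.io.IOException;

// Hypothetical sketch of the suppress-or-rethrow behaviour described above.
final class MaybeIgnoreSketch {
  static void maybeIgnore(boolean suppress, String action, IOException ex)
      throws IOException {
    if (suppress) {
      // When suppressing, the failure is only logged and processing continues.
      System.err.println("Ignoring failure of " + action + ": " + ex);
    } else {
      // Otherwise the caught IOException propagates to the caller.
      throw ex;
    }
  }
}
```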
protected CommitOperations getCommitOperations()
Get the commit actions instance.

protected String getRole()
Used in logging and reporting to help disentangle messages.

protected final ExecutorService buildThreadPool(org.apache.hadoop.mapreduce.JobContext context)
Returns an ExecutorService for parallel tasks. The number of threads in the thread-pool is set by s3.multipart.committer.num-threads; if it is 0, this method returns null.
Parameters: context - the JobContext for this commit
Returns: an ExecutorService, or null if the configured number of threads is 0
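A small illustration of tuning the thread pool named above: the key string is taken from the description, and the value 8 is an arbitrary example.

```java
import org.apache.hadoop.conf.Configuration;

public class CommitterThreadsExample {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Key named in the description above; a value of 0 disables the pool
    // (buildThreadPool then returns null).
    conf.setInt("s3.multipart.committer.num-threads", 8);
  }
}
```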
protected void deleteTaskAttemptPathQuietly(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Delete the task attempt path without raising any errors.
Parameters: context - task context

protected void abortPendingUploads(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending, boolean suppressExceptions) throws IOException
Abort all pending uploads in the list.
Parameters: context - job context; pending - pending uploads; suppressExceptions - should exceptions be suppressed
Throws: IOException - any exception raised

Copyright © 2008–2021 Apache Software Foundation. All rights reserved.