public abstract class AbstractS3ACommitter
extends org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter
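Concrete committers subclass this class and supply the abstract naming, path, and pending-upload methods listed in the summary below. The following is a minimal, hypothetical sketch of such a subclass, shown only to make the shape of the API concrete: the class name and the path layout are invented for illustration, and the task-side OutputCommitter methods inherited further up (commitTask, abortTask, etc.) are left out, so the sketch itself stays abstract.

```java
// Hypothetical sketch only; not part of the Hadoop codebase.
import java.io.IOException;
import java.util.Collections;
import java.util.List;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.commit.AbstractS3ACommitter;
import org.apache.hadoop.fs.s3a.commit.files.SinglePendingCommit;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.TaskAttemptContext;

public abstract class ExampleS3ACommitter extends AbstractS3ACommitter {

  protected ExampleS3ACommitter(Path outputPath, TaskAttemptContext context)
      throws IOException {
    super(outputPath, context);          // binds the committer to the destination FS
  }

  @Override
  public String getName() {
    return "example";                    // the committer's name, e.g. for logging
  }

  @Override
  protected Path getJobAttemptPath(int appAttemptId) {
    // illustrative layout: job attempt data under _temporary/<attempt id>
    return new Path(getOutputPath(), "_temporary/" + appAttemptId);
  }

  @Override
  protected Path getBaseTaskAttemptPath(TaskAttemptContext context) {
    // base path where this task attempt writes its uncommitted output
    return new Path(getOutputPath(),
        "_temporary/" + context.getTaskAttemptID().toString());
  }

  @Override
  public Path getTempTaskAttemptPath(TaskAttemptContext context) {
    return new Path(getBaseTaskAttemptPath(context), "_tmp");
  }

  @Override
  protected List<SinglePendingCommit> listPendingUploadsToCommit(JobContext context)
      throws IOException {
    // a real committer loads the pending commit data for the job attempt here
    return Collections.emptyList();
  }

  @Override
  public void cleanupStagingDirs() {
    // nothing staged in this sketch
  }
}
```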
Modifier | Constructor and Description
---|---
protected | AbstractS3ACommitter(org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.mapreduce.TaskAttemptContext context). Create a committer.
Modifier and Type | Method and Description
---|---
void | abortJob(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.mapreduce.JobStatus.State state)
protected void | abortJobInternal(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions). The internal job abort operation; can be overridden in tests.
protected void | abortPendingUploads(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending, boolean suppressExceptions). Abort all pending uploads in the list.
protected void | abortPendingUploadsInCleanup(boolean suppressExceptions). Abort all pending uploads to the destination directory during job cleanup operations.
protected ExecutorService | buildThreadPool(org.apache.hadoop.mapreduce.JobContext context). Returns an ExecutorService for parallel tasks.
protected void | cleanup(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions). Clean up the job context, including aborting anything pending.
void | cleanupJob(org.apache.hadoop.mapreduce.JobContext context)
abstract void | cleanupStagingDirs(). Clean up any staging directories.
void | commitJob(org.apache.hadoop.mapreduce.JobContext context). Commit work.
protected void | commitJobInternal(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending). Internal job commit operation: where the S3 requests are made (potentially in parallel).
protected void | commitPendingUploads(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending). Commit a list of pending uploads.
protected void | deleteTaskAttemptPathQuietly(org.apache.hadoop.mapreduce.TaskAttemptContext context). Delete the task attempt path without raising any errors.
protected abstract org.apache.hadoop.fs.Path | getBaseTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context). Compute the base path where the output of a task attempt is written.
protected CommitOperations | getCommitOperations(). Get the commit actions instance.
org.apache.hadoop.conf.Configuration | getConf()
org.apache.hadoop.fs.FileSystem | getDestFS(). Get the destination FS, creating it on demand if needed.
protected org.apache.hadoop.fs.FileSystem | getDestinationFS(org.apache.hadoop.fs.Path out, org.apache.hadoop.conf.Configuration config). Get the destination filesystem from the output path and the configuration.
S3AFileSystem | getDestS3AFS(). Get the destination as an S3A filesystem, casting it.
protected abstract org.apache.hadoop.fs.Path | getJobAttemptPath(int appAttemptId). Compute the path where the output of a given job attempt will be placed.
org.apache.hadoop.fs.Path | getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context). Compute the path where the output of a given job attempt will be placed.
org.apache.hadoop.mapreduce.JobContext | getJobContext(). Get the job/task context this committer was instantiated with.
abstract String | getName(). Get the name of this committer.
org.apache.hadoop.fs.Path | getOutputPath(). Final path of output, in the destination FS.
protected String | getRole(). Used in logging and reporting to help disentangle messages.
protected org.apache.hadoop.fs.FileSystem | getTaskAttemptFilesystem(org.apache.hadoop.mapreduce.TaskAttemptContext context). Get the task attempt path filesystem.
org.apache.hadoop.fs.Path | getTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context). Compute the path where the output of a task attempt is stored until that task is committed.
abstract org.apache.hadoop.fs.Path | getTempTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context). Get a temporary directory for data.
org.apache.hadoop.fs.Path | getWorkPath(). This is the critical method for FileOutputFormat; it declares the path for work.
protected void | initOutput(org.apache.hadoop.fs.Path out). Init the output filesystem and path.
protected void | jobCompleted(boolean success). Job completion outcome; this may be subclassed in tests.
protected abstract List<SinglePendingCommit> | listPendingUploadsToCommit(org.apache.hadoop.mapreduce.JobContext context). Get the list of pending uploads for this job attempt.
protected List<SinglePendingCommit> | loadPendingsetFiles(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions, org.apache.hadoop.fs.FileSystem fs, Iterable<? extends org.apache.hadoop.fs.FileStatus> pendingCommitFiles). Try to read every pendingset file and build a list of them; if a file cannot be read, exceptions are held until all reads have been attempted.
protected void | maybeCreateSuccessMarker(org.apache.hadoop.mapreduce.JobContext context, List<String> filenames). If the job requires a success marker on a successful job, create the file CommitConstants._SUCCESS.
protected void | maybeCreateSuccessMarkerFromCommits(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending). If the job requires a success marker on a successful job, create the file CommitConstants._SUCCESS.
protected void | maybeIgnore(boolean suppress, String action, Invoker.VoidOperation operation). Execute an operation; maybe suppress any raised IOException.
protected void | maybeIgnore(boolean suppress, String action, IOException ex). Log or rethrow a caught IOException.
protected void | preCommitJob(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending). Subclass-specific pre-commit actions.
void | recoverTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext). Task recovery is considered unsupported: warn and fail.
protected boolean | requiresDelayedCommitOutputInFileSystem(). Flag to indicate whether the destination filesystem needs to be configured to support magic paths, where the output isn't immediately visible.
protected void | setConf(org.apache.hadoop.conf.Configuration conf)
protected void | setDestFS(org.apache.hadoop.fs.FileSystem destFS). Set the destination FS: the FS of the final output.
protected void | setOutputPath(org.apache.hadoop.fs.Path outputPath). Set the output path.
void | setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext context). Base job setup deletes the success marker.
protected void | setWorkPath(org.apache.hadoop.fs.Path workPath). Set the work path for this committer.
String | toString()
Methods inherited from class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter: hasOutputPath
protected AbstractS3ACommitter(org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Parameters: outputPath - the job's output path: MUST NOT be null; context - the task's context.
Throws: IOException - on a failure.

protected void initOutput(org.apache.hadoop.fs.Path out) throws IOException
Parameters: out - output path.
Throws: IOException - failure to create the FS.

public final org.apache.hadoop.mapreduce.JobContext getJobContext()

public final org.apache.hadoop.fs.Path getOutputPath()
Overrides: getOutputPath in class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter

protected final void setOutputPath(org.apache.hadoop.fs.Path outputPath)
Parameters: outputPath - new value.

public org.apache.hadoop.fs.Path getWorkPath()
This is the critical method for FileOutputFormat; it declares the path for work.
Overrides: getWorkPath in class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter

protected void setWorkPath(org.apache.hadoop.fs.Path workPath)
Parameters: workPath - the work path to use.

public org.apache.hadoop.conf.Configuration getConf()

protected void setConf(org.apache.hadoop.conf.Configuration conf)

public org.apache.hadoop.fs.FileSystem getDestFS() throws IOException
Throws: IOException - if the FS cannot be instantiated.

public S3AFileSystem getDestS3AFS() throws IOException
Throws: IOException - if the FS cannot be instantiated.

protected void setDestFS(org.apache.hadoop.fs.FileSystem destFS)
Parameters: destFS - destination FS.

public org.apache.hadoop.fs.Path getJobAttemptPath(org.apache.hadoop.mapreduce.JobContext context)
Parameters: context - the context of the job; this is used to get the application attempt ID.

protected abstract org.apache.hadoop.fs.Path getJobAttemptPath(int appAttemptId)
Parameters: appAttemptId - the ID of the application attempt for this job.

public org.apache.hadoop.fs.Path getTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
See getBaseTaskAttemptPath(TaskAttemptContext); subclasses may return different values.
Parameters: context - the context of the task attempt.

protected abstract org.apache.hadoop.fs.Path getBaseTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Parameters: context - the context of the task attempt.

public abstract org.apache.hadoop.fs.Path getTempTaskAttemptPath(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Parameters: context - task context.

public abstract String getName()

public String toString()
Overrides: toString in class org.apache.hadoop.mapreduce.lib.output.PathOutputCommitter

protected org.apache.hadoop.fs.FileSystem getDestinationFS(org.apache.hadoop.fs.Path out, org.apache.hadoop.conf.Configuration config) throws IOException
Parameters: out - output path; config - job/task config.
Throws: PathCommitException - output path isn't to an S3A FS instance; IOException - failure to instantiate the FS.

protected boolean requiresDelayedCommitOutputInFileSystem()
public void recoverTask(org.apache.hadoop.mapreduce.TaskAttemptContext taskContext) throws IOException
Task recovery is considered unsupported: warn and fail.
Overrides: recoverTask in class org.apache.hadoop.mapreduce.OutputCommitter
Parameters: taskContext - context of the task whose output is being recovered.
Throws: IOException - always.

protected void maybeCreateSuccessMarkerFromCommits(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) throws IOException
If the job requires a success marker on a successful job, create the file CommitConstants._SUCCESS. While the classic committers create a 0-byte file, the S3Guard committers PUT up the contents of a SuccessData file.
Parameters: context - job context; pending - the pending commits.
Throws: IOException - IO failure.

protected void maybeCreateSuccessMarker(org.apache.hadoop.mapreduce.JobContext context, List<String> filenames) throws IOException
If the job requires a success marker on a successful job, create the file CommitConstants._SUCCESS. While the classic committers create a 0-byte file, the S3Guard committers PUT up the contents of a SuccessData file.
Parameters: context - job context; filenames - list of filenames.
Throws: IOException - IO failure.

public void setupTask(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Overrides: setupTask in class org.apache.hadoop.mapreduce.OutputCommitter
Parameters: context - context.
Throws: IOException - IO failure.

protected org.apache.hadoop.fs.FileSystem getTaskAttemptFilesystem(org.apache.hadoop.mapreduce.TaskAttemptContext context) throws IOException
Parameters: context - task attempt.
Throws: IOException - failure to instantiate.

protected void commitPendingUploads(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) throws IOException
Parameters: context - job context; pending - list of pending uploads.
Throws: IOException - on any failure.

protected List<SinglePendingCommit> loadPendingsetFiles(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions, org.apache.hadoop.fs.FileSystem fs, Iterable<? extends org.apache.hadoop.fs.FileStatus> pendingCommitFiles) throws IOException
Parameters: context - job context; suppressExceptions - whether to suppress exceptions; fs - job attempt fs; pendingCommitFiles - list of files found in the listing scan.
Throws: IOException - on a failure when suppressExceptions is false.

protected void commitJobInternal(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) throws IOException
Parameters: context - job context; pending - pending requests.
Throws: IOException - any failure.

public void abortJob(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.mapreduce.JobStatus.State state) throws IOException
Overrides: abortJob in class org.apache.hadoop.mapreduce.OutputCommitter
Throws: IOException
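Since the success marker written by maybeCreateSuccessMarker(...) carries a SuccessData manifest rather than being an empty file, downstream tooling can load and inspect it. A hedged sketch follows, assuming a SuccessData.load(FileSystem, Path) loader and bean-style getters on org.apache.hadoop.fs.s3a.commit.files.SuccessData; the output path and getter names are assumptions, not taken from this page.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.commit.files.SuccessData;

// Hypothetical post-job check: read the _SUCCESS marker and report what
// the committer recorded in it.
public class SuccessMarkerCheck {
  public static void main(String[] args) throws Exception {
    Path success = new Path(args[0], "_SUCCESS");     // e.g. an s3a:// output directory
    FileSystem fs = success.getFileSystem(new Configuration());
    SuccessData data = SuccessData.load(fs, success); // assumed JSON manifest loader
    System.out.println("committer: " + data.getCommitter());           // assumed getter
    System.out.println("files committed: " + data.getFilenames().size()); // assumed getter
  }
}
```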
protected void abortJobInternal(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) throws IOException
The internal job abort operation; can be overridden in tests. Invoked from the abortJob(JobContext, JobStatus.State) call. The base implementation calls cleanup(JobContext, boolean).
Parameters: context - job context; suppressExceptions - should exceptions be suppressed?
Throws: IOException - any IO problem raised when suppressExceptions is false.

protected void abortPendingUploadsInCleanup(boolean suppressExceptions) throws IOException
Parameters: suppressExceptions - should exceptions be suppressed?
Throws: IOException - IO problem.

protected void preCommitJob(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending) throws IOException
Parameters: context - job context; pending - the pending operations.
Throws: IOException - any failure.

public void commitJob(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Commit work (see the flow sketch below).
Precommit: identify pending uploads, then allow subclasses to validate the state of the destination and the pending uploads. Any failure here triggers an abort of all pending uploads.
Commit internal: do the final commit sequence.
The final commit action is to build the _SUCCESS file entry.
Overrides: commitJob in class org.apache.hadoop.mapreduce.OutputCommitter
Parameters: context - job context.
Throws: IOException - any failure.

protected void jobCompleted(boolean success)
Parameters: success - did the job succeed.

public abstract void cleanupStagingDirs()
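The commitJob(JobContext) flow described above can be pictured as the following condensed schematic. It is written against the protected methods documented on this page, but it is a simplification, not the actual Hadoop implementation; in particular the exact error-handling and cleanup ordering may differ.

```java
// Condensed schematic of the commitJob() sequence, not the real source.
public void commitJob(JobContext context) throws IOException {
  List<SinglePendingCommit> pending = listPendingUploadsToCommit(context);
  try {
    // Precommit: subclasses validate the destination and the pending uploads.
    preCommitJob(context, pending);
    // Commit internal: issue the S3 requests, potentially in parallel.
    commitJobInternal(context, pending);
    // Final action: build the _SUCCESS entry from the committed uploads.
    maybeCreateSuccessMarkerFromCommits(context, pending);
    jobCompleted(true);
  } catch (IOException e) {
    // Any failure triggers an abort of all pending uploads.
    jobCompleted(false);
    abortJobInternal(context, true);
    throw e;
  } finally {
    cleanup(context, true);
  }
}
```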
protected abstract List<SinglePendingCommit> listPendingUploadsToCommit(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Parameters: context - job context.
Throws: IOException - any IO failure.

protected void cleanup(org.apache.hadoop.mapreduce.JobContext context, boolean suppressExceptions) throws IOException
Parameters: context - job context; suppressExceptions - should exceptions be suppressed?
Throws: IOException - any failure if exceptions were not suppressed.

public void cleanupJob(org.apache.hadoop.mapreduce.JobContext context) throws IOException
Overrides: cleanupJob in class org.apache.hadoop.mapreduce.OutputCommitter
Throws: IOException

protected void maybeIgnore(boolean suppress, String action, Invoker.VoidOperation operation) throws IOException
Parameters: suppress - should raised IOEs be suppressed?; action - action (for logging when the IOE is suppressed); operation - operation.
Throws: IOException - if the operation raised an IOE and suppress == false.

protected void maybeIgnore(boolean suppress, String action, IOException ex) throws IOException
Parameters: suppress - should raised IOEs be suppressed?; action - action (for logging when the IOE is suppressed); ex - exception.
Throws: IOException - if suppress == false.

protected CommitOperations getCommitOperations()

protected String getRole()
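The two maybeIgnore(...) overloads above are how cleanup-time operations are either allowed to fail the job or merely logged. A small sketch of the calling pattern, assuming Invoker.VoidOperation is a single-method functional interface so a lambda can be supplied; the wrapping method is hypothetical.

```java
// Inside a (hypothetical) committer subclass: run the abort, but only let the
// IOException propagate when suppressExceptions is false.
protected void abortDuringCleanup(boolean suppressExceptions) throws IOException {
  maybeIgnore(suppressExceptions, "aborting pending uploads in cleanup",
      () -> abortPendingUploadsInCleanup(suppressExceptions));
}
```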
protected final ExecutorService buildThreadPool(org.apache.hadoop.mapreduce.JobContext context)
Returns an ExecutorService for parallel tasks. The number of threads in the thread pool is set by s3.multipart.committer.num-threads. If num-threads is 0, this will return null.
Parameters: context - the JobContext for this commit.
Returns: an ExecutorService for the configured number of threads, or null.

protected void deleteTaskAttemptPathQuietly(org.apache.hadoop.mapreduce.TaskAttemptContext context)
Parameters: context - task context.

protected void abortPendingUploads(org.apache.hadoop.mapreduce.JobContext context, List<SinglePendingCommit> pending, boolean suppressExceptions) throws IOException
Parameters: context - job context; pending - pending uploads; suppressExceptions - should exceptions be suppressed?
Throws: IOException - any exception raised.
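As noted in the buildThreadPool(JobContext) entry above, the size of that pool comes from the s3.multipart.committer.num-threads key. A minimal illustration of setting it on a job configuration follows; the wrapper class is hypothetical and the value 8 is arbitrary.

```java
import org.apache.hadoop.conf.Configuration;

public class CommitterThreadsConfig {
  public static Configuration withCommitterThreads(int threads) {
    Configuration conf = new Configuration();
    // Thread count used when the committer builds its pool; 0 means
    // buildThreadPool() returns null and the work is not parallelized.
    conf.setInt("s3.multipart.committer.num-threads", threads);
    return conf;
  }
}
```

For example, `withCommitterThreads(8)` yields a configuration whose committer thread pool holds eight threads.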