public class CommitOperations extends AbstractStoreOperation implements org.apache.hadoop.fs.statistics.IOStatisticsSource
Modifier and Type | Class and Description |
---|---|
static class | CommitOperations.MaybeIOE: A holder for a possible IOException; the call CommitOperations.MaybeIOE.maybeRethrow() will throw any exception passed into the constructor, and be a no-op if none was. |

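The MaybeIOE contract (capture an IOException now, rethrow it later only if one was captured) is what allows `commit(SinglePendingCommit, String)` to convert exceptions into an outcome instead of throwing. The following is a minimal, Hadoop-free sketch of that holder pattern. The class `MaybeIoeSketch` and its `NONE`, `succeeded()`, `attempt()` and `IoAction` members are hypothetical names for illustration; only `maybeRethrow()` mirrors a documented method.

```java
import java.io.IOException;

/**
 * Hypothetical stand-in for CommitOperations.MaybeIOE, written without the
 * Hadoop dependency: it holds an optional IOException and rethrows it on demand.
 */
final class MaybeIoeSketch {
  /** Shared "no failure" outcome (hypothetical). */
  static final MaybeIoeSketch NONE = new MaybeIoeSketch(null);

  /** The captured exception, or null if the operation succeeded. */
  private final IOException exception;

  MaybeIoeSketch(IOException exception) {
    this.exception = exception;
  }

  /** True if no exception was captured. */
  boolean succeeded() {
    return exception == null;
  }

  /** Throw the captured exception, if any; otherwise do nothing. */
  void maybeRethrow() throws IOException {
    if (exception != null) {
      throw exception;
    }
  }

  /** Run an action, capturing any IOException instead of propagating it. */
  static MaybeIoeSketch attempt(IoAction action) {
    try {
      action.run();
      return NONE;
    } catch (IOException e) {
      return new MaybeIoeSketch(e);
    }
  }

  /** An action that may throw an IOException. */
  @FunctionalInterface
  interface IoAction {
    void run() throws IOException;
  }
}
```

A caller can run many operations, keep each outcome, and call `maybeRethrow()` only at the point where a failure should become fatal.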
Modifier and Type | Field and Description |
---|---|
static org.apache.hadoop.fs.PathFilter | PENDING_FILTER: Filter to find all .pending files. |
static org.apache.hadoop.fs.PathFilter | PENDINGSET_FILTER: Filter to find all .pendingset files. |

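Both filters select files by name suffix. The sketch below approximates them without the Hadoop dependency, using a `java.util.function.Predicate` over file names in place of `org.apache.hadoop.fs.PathFilter` (whose single method is `accept(Path)`). The class name `PendingFilterSketch`, its `select` helper, and the exact suffix-matching rule are assumptions for illustration. A name ending in `.pendingset` does not end in `.pending`, so the two filters do not overlap.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/**
 * Hypothetical, Hadoop-free sketch of suffix filters like PENDING_FILTER and
 * PENDINGSET_FILTER, matching on the file name only.
 */
final class PendingFilterSketch {
  static final String PENDING_SUFFIX = ".pending";
  static final String PENDINGSET_SUFFIX = ".pendingset";

  /** Accepts names ending in ".pending"; ".pendingset" names do not match. */
  static final Predicate<String> PENDING_FILTER =
      name -> name.endsWith(PENDING_SUFFIX);

  /** Accepts names ending in ".pendingset". */
  static final Predicate<String> PENDINGSET_FILTER =
      name -> name.endsWith(PENDINGSET_SUFFIX);

  /** Return the names accepted by the filter, preserving their order. */
  static List<String> select(List<String> names, Predicate<String> filter) {
    return names.stream().filter(filter).collect(Collectors.toList());
  }
}
```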
Constructor and Description |
---|
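CommitOperations(S3AFileSystem fs): Instantiate, binding to the filesystem fs. |
CommitOperations(S3AFileSystem fs, CommitterStatistics committerStatistics, String outputPath): Instantiate, binding to the filesystem fs with the given committer statistics and output path (the destination of work). |
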
Modifier and Type | Method and Description |
---|---|
CommitOperations.MaybeIOE | abortAllSinglePendingCommits(org.apache.hadoop.fs.Path pendingDir, CommitContext commitContext, boolean recursive): Enumerate all pending files in a directory (or directory tree) and abort them. |
void | abortMultipartCommit(String destKey, String uploadId): Create an AbortMultipartUpload request and POST it to S3, incrementing statistics afterwards. |
int | abortPendingUploadsUnderPath(org.apache.hadoop.fs.Path dest): Abort all pending uploads to the destination FS under a path. |
void | abortSingleCommit(SinglePendingCommit commit): Abort the multipart commit supplied. |
void | addFileSystemStatistics(Map<String,Long> dest): Add the filesystem statistics to the map, overwriting any entries with the same name. |
CommitOperations.MaybeIOE | commit(SinglePendingCommit commit, String origin): Commit a single pending commit; exceptions are caught and converted to an outcome. |
void | commitOrFail(SinglePendingCommit commit): Commit the operation, throwing an exception on any failure. |
CommitContext | createCommitContext(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path path, int committerThreads, org.apache.hadoop.fs.statistics.IOStatisticsContext ioStatisticsContext): Create a commit context for a job or task. |
CommitContext | createCommitContextForTesting(org.apache.hadoop.fs.Path path, String jobId, int committerThreads): Create a stub commit context for tests. |
void | createSuccessMarker(org.apache.hadoop.fs.Path outputPath, SuccessData successData, boolean addMetrics): Save the success data to the _SUCCESS file. |
void | deleteSuccessMarker(org.apache.hadoop.fs.Path outputPath): Delete any existing _SUCCESS file. |
static Optional<Long> | extractMagicFileLength(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path): Get the magic file length of a file. |
org.apache.hadoop.fs.statistics.IOStatistics | getIOStatistics() |
protected CommitterStatistics | getStatistics() |
void | jobCompleted(boolean success): Note that a job has completed. |
List<software.amazon.awssdk.services.s3.model.MultipartUpload> | listPendingUploadsUnderPath(org.apache.hadoop.fs.Path dest): List all pending uploads to the destination FS under a path. |
org.apache.commons.lang3.tuple.Pair<PendingSet,List<org.apache.commons.lang3.tuple.Pair<org.apache.hadoop.fs.LocatedFileStatus,IOException>>> | loadSinglePendingCommits(org.apache.hadoop.fs.Path pendingDir, boolean recursive, CommitContext commitContext): Load all single pending commits in the directory, using the outer submitter. |
org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus> | locateAllSinglePendingCommits(org.apache.hadoop.fs.Path pendingDir, boolean recursive): Locate all files with the pending suffix under a directory. |
protected org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus> | ls(org.apache.hadoop.fs.Path path, boolean recursive): List files. |
IOException | makeIOE(String key, Exception ex): Convert any exception to an IOException, if needed. |
void | revertCommit(SinglePendingCommit commit): Revert a pending commit by deleting the destination. |
void | taskCompleted(boolean success): Note that a task has completed. |
static List<software.amazon.awssdk.services.s3.model.CompletedPart> | toPartEtags(List<String> tagIds): Convert an ordered list of ETag strings to a list of parts, each numbered by its position in the list. |
String | toString() |
SinglePendingCommit | uploadFileToPendingCommit(File localFile, org.apache.hadoop.fs.Path destPath, String partition, long uploadPartSize, org.apache.hadoop.util.Progressable progress): Upload all the data in the local file, returning the information needed to commit the work. |

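toPartEtags turns the ETags collected while uploading parts into the part list used to complete a multipart upload. A minimal sketch, assuming the list position determines the part number: S3 part numbers start at 1, so index i maps to part i + 1. `PartEtagsSketch` and its `Part` class are hypothetical stand-ins for `software.amazon.awssdk.services.s3.model.CompletedPart`; with the real SDK each entry would be built with `CompletedPart.builder().partNumber(...).eTag(...).build()`.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hadoop-free sketch of the toPartEtags conversion. Part is a hypothetical
 * stand-in for the AWS SDK's CompletedPart (part number + ETag).
 */
final class PartEtagsSketch {
  /** Minimal stand-in for a completed multipart-upload part. */
  static final class Part {
    final int partNumber;
    final String eTag;

    Part(int partNumber, String eTag) {
      this.partNumber = partNumber;
      this.eTag = eTag;
    }
  }

  /**
   * Convert an ordered list of ETags into parts. S3 part numbers start at 1,
   * so list index i becomes part number i + 1.
   */
  static List<Part> toPartEtags(List<String> tagIds) {
    List<Part> parts = new ArrayList<>(tagIds.size());
    for (int i = 0; i < tagIds.size(); i++) {
      parts.add(new Part(i + 1, tagIds.get(i)));
    }
    return parts;
  }
}
```

Because the numbering comes from list order, the caller must supply the ETags in the order the parts were uploaded.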
Methods inherited from class AbstractStoreOperation
activateAuditSpan, getAuditSpan, getStoreContext

public static final org.apache.hadoop.fs.PathFilter PENDINGSET_FILTER
public static final org.apache.hadoop.fs.PathFilter PENDING_FILTER

public CommitOperations(S3AFileSystem fs) throws IOException
Parameters:
fs - FS to bind to
Throws:
IOException - failure to bind.

public CommitOperations(S3AFileSystem fs, CommitterStatistics committerStatistics, String outputPath) throws IOException
Parameters:
fs - FS to bind to
committerStatistics - committer statistics
outputPath - destination of work.
Throws:
IOException - failure to bind.

public static List<software.amazon.awssdk.services.s3.model.CompletedPart> toPartEtags(List<String> tagIds)
Parameters:
tagIds - list of tags

protected CommitterStatistics getStatistics()

public org.apache.hadoop.fs.statistics.IOStatistics getIOStatistics()
Specified by:
getIOStatistics in interface org.apache.hadoop.fs.statistics.IOStatisticsSource

public void commitOrFail(SinglePendingCommit commit) throws IOException
Parameters:
commit - commit to execute
Throws:
IOException - on a failure

public CommitOperations.MaybeIOE commit(SinglePendingCommit commit, String origin)
Parameters:
commit - entry to commit
origin - origin path/string for the outcome text

public org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus> locateAllSinglePendingCommits(org.apache.hadoop.fs.Path pendingDir, boolean recursive) throws IOException
Parameters:
pendingDir - directory to scan
recursive - whether to list recursively
Throws:
IOException - if there is a problem listing the path.

public org.apache.commons.lang3.tuple.Pair<PendingSet,List<org.apache.commons.lang3.tuple.Pair<org.apache.hadoop.fs.LocatedFileStatus,IOException>>> loadSinglePendingCommits(org.apache.hadoop.fs.Path pendingDir, boolean recursive, CommitContext commitContext) throws IOException
Parameters:
pendingDir - directory containing commits
recursive - whether to do a recursive scan
commitContext - commit context
Throws:
IOException - on a failure to list the files.

public IOException makeIOE(String key, Exception ex)
Parameters:
key - key to use in a path exception
ex - exception

public void abortSingleCommit(SinglePendingCommit commit) throws IOException
Parameters:
commit - pending commit to abort
Throws:
FileNotFoundException - if the abort ID is unknown
IOException - on any failure

public void abortMultipartCommit(String destKey, String uploadId) throws IOException
Create an AbortMultipartUpload request and POST it to S3, incrementing statistics afterwards.
Parameters:
destKey - destination key
uploadId - upload to cancel
Throws:
FileNotFoundException - if the abort ID is unknown
IOException - on any failure

public CommitOperations.MaybeIOE abortAllSinglePendingCommits(org.apache.hadoop.fs.Path pendingDir, CommitContext commitContext, boolean recursive) throws IOException
Parameters:
pendingDir - directory of pending operations
commitContext - commit context
recursive - whether to recurse
Throws:
IOException - if there is a problem listing the path.

protected org.apache.hadoop.fs.RemoteIterator<org.apache.hadoop.fs.LocatedFileStatus> ls(org.apache.hadoop.fs.Path path, boolean recursive) throws IOException
Parameters:
path - path
recursive - whether to list recursively
Throws:
IOException - failure

public List<software.amazon.awssdk.services.s3.model.MultipartUpload> listPendingUploadsUnderPath(org.apache.hadoop.fs.Path dest) throws IOException
Parameters:
dest - destination path
Throws:
IOException - IO failure

public int abortPendingUploadsUnderPath(org.apache.hadoop.fs.Path dest) throws IOException
Parameters:
dest - destination path
Throws:
IOException - IO failure

public void deleteSuccessMarker(org.apache.hadoop.fs.Path outputPath) throws IOException
Delete any existing _SUCCESS file.
Parameters:
outputPath - output directory
Throws:
IOException - IO problem

public void createSuccessMarker(org.apache.hadoop.fs.Path outputPath, SuccessData successData, boolean addMetrics) throws IOException
Save the success data to the _SUCCESS file.
Parameters:
outputPath - output directory
successData - success data to save.
addMetrics - whether the FS metrics should be added
Throws:
IOException - IO problem

public void revertCommit(SinglePendingCommit commit) throws IOException
Parameters:
commit - pending commit
Throws:
IOException - failure

public SinglePendingCommit uploadFileToPendingCommit(File localFile, org.apache.hadoop.fs.Path destPath, String partition, long uploadPartSize, org.apache.hadoop.util.Progressable progress) throws IOException
Parameters:
localFile - local file (must be a file)
destPath - destination path
partition - partition/subdirectory; not used
uploadPartSize - size of each upload part
progress - progress callback
Throws:
IOException - failure

public void addFileSystemStatistics(Map<String,Long> dest)
Parameters:
dest - destination map

public void taskCompleted(boolean success)
Parameters:
success - success flag

public void jobCompleted(boolean success)
Parameters:
success - success flag

public CommitContext createCommitContext(org.apache.hadoop.mapreduce.JobContext context, org.apache.hadoop.fs.Path path, int committerThreads, org.apache.hadoop.fs.statistics.IOStatisticsContext ioStatisticsContext) throws IOException
Parameters:
context - job context
path - path for all work.
committerThreads - thread pool size
ioStatisticsContext - IOStatistics context of the current thread
Throws:
IOException - failure.

public CommitContext createCommitContextForTesting(org.apache.hadoop.fs.Path path, @Nullable String jobId, int committerThreads) throws IOException
Parameters:
path - path for all work.
jobId - job ID; if null, a random UUID is generated.
committerThreads - number of committer threads.
Throws:
IOException - failure.

public static Optional<Long> extractMagicFileLength(org.apache.hadoop.fs.FileSystem fs, org.apache.hadoop.fs.Path path) throws IOException
Parameters:
fs - filesystem
path - path
Throws:
IOException - on error

Copyright © 2008–2024 Apache Software Foundation. All rights reserved.