public class DeleteOperation extends ExecutingStoreOperation<Boolean>
The delete page size is bounded by InternalConstants.MAX_ENTRIES_TO_DELETE, the maximum a single POST permits.
Smaller pages executed in parallel may have different performance characteristics when deleting very large directories. Any exploration of options here MUST be done with performance measurements taken from test runs in EC2 against local S3 stores, so as to ensure network latencies do not skew the results.
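As an illustration of tuning the page size from client code, here is a minimal sketch; the configuration key fs.s3a.bulk.delete.page.size, the bucket name and the path are assumptions for the example, not values taken from this page.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TuneDeletePageSize {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumed option: smaller pages spread the write load of a bulk delete
    // over more requests; the value cannot usefully exceed the single-POST
    // maximum (InternalConstants.MAX_ENTRIES_TO_DELETE).
    conf.setInt("fs.s3a.bulk.delete.page.size", 250);
    try (FileSystem fs = FileSystem.get(new URI("s3a://example-bucket/"), conf)) {
      // Recursive delete of a large directory tree; internally this runs
      // a DeleteOperation with the configured page size.
      fs.delete(new Path("/tables/old-data"), true);
    }
  }
}
```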
| Constructor and Description |
|---|
| DeleteOperation(StoreContext context, S3AFileStatus status, boolean recursive, OperationCallbacks callbacks, int pageSize, boolean dirOperationsPurgeUploads) Constructor. |
| Modifier and Type | Method and Description |
|---|---|
| protected void | deleteDirectoryTree(org.apache.hadoop.fs.Path path, String dirKey) Delete a directory tree. |
| Boolean | execute() Delete a file or directory tree. |
| long | getFilesDeleted() |
| Optional<Long> | getUploadsAborted() Get the count of uploads aborted. |
Methods inherited from class ExecutingStoreOperation: apply, executeOnlyOnce

Methods inherited from class AbstractStoreOperation: activateAuditSpan, getAuditSpan, getStoreContext
public DeleteOperation(StoreContext context, S3AFileStatus status, boolean recursive, OperationCallbacks callbacks, int pageSize, boolean dirOperationsPurgeUploads)
Parameters:
context - store context
status - pre-fetched source status
recursive - recursive delete?
callbacks - callback provider
pageSize - size of delete pages
dirOperationsPurgeUploads - Do directory operations purge pending uploads?

public long getFilesDeleted()
public Optional<Long> getUploadsAborted()
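A minimal sketch of how a caller inside hadoop-aws might wire the constructor and accessors above together; the package imports, the surrounding helper method and the single-use comment are assumptions based on the signatures shown here, not a copy of the filesystem's actual code.

```java
import java.io.IOException;

import org.apache.hadoop.fs.s3a.S3AFileStatus;
import org.apache.hadoop.fs.s3a.impl.DeleteOperation;
import org.apache.hadoop.fs.s3a.impl.OperationCallbacks;
import org.apache.hadoop.fs.s3a.impl.StoreContext;

final class DeleteOperationSketch {

  private DeleteOperationSketch() {
  }

  /** Run a recursive delete and report how many files were removed. */
  static long recursiveDelete(StoreContext context,
      S3AFileStatus status,
      OperationCallbacks callbacks,
      int pageSize) throws IOException {
    // One instance per request; the base class is expected to enforce
    // that execute() runs only once.
    DeleteOperation operation = new DeleteOperation(
        context,
        status,
        true,       // recursive
        callbacks,
        pageSize,   // must not exceed the single-POST maximum
        true);      // purge pending multipart uploads under directories
    boolean deleted = operation.execute();
    System.out.printf("outcome=%b files=%d uploads aborted=%d%n",
        deleted,
        operation.getFilesDeleted(),
        operation.getUploadsAborted().orElse(0L));
    return operation.getFilesDeleted();
  }
}
```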
@Retries.RetryTranslated public Boolean execute() throws IOException
This call does not create any fake parent directory; that is left to the caller. The actual delete call is done in a separate thread. Only one delete at a time is submitted, however, to reduce the complexity of recovering from failures.
With S3Guard removed, the problem of updating any DynamoDB store has gone away; delete calls could now be issued in parallel. However, rate limiting may be required to keep the write load below the throttling point. Every entry in a single bulk delete call counts as a single write request, and overloading an S3 partition with delete calls has been a problem in the past.
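Seen through the public FileSystem API, the non-recursive contract documented in the Throws section below behaves roughly as in this hedged sketch; the bucket and path names are illustrative.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.PathIsNotEmptyDirectoryException;

public class NonRecursiveDeleteExample {
  public static void main(String[] args) throws Exception {
    Path dir = new Path("s3a://example-bucket/tables/old-data");
    try (FileSystem fs = FileSystem.get(dir.toUri(), new Configuration())) {
      try {
        // recursive = false: only succeeds if the directory is empty.
        fs.delete(dir, false);
      } catch (PathIsNotEmptyDirectoryException e) {
        // The directory still has children; delete recursively only if
        // that is really the intent.
        fs.delete(dir, true);
      }
    }
  }
}
```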
Specified by:
execute in class ExecutingStoreOperation<Boolean>
Throws:
org.apache.hadoop.fs.PathIsNotEmptyDirectoryException - if the path is a dir and this is not a recursive delete.
IOException - list failures or an inability to delete a file.

@Retries.RetryTranslated protected void deleteDirectoryTree(org.apache.hadoop.fs.Path path, String dirKey) throws IOException

Delete a directory tree.
This is done by asking the filesystem for a list of all objects under the directory path.
Once the first pageSize worth of objects has been listed, a batch delete is queued for execution in a separate thread; each subsequent batch blocks until the previous call has completed or failed before it, too, is deleted in that separate thread.
After all listed objects are queued for deletion,
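The listing-and-paged-delete pattern described above can be sketched generically; this is a simplified, assumed model of the approach, not the class's actual implementation.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Illustrative model of a paged delete: list keys, queue one bulk delete
 * at a time on a background thread, and block on the previous batch before
 * submitting the next.
 */
public class PagedDeleteSketch {

  interface BulkDeleter {
    /** Delete one page of object keys in a single bulk request. */
    void deletePage(List<String> keys);
  }

  static void deleteAll(Iterator<String> listing, int pageSize, BulkDeleter deleter) {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      CompletableFuture<Void> inFlight = CompletableFuture.completedFuture(null);
      List<String> page = new ArrayList<>(pageSize);
      while (listing.hasNext()) {
        page.add(listing.next());
        if (page.size() == pageSize) {
          // Wait for the previous batch, then hand this page to the
          // background thread and keep listing.
          inFlight.join();
          List<String> batch = new ArrayList<>(page);
          inFlight = CompletableFuture.runAsync(() -> deleter.deletePage(batch), executor);
          page.clear();
        }
      }
      inFlight.join();
      if (!page.isEmpty()) {
        deleter.deletePage(page);   // final partial page
      }
    } finally {
      executor.shutdown();
    }
  }
}
```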
Parameters:
path - directory path
dirKey - directory key

Throws:
IOException - failure

Copyright © 2008–2024 Apache Software Foundation. All rights reserved.