@InterfaceAudience.Private
@InterfaceStability.Evolving
public class DynamoDBMetadataStore extends Object implements MetadataStore

A MetadataStore that persists file system metadata to DynamoDB.
The current implementation uses a schema consisting of a single table. The name of the table can be configured by the config key Constants.S3GUARD_DDB_TABLE_NAME_KEY. By default, it matches the name of the S3 bucket. Each item in the table represents a single directory or file. Its path is split into separate table attributes:
s3a://bucket/dir1
|-- dir2
|   |-- file1
|   `-- file2
`-- dir3
    |-- dir4
    |   `-- file3
    |-- dir5
    |   `-- file4
    `-- dir6

This is persisted to a single DynamoDB table as:

=========================================================================
| parent                 | child | is_dir | mod_time | len |     ...    |
=========================================================================
| /bucket                | dir1  | true   |          |     |            |
| /bucket/dir1           | dir2  | true   |          |     |            |
| /bucket/dir1           | dir3  | true   |          |     |            |
| /bucket/dir1/dir2      | file1 |        |   100    | 111 |            |
| /bucket/dir1/dir2      | file2 |        |   200    | 222 |            |
| /bucket/dir1/dir3      | dir4  | true   |          |     |            |
| /bucket/dir1/dir3      | dir5  | true   |          |     |            |
| /bucket/dir1/dir3/dir4 | file3 |        |   300    | 333 |            |
| /bucket/dir1/dir3/dir5 | file4 |        |   400    | 444 |            |
| /bucket/dir1/dir3      | dir6  | true   |          |     |            |
=========================================================================

This choice of schema is efficient for read access patterns.
get(Path) can be served from a single item lookup. listChildren(Path) can be served from a query against all rows matching the parent (the partition key), and the returned list is guaranteed to be sorted by child (the range key). Tracking whether or not a path is a directory helps prevent unnecessary queries during traversal of an entire sub-tree.
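
The parent/child split and the two read paths described above can be sketched with an in-memory model. This is only an illustration under stated assumptions: SchemaSketch, Item, and split are hypothetical names, and nested TreeMaps stand in for the DynamoDB partition key and range key; it does not use the DynamoDB client API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory model of the parent/child schema; not the DynamoDB client API. */
class SchemaSketch {
    /** One table item: parent is the partition key, child is the range key. */
    static final class Item {
        final String parent;
        final String child;
        final boolean isDir;

        Item(String parent, String child, boolean isDir) {
            this.parent = parent;
            this.child = child;
            this.isDir = isDir;
        }
    }

    // parent -> (child -> item); the inner TreeMap keeps children sorted by name.
    private final Map<String, TreeMap<String, Item>> table = new TreeMap<>();

    /** Splits "/bucket/dir1/dir2" into {"/bucket/dir1", "dir2"}. */
    static String[] split(String path) {
        int slash = path.lastIndexOf('/');
        return new String[] {path.substring(0, slash), path.substring(slash + 1)};
    }

    void put(String path, boolean isDir) {
        String[] key = split(path);
        table.computeIfAbsent(key[0], p -> new TreeMap<>())
            .put(key[1], new Item(key[0], key[1], isDir));
    }

    /** Single-item lookup by (parent, child); null if not found. */
    Item get(String path) {
        String[] key = split(path);
        TreeMap<String, Item> children = table.get(key[0]);
        return children == null ? null : children.get(key[1]);
    }

    /** One query on the parent key; results come back sorted by child. */
    List<Item> listChildren(String path) {
        TreeMap<String, Item> children = table.get(path);
        return children == null ? new ArrayList<>() : new ArrayList<>(children.values());
    }

    public static void main(String[] args) {
        SchemaSketch store = new SchemaSketch();
        store.put("/bucket/dir1/dir3", true);
        store.put("/bucket/dir1/dir2", true);
        store.put("/bucket/dir1/dir2/file1", false);
        for (Item item : store.listChildren("/bucket/dir1")) {
            System.out.println(item.parent + " | " + item.child + " | " + item.isDir);
        }
    }
}
```

In this model, listing /bucket/dir1 returns dir2 before dir3 even though dir3 was stored first, which mirrors the sorted-by-range-key guarantee described above.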
Some mutating operations, notably deleteSubtree(Path) and move(Collection, Collection), are less efficient with this schema. They require mutating multiple items in the DynamoDB table.
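
To see why these operations touch many items, the following sketch enumerates a sub-tree in a parent-keyed table, issuing one lookup per directory, and then maps each path to its destination as a rename would. The class SubtreeSketch and its methods are hypothetical, and a nested TreeMap stands in for the table; the real store's traversal may differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** In-memory illustration of sub-tree enumeration; not the real store. */
class SubtreeSketch {
    // parent path -> (child name -> is_dir); hypothetical stand-in for the table.
    final Map<String, TreeMap<String, Boolean>> table = new TreeMap<>();

    void put(String parent, String child, boolean isDir) {
        table.computeIfAbsent(parent, p -> new TreeMap<>()).put(child, isDir);
    }

    /** Returns the root and every descendant, querying once per directory. */
    List<String> enumerate(String root) {
        List<String> paths = new ArrayList<>();
        Deque<String> pending = new ArrayDeque<>();
        paths.add(root);
        pending.push(root);
        while (!pending.isEmpty()) {
            String dir = pending.pop();
            TreeMap<String, Boolean> children = table.get(dir);
            if (children == null) {
                continue; // empty directory: nothing stored under this parent
            }
            for (Map.Entry<String, Boolean> e : children.entrySet()) {
                String path = dir + "/" + e.getKey();
                paths.add(path);
                if (e.getValue()) {
                    pending.push(path); // only directories need a further query
                }
            }
        }
        return paths;
    }

    /** Rewrites each source path under src to the matching path under dst. */
    static List<String> renamed(List<String> sources, String src, String dst) {
        List<String> out = new ArrayList<>();
        for (String p : sources) {
            out.add(dst + p.substring(src.length()));
        }
        return out;
    }

    public static void main(String[] args) {
        SubtreeSketch t = new SubtreeSketch();
        t.put("/bucket/dir1", "dir3", true);
        t.put("/bucket/dir1/dir3", "dir4", true);
        t.put("/bucket/dir1/dir3", "dir5", true);
        t.put("/bucket/dir1/dir3", "dir6", true);
        t.put("/bucket/dir1/dir3/dir4", "file3", false);
        t.put("/bucket/dir1/dir3/dir5", "file4", false);
        List<String> toDelete = t.enumerate("/bucket/dir1/dir3");
        System.out.println(toDelete.size() + " items affected");
        System.out.println(renamed(toDelete, "/bucket/dir1/dir3", "/bucket/dirX"));
    }
}
```

With the example tree from this page, deleting or moving dir3 affects six items (dir3 itself, three directories, and two files), each of which needs its own write.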
By default, DynamoDB access is performed within the same AWS region as the S3 bucket that hosts the S3A instance. During initialization, the store checks the location of the S3 bucket and creates a DynamoDB client connected to the same region. The region may also be set explicitly by setting the config parameter fs.s3a.s3guard.ddb.region to the corresponding region.

Modifier and Type | Field and Description |
---|---|
static String | E_INCOMPATIBLE_VERSION - Error: version mismatch. |
static String | E_NO_VERSION_MARKER - Error: version marker not found in table. |
static org.slf4j.Logger | LOG |
static long | MIN_RETRY_SLEEP_MSEC - Initial delay for retries when batched operations get throttled by DynamoDB. |
static int | VERSION - Current version number. |
static String | VERSION_MARKER - parent/child name to use in the version marker. |
Constructor and Description |
---|
DynamoDBMetadataStore() |
Modifier and Type | Method and Description |
---|---|
void | close() |
void | delete(org.apache.hadoop.fs.Path path) - Deletes exactly one path, leaving a tombstone to prevent lingering, inconsistent copies of it from being listed. |
void | deleteSubtree(org.apache.hadoop.fs.Path path) - Deletes the entire sub-tree rooted at the given path, leaving tombstones to prevent lingering, inconsistent copies of it from being listed. |
void | destroy() - Destroy all resources associated with the metadata store. |
void | forgetMetadata(org.apache.hadoop.fs.Path path) - Removes the record of exactly one path. |
PathMetadata | get(org.apache.hadoop.fs.Path path) - Gets metadata for a path. |
PathMetadata | get(org.apache.hadoop.fs.Path path, boolean wantEmptyDirectoryFlag) - Gets metadata for a path. |
Map<String,String> | getDiagnostics() - Get any diagnostics information from a store, as a list of (key, value) tuples for display. |
void | initialize(org.apache.hadoop.conf.Configuration config) - Performs one-time initialization of the metadata store via configuration. |
void | initialize(org.apache.hadoop.fs.FileSystem fs) - Performs one-time initialization of the metadata store. |
DirListingMetadata | listChildren(org.apache.hadoop.fs.Path path) - Lists metadata for all direct children of a path. |
void | move(Collection<org.apache.hadoop.fs.Path> pathsToDelete, Collection<PathMetadata> pathsToCreate) - Record the effects of a FileSystem.rename(Path, Path) in the MetadataStore. |
void | prune(long modTime) - Clear any metadata older than a specified time from the repository. |
void | put(Collection<PathMetadata> metas) - Saves metadata for any number of paths. |
void | put(DirListingMetadata meta) - Save directory listing metadata. |
void | put(PathMetadata meta) - Saves metadata for exactly one path. |
String | toString() |
void | updateParameters(Map<String,String> parameters) - Tune/update parameters for an existing table. |

public static final org.slf4j.Logger LOG
public static final String VERSION_MARKER
public static final int VERSION
public static final String E_NO_VERSION_MARKER
public static final String E_INCOMPATIBLE_VERSION
public static final long MIN_RETRY_SLEEP_MSEC
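
MIN_RETRY_SLEEP_MSEC is documented only as the initial delay for retries when batched operations are throttled; this page does not say how the delay grows afterwards. A common policy, assumed here purely for illustration, is exponential backoff with a cap. RetrySketch, delayMsec, and the cap parameter are hypothetical and are not taken from the store's implementation.

```java
/** Illustrative backoff calculation; the doubling policy is an assumption. */
class RetrySketch {
    /**
     * Delay before the given retry attempt (0-based): the initial delay is
     * doubled once per earlier attempt and never exceeds maxMsec.
     */
    static long delayMsec(long initialMsec, int attempt, long maxMsec) {
        long delay = initialMsec;
        // Stop doubling once the cap is reached, which also avoids overflow.
        for (int i = 0; i < attempt && delay < maxMsec; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMsec);
    }

    public static void main(String[] args) {
        // The initial delay and cap below are example values only.
        for (int attempt = 0; attempt < 6; attempt++) {
            System.out.println("attempt " + attempt + ": "
                + delayMsec(100L, attempt, 1000L) + " ms");
        }
    }
}
```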

public void initialize(org.apache.hadoop.fs.FileSystem fs) throws IOException
Description copied from interface: MetadataStore
Performs one-time initialization of the metadata store.
Specified by:
initialize in interface MetadataStore
Parameters:
fs - FileSystem associated with the MetadataStore
Throws:
IOException - if there is an error

public void initialize(org.apache.hadoop.conf.Configuration config) throws IOException
Performs one-time initialization of the metadata store via configuration. The alternative is initialize(FileSystem) with an initialized S3AFileSystem instance.

Without a filesystem to act as a reference point, the configuration itself must declare the table name and region in the Constants.S3GUARD_DDB_TABLE_NAME_KEY and Constants.S3GUARD_DDB_REGION_KEY respectively.
Specified by:
initialize in interface MetadataStore
Parameters:
config - Configuration.
Throws:
IOException - if there is an error
IllegalArgumentException - if the configuration is incomplete
See Also:
initialize(FileSystem)
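
The completeness requirement for configuration-only initialization can be sketched as a simple check. This is not the store's own validation code: ConfigCheckSketch and requireComplete are hypothetical, a plain Map stands in for the Hadoop Configuration object, and the table-name key string is an assumed value for Constants.S3GUARD_DDB_TABLE_NAME_KEY that should be verified against the Hadoop version in use.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative check that both required settings are present. */
class ConfigCheckSketch {
    // Region key as named in the class description on this page.
    static final String REGION_KEY = "fs.s3a.s3guard.ddb.region";
    // Assumed value of Constants.S3GUARD_DDB_TABLE_NAME_KEY; verify before use.
    static final String TABLE_KEY = "fs.s3a.s3guard.ddb.table";

    /** Throws IllegalArgumentException if the table name or region is missing. */
    static void requireComplete(Map<String, String> conf) {
        for (String key : new String[] {TABLE_KEY, REGION_KEY}) {
            String value = conf.get(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("No value set for " + key);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(TABLE_KEY, "example-table");   // example value only
        conf.put(REGION_KEY, "us-west-2");      // example value only
        requireComplete(conf);
        System.out.println("configuration is complete");
    }
}
```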
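
public void delete(org.apache.hadoop.fs.Path path) throws IOException
Description copied from interface: MetadataStore
Deletes exactly one path, leaving a tombstone to prevent lingering, inconsistent copies of it from being listed.
Specified by:
delete in interface MetadataStore
Parameters:
path - the path to delete
Throws:
IOException - if there is an error

public void forgetMetadata(org.apache.hadoop.fs.Path path) throws IOException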
Description copied from interface: MetadataStore
Removes the record of exactly one path; compare MetadataStore.delete(Path), which leaves a tombstone. It is currently intended for testing only, and a need to use it as part of normal FileSystem usage is not anticipated.
Specified by:
forgetMetadata in interface MetadataStore
Parameters:
path - the path to delete
Throws:
IOException - if there is an error

public void deleteSubtree(org.apache.hadoop.fs.Path path) throws IOException
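Description copied from interface: MetadataStore
Deletes the entire sub-tree rooted at the given path, leaving tombstones to prevent lingering, inconsistent copies of it from being listed. Besides affecting later calls to MetadataStore.get(Path), implementations must also update any stored DirListingMetadata objects which track the parent of this file.
Specified by:
deleteSubtree in interface MetadataStore
Parameters:
path - the root of the sub-tree to delete
Throws:
IOException - if there is an error

public PathMetadata get(org.apache.hadoop.fs.Path path) throws IOException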
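Description copied from interface: MetadataStore
Gets metadata for a path.
Specified by:
get in interface MetadataStore
Parameters:
path - the path to get
Returns:
metadata for path, null if not found
Throws:
IOException - if there is an error

public PathMetadata get(org.apache.hadoop.fs.Path path, boolean wantEmptyDirectoryFlag) throws IOException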
Description copied from interface: MetadataStore
Gets metadata for a path. The wantEmptyDirectoryFlag hint controls whether the store should try to compute the value of PathMetadata.isEmptyDirectory(). Since determining emptiness may be an expensive operation, this can save wasted work.
Specified by:
get in interface MetadataStore
Parameters:
path - the path to get
wantEmptyDirectoryFlag - Set to true to give a hint to the MetadataStore that it should try to compute the empty directory flag.
Returns:
metadata for path, null if not found
Throws:
IOException - if there is an error

public DirListingMetadata listChildren(org.apache.hadoop.fs.Path path) throws IOException
Description copied from interface: MetadataStore
Lists metadata for all direct children of a path.
Specified by:
listChildren in interface MetadataStore
Parameters:
path - the path to list
Returns:
metadata for all direct children of path which are being tracked by the MetadataStore, or null if the path was not found in the MetadataStore.
Throws:
IOException - if there is an error

public void move(Collection<org.apache.hadoop.fs.Path> pathsToDelete, Collection<PathMetadata> pathsToCreate) throws IOException
Description copied from interface: MetadataStore
Record the effects of a FileSystem.rename(Path, Path) in the MetadataStore. Clients provide explicit enumeration of the affected paths (recursively), before and after the rename.

This operation is not atomic, unless specific implementations claim otherwise.

On the need to provide an enumeration of directory trees instead of just source and destination paths: since a MetadataStore does not have to track all metadata for the underlying storage system, and a new MetadataStore may be created on an existing underlying filesystem, this move() may be the first time the MetadataStore sees the affected paths. Therefore, simply providing source and destination paths may not be enough to record the deletions (under the source path) and creations (at the destination) that are happening during the rename().
Specified by:
move in interface MetadataStore
Parameters:
pathsToDelete - Collection of all paths that were removed from the source directory tree of the move.
pathsToCreate - Collection of all PathMetadata for the new paths that were created at the destination of the rename().
Throws:
IOException - if there is an error

public void put(PathMetadata meta) throws IOException
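Description copied from interface: MetadataStore
Saves metadata for exactly one path. Implementations must update any DirListingMetadata objects which track the immediate parent of this file.
Specified by:
put in interface MetadataStore
Parameters:
meta - the metadata to save
Throws:
IOException - if there is an error

public void put(Collection<PathMetadata> metas) throws IOException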
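Description copied from interface: MetadataStore
Saves metadata for any number of paths.
Specified by:
put in interface MetadataStore
Parameters:
metas - the metadata to save
Throws:
IOException - if there is an error

public void put(DirListingMetadata meta) throws IOException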
Description copied from interface: MetadataStore
Save directory listing metadata. MetadataStore implementations may subsequently keep track of all modifications to the directory contents at this path, and return authoritative results from subsequent calls to MetadataStore.listChildren(Path). See DirListingMetadata.

Any authoritative results returned are only authoritative for the scope of the MetadataStore: a per-process MetadataStore, for example, would only show results visible to that process, potentially missing metadata updates (create, delete) made to the same path by another process.
Specified by:
put in interface MetadataStore
Parameters:
meta - Directory listing metadata.
Throws:
IOException - if there is an error

public void close()
Specified by:
close in interface Closeable
Specified by:
close in interface AutoCloseable

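public void destroy() throws IOException
Description copied from interface: MetadataStore
Destroy all resources associated with the metadata store.
Specified by:
destroy in interface MetadataStore
Throws:
IOException - if there is an error

public void prune(long modTime) throws IOException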
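Description copied from interface: MetadataStore
Clear any metadata older than a specified time from the repository.
Specified by:
prune in interface MetadataStore
Parameters:
modTime - Oldest modification time to allow
Throws:
IOException - if there is an error

public Map<String,String> getDiagnostics() throws IOException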
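Description copied from interface: MetadataStore
Get any diagnostics information from a store, as a list of (key, value) tuples for display.
Specified by:
getDiagnostics in interface MetadataStore
Throws:
IOException - if there is an error

public void updateParameters(Map<String,String> parameters) throws IOException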
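Description copied from interface: MetadataStore
Tune/update parameters for an existing table.
Specified by:
updateParameters in interface MetadataStore
Parameters:
parameters - map of params to change.
Throws:
IOException - if there is an error

Copyright © 2018 Apache Software Foundation. All Rights Reserved.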