public class GoogleHadoopFileSystem extends GoogleHadoopFileSystemBase
This implementation sacrifices a small amount of cross-bucket interoperability in favor of more straightforward FileSystem semantics and compatibility with existing Hadoop applications. In particular, it is not subject to bucket-naming constraints, and files are allowed to be placed in root.
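Nested classes/interfaces inherited from class GoogleHadoopFileSystemBase: GoogleHadoopFileSystemBase.Counter, GoogleHadoopFileSystemBase.ListStatusFileNotFoundBehavior, GoogleHadoopFileSystemBase.OutputStreamType, GoogleHadoopFileSystemBase.ParentTimestampUpdateIncludePredicate
Fields inherited from class GoogleHadoopFileSystemBase: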
AUTHENTICATION_PREFIX, BLOCK_SIZE_DEFAULT, BLOCK_SIZE_KEY, BUFFERSIZE_DEFAULT, BUFFERSIZE_KEY, counters, DEFAULT_FILTER, defaultBlockSize, ENABLE_GCE_SERVICE_ACCOUNT_AUTH_KEY, GCE_BUCKET_DELETE_ENABLE_DEFAULT, GCE_BUCKET_DELETE_ENABLE_KEY, GCS_APPLICATION_NAME_SUFFIX_DEFAULT, GCS_APPLICATION_NAME_SUFFIX_KEY, GCS_CLIENT_ID_KEY, GCS_CLIENT_SECRET_KEY, GCS_CREATE_SYSTEM_BUCKET_DEFAULT, GCS_CREATE_SYSTEM_BUCKET_KEY, GCS_ENABLE_COPY_WITH_REWRITE_DEFAULT, GCS_ENABLE_COPY_WITH_REWRITE_KEY, GCS_ENABLE_FLAT_GLOB_DEFAULT, GCS_ENABLE_FLAT_GLOB_KEY, GCS_ENABLE_INFER_IMPLICIT_DIRECTORIES_DEFAULT, GCS_ENABLE_INFER_IMPLICIT_DIRECTORIES_KEY, GCS_ENABLE_MARKER_FILE_CREATION_DEFAULT, GCS_ENABLE_MARKER_FILE_CREATION_KEY, GCS_ENABLE_PERFORMANCE_CACHE_DEFAULT, GCS_ENABLE_PERFORMANCE_CACHE_KEY, GCS_ENABLE_REPAIR_IMPLICIT_DIRECTORIES_DEFAULT, GCS_ENABLE_REPAIR_IMPLICIT_DIRECTORIES_KEY, GCS_FILE_SIZE_LIMIT_250GB, GCS_FILE_SIZE_LIMIT_250GB_DEFAULT, GCS_HTTP_CONNECT_TIMEOUT_DEFAULT, GCS_HTTP_CONNECT_TIMEOUT_KEY, GCS_HTTP_MAX_RETRY_DEFAULT, GCS_HTTP_MAX_RETRY_KEY, GCS_HTTP_READ_TIMEOUT_DEFAULT, GCS_HTTP_READ_TIMEOUT_KEY, GCS_HTTP_TRANSPORT_DEFAULT, GCS_HTTP_TRANSPORT_KEY, GCS_INPUTSTREAM_FAST_FAIL_ON_NOT_FOUND_ENABLE_DEFAULT, GCS_INPUTSTREAM_FAST_FAIL_ON_NOT_FOUND_ENABLE_KEY, GCS_INPUTSTREAM_INPLACE_SEEK_LIMIT_DEFAULT, GCS_INPUTSTREAM_INPLACE_SEEK_LIMIT_KEY, GCS_INPUTSTREAM_INTERNALBUFFER_ENABLE_DEFAULT, GCS_INPUTSTREAM_INTERNALBUFFER_ENABLE_KEY, GCS_INPUTSTREAM_SUPPORT_CONTENT_ENCODING_ENABLE_DEFAULT, GCS_INPUTSTREAM_SUPPORT_CONTENT_ENCODING_ENABLE_KEY, GCS_MARKER_FILE_PATTERN_KEY, GCS_MAX_LIST_ITEMS_PER_CALL, GCS_MAX_LIST_ITEMS_PER_CALL_DEFAULT, GCS_MAX_REQUESTS_PER_BATCH, GCS_MAX_REQUESTS_PER_BATCH_DEFAULT, GCS_MAX_WAIT_MILLIS_EMPTY_OBJECT_CREATE_DEFAULT, GCS_MAX_WAIT_MILLIS_EMPTY_OBJECT_CREATE_KEY, GCS_OUTPUTSTREAM_TYPE_DEFAULT, GCS_OUTPUTSTREAM_TYPE_KEY, GCS_PARENT_TIMESTAMP_UPDATE_ENABLE_DEFAULT, GCS_PARENT_TIMESTAMP_UPDATE_ENABLE_KEY, GCS_PARENT_TIMESTAMP_UPDATE_EXCLUDES_DEFAULT, 
GCS_PARENT_TIMESTAMP_UPDATE_EXCLUDES_KEY, GCS_PARENT_TIMESTAMP_UPDATE_INCLUDES_DEFAULT, GCS_PARENT_TIMESTAMP_UPDATE_INCLUDES_KEY, GCS_PERFORMANCE_CACHE_LIST_CACHING_ENABLE_DEFAULT, GCS_PERFORMANCE_CACHE_LIST_CACHING_ENABLE_KEY, GCS_PERFORMANCE_CACHE_MAX_ENTRY_AGE_MILLIS_DEFAULT, GCS_PERFORMANCE_CACHE_MAX_ENTRY_AGE_MILLIS_KEY, GCS_PROJECT_ID_KEY, GCS_PROXY_ADDRESS_DEFAULT, GCS_PROXY_ADDRESS_KEY, GCS_REQUESTER_PAYS_BUCKETS_KEY, GCS_REQUESTER_PAYS_MODE_KEY, GCS_REQUESTER_PAYS_PROJECT_ID_KEY, GCS_SYSTEM_BUCKET_KEY, GCS_WORKING_DIRECTORY_KEY, gcsfs, GHFS_ID, initUri, listStatusFileNotFoundBehavior, LOG, MR_JOB_HISTORY_DONE_DIR_KEY, MR_JOB_HISTORY_INTERMEDIATE_DONE_DIR_KEY, PATH_CODEC_DEFAULT, PATH_CODEC_KEY, PATH_CODEC_USE_LEGACY_ENCODING, PATH_CODEC_USE_URI_ENCODING, PERMISSIONS_TO_REPORT_DEFAULT, PERMISSIONS_TO_REPORT_KEY, PROPERTIES_FILE, REPLICATION_FACTOR_DEFAULT, SERVICE_ACCOUNT_AUTH_EMAIL_KEY, SERVICE_ACCOUNT_AUTH_KEYFILE_KEY, systemBucket, UNKNOWN_VERSION, VERSION, VERSION_PROPERTY, WRITE_BUFFERSIZE_DEFAULT, WRITE_BUFFERSIZE_KEY
Constructor | Description |
---|---|
GoogleHadoopFileSystem() | Constructs an instance of GoogleHadoopFileSystem; the internal GoogleCloudStorageFileSystem will be set up with config settings when initialize() is called. |
GoogleHadoopFileSystem(GoogleCloudStorageFileSystem gcsfs) | Constructs an instance of GoogleHadoopFileSystem using the provided GoogleCloudStorageFileSystem; initialize() will not re-initialize it. |
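The difference between the two constructors can be pictured without the connector on the classpath. The following is a minimal sketch, assuming a lazily built backend; `LazyInitSketch`, `BackendSketch`, and the `project` config key are hypothetical stand-ins, not the connector's classes or settings.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for the file system; it models only the two construction paths.
final class LazyInitSketch {

    // Hypothetical stand-in for GoogleCloudStorageFileSystem; not the real class.
    static final class BackendSketch {
        final String source;

        BackendSketch(String source) {
            this.source = source;
        }
    }

    private BackendSketch backend;

    // No-arg form: the backend stays unset until initialize() runs.
    LazyInitSketch() {}

    // Injected form: the caller supplies a ready backend.
    LazyInitSketch(BackendSketch backend) {
        this.backend = backend;
    }

    // Builds a backend from config only when none was injected.
    void initialize(Map<String, String> config) {
        if (backend == null) {
            backend = new BackendSketch("config:" + config.getOrDefault("project", "unset"));
        }
    }

    BackendSketch backend() {
        return backend;
    }

    public static void main(String[] args) {
        Map<String, String> config = Collections.singletonMap("project", "demo");

        LazyInitSketch lazy = new LazyInitSketch();
        lazy.initialize(config);
        System.out.println(lazy.backend().source); // prints "config:demo"

        LazyInitSketch injected = new LazyInitSketch(new BackendSketch("injected"));
        injected.initialize(config);
        System.out.println(injected.backend().source); // prints "injected"
    }
}
```

The injected form is typically useful in tests, where a pre-built or fake backend should survive the call to initialize().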
Modifier and Type | Method | Description |
---|---|---|
protected void | checkPath(org.apache.hadoop.fs.Path path) | |
void | configureBuckets(java.lang.String systemBucketName, boolean createConfiguredBuckets) | Validates and possibly creates the system bucket. |
org.apache.hadoop.fs.FSDataOutputStream | createNonRecursive(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, java.util.EnumSet<org.apache.hadoop.fs.CreateFlag> flags, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress) | |
org.apache.hadoop.fs.Path | getDefaultWorkingDirectory() | Gets the default value of working directory. |
org.apache.hadoop.fs.Path | getFileSystemRoot() | Returns the Hadoop path representing the root of the FileSystem associated with this FileSystemDescriptor. |
java.net.URI | getGcsPath(org.apache.hadoop.fs.Path hadoopPath) | Translates a "gs:/" style hadoopPath (or relative path which is not fully-qualified) into the appropriate GCS path which is compatible with the underlying GcsFs or gsutil. |
org.apache.hadoop.fs.Path | getHadoopPath(java.net.URI gcsPath) | Validates GCS Path belongs to this file system. |
protected java.lang.String | getHomeDirectorySubpath() | Override to allow a homedir subpath which sits directly on our FileSystem root. |
java.lang.String | getScheme() | As the global-rooted FileSystem, our hadoop-path "scheme" is exactly equal to the general GCS scheme. |
Methods inherited from class GoogleHadoopFileSystemBase: append, close, completeLocalOutput, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, create, createCounterMap, delete, delete, deleteOnExit, getCanonicalServiceName, getContentSummary, getDefaultBlockSize, getDefaultPort, getDefaultReplication, getDelegationToken, getFileChecksum, getFileStatus, getGcsFs, getHadoopScheme, getHomeDirectory, getUri, getUsed, getWorkingDirectory, globStatus, globStatus, initialize, initialize, listStatus, makeQualified, mkdirs, open, processDeleteOnExit, rename, setListStatusFileNotFoundBehavior, setOwner, setPermission, setTimes, setVerifyChecksum, setWorkingDirectory, startLocalOutput
Methods inherited from class org.apache.hadoop.fs.FileSystem:
access, addDelegationTokens, append, append, appendFile, areSymlinksEnabled, cancelDeleteOnExit, canonicalizeUri, clearStatistics, closeAll, closeAllForUGI, concat, copyFromLocalFile, copyFromLocalFile, copyToLocalFile, copyToLocalFile, create, create, create, create, create, create, create, create, create, create, create, create, createFile, createNewFile, createNonRecursive, createNonRecursive, createSnapshot, createSnapshot, createSymlink, deleteSnapshot, enableSymlinks, exists, fixRelativePart, get, get, get, getAclStatus, getAllStatistics, getAllStoragePolicies, getBlockSize, getCanonicalUri, getChildFileSystems, getDefaultBlockSize, getDefaultReplication, getDefaultUri, getFileBlockLocations, getFileBlockLocations, getFileChecksum, getFileLinkStatus, getFileSystemClass, getFSofPath, getGlobalStorageStatistics, getInitialWorkingDirectory, getLength, getLinkTarget, getLocal, getName, getNamed, getQuotaUsage, getReplication, getServerDefaults, getServerDefaults, getStatistics, getStatistics, getStatus, getStatus, getStoragePolicy, getStorageStatistics, getTrashRoot, getTrashRoots, getUsed, getXAttr, getXAttrs, getXAttrs, isDirectory, isFile, listCorruptFileBlocks, listFiles, listLocatedStatus, listLocatedStatus, listStatus, listStatus, listStatus, listStatusBatch, listStatusIterator, listXAttrs, mkdirs, mkdirs, modifyAclEntries, moveFromLocalFile, moveFromLocalFile, moveToLocalFile, newInstance, newInstance, newInstance, newInstanceLocal, open, primitiveCreate, primitiveMkdir, primitiveMkdir, printStatistics, removeAcl, removeAclEntries, removeDefaultAcl, removeXAttr, rename, renameSnapshot, resolveLink, resolvePath, setAcl, setDefaultUri, setDefaultUri, setReplication, setStoragePolicy, setWriteChecksum, setXAttr, setXAttr, supportsSymlinks, truncate, unsetStoragePolicy
public GoogleHadoopFileSystem()
public GoogleHadoopFileSystem(GoogleCloudStorageFileSystem gcsfs)
public void configureBuckets(java.lang.String systemBucketName, boolean createConfiguredBuckets) throws java.io.IOException
configureBuckets in class GoogleHadoopFileSystemBase
Parameters:
systemBucketName - Name of system bucket.
createConfiguredBuckets - Whether or not to create systemBucketName if it does not exist.
Throws:
java.io.IOException - if systemBucketName is invalid, or if it cannot be found and createConfiguredBuckets is false.
protected void checkPath(org.apache.hadoop.fs.Path path)
checkPath in class GoogleHadoopFileSystemBase
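The configureBuckets contract (validate the system bucket, create it only when allowed, otherwise fail with an IOException) can be sketched as follows. This is an illustration under stated assumptions, not the connector's code: the set of existing buckets is an in-memory stand-in, "invalid" is taken to mean null or empty, and `ConfigureBucketsSketch`, `exists`, and `tryConfigure` are hypothetical names.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class ConfigureBucketsSketch {
    // In-memory stand-in for the buckets that already exist; purely illustrative.
    private final Set<String> existingBuckets = new HashSet<>();

    ConfigureBucketsSketch(String... buckets) {
        existingBuckets.addAll(Arrays.asList(buckets));
    }

    // Mirrors the documented contract: validate, then create only if permitted.
    void configureBuckets(String systemBucketName, boolean createConfiguredBuckets)
            throws IOException {
        // Assumption: "invalid" is modelled here as null or empty.
        if (systemBucketName == null || systemBucketName.isEmpty()) {
            throw new IOException("Invalid system bucket name");
        }
        if (existingBuckets.contains(systemBucketName)) {
            return; // bucket already exists, nothing to do
        }
        if (!createConfiguredBuckets) {
            throw new IOException("System bucket not found: " + systemBucketName);
        }
        existingBuckets.add(systemBucketName);
    }

    boolean exists(String bucket) {
        return existingBuckets.contains(bucket);
    }

    // Convenience wrapper that reports failure as false instead of throwing.
    boolean tryConfigure(String systemBucketName, boolean createConfiguredBuckets) {
        try {
            configureBuckets(systemBucketName, createConfiguredBuckets);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ConfigureBucketsSketch sketch = new ConfigureBucketsSketch("existing-bucket");
        System.out.println(sketch.tryConfigure("existing-bucket", false)); // prints "true"
        System.out.println(sketch.tryConfigure("new-bucket", false));      // prints "false"
        System.out.println(sketch.tryConfigure("new-bucket", true));       // prints "true"
    }
}
```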
protected java.lang.String getHomeDirectorySubpath()
getHomeDirectorySubpath in class GoogleHadoopFileSystemBase
public org.apache.hadoop.fs.Path getHadoopPath(java.net.URI gcsPath)
getHadoopPath in class GoogleHadoopFileSystemBase
Parameters:
gcsPath - Fully-qualified GCS path, of the form gs://
public java.net.URI getGcsPath(org.apache.hadoop.fs.Path hadoopPath)
getGcsPath in class GoogleHadoopFileSystemBase
Parameters:
hadoopPath - Hadoop path.
See Also:
GoogleHadoopFileSystemBase.getGcsPath(Path)
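The two translations, getGcsPath (Hadoop path to GCS path) and getHadoopPath (GCS path back to a Hadoop path, with validation), can be illustrated with plain java.net.URI. This is a rough sketch, not the connector's implementation: the root bucket `my-bucket`, the working directory `/user/alice`, and the names `BucketRootedPathSketch`, `toGcsPath`, `toHadoopPath`, and `belongsToRoot` are all assumptions made for the example.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class BucketRootedPathSketch {
    // Hypothetical values; the real file system obtains these during initialization.
    static final String ROOT_BUCKET = "my-bucket";
    static final String WORKING_DIR = "/user/alice";

    // Maps a "gs:/"-style or relative Hadoop path to a GCS URI under the root bucket.
    // Any authority on the input is ignored in this sketch.
    static URI toGcsPath(String hadoopPath) {
        String path = URI.create(hadoopPath).getPath();
        if (!path.startsWith("/")) {
            // Relative paths are resolved against the working directory.
            path = WORKING_DIR + "/" + path;
        }
        try {
            return new URI("gs", ROOT_BUCKET, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // True when the GCS URI uses the gs scheme and names the root bucket.
    static boolean belongsToRoot(URI gcsPath) {
        return "gs".equals(gcsPath.getScheme()) && ROOT_BUCKET.equals(gcsPath.getAuthority());
    }

    // Validates the GCS URI, then drops the bucket to form a "gs:/"-style Hadoop path.
    static String toHadoopPath(URI gcsPath) {
        if (!belongsToRoot(gcsPath)) {
            throw new IllegalArgumentException("Not under gs://" + ROOT_BUCKET + ": " + gcsPath);
        }
        String path = gcsPath.getPath();
        return "gs:" + (path.isEmpty() ? "/" : path);
    }

    public static void main(String[] args) {
        System.out.println(toGcsPath("gs:/foo/bar"));  // prints "gs://my-bucket/foo/bar"
        System.out.println(toGcsPath("data/part-0"));  // prints "gs://my-bucket/user/alice/data/part-0"
        System.out.println(toHadoopPath(URI.create("gs://my-bucket/foo/bar"))); // prints "gs:/foo/bar"
    }
}
```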
public java.lang.String getScheme()
getScheme in interface FileSystemDescriptor
getScheme in class GoogleHadoopFileSystemBase
public org.apache.hadoop.fs.Path getFileSystemRoot()
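Description copied from interface FileSystemDescriptor: Returns the Hadoop path representing the root of the FileSystem associated with this FileSystemDescriptor.
getFileSystemRoot in interface FileSystemDescriptor
getFileSystemRoot in class GoogleHadoopFileSystemBase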
public org.apache.hadoop.fs.Path getDefaultWorkingDirectory()
getDefaultWorkingDirectory in class GoogleHadoopFileSystemBase
public org.apache.hadoop.fs.FSDataOutputStream createNonRecursive(org.apache.hadoop.fs.Path hadoopPath, org.apache.hadoop.fs.permission.FsPermission permission, java.util.EnumSet<org.apache.hadoop.fs.CreateFlag> flags, int bufferSize, short replication, long blockSize, org.apache.hadoop.util.Progressable progress) throws java.io.IOException
createNonRecursive in class org.apache.hadoop.fs.FileSystem
Throws:
java.io.IOException
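In Hadoop's FileSystem contract, createNonRecursive behaves like create except that it fails when the parent directory does not already exist. The sketch below models only that rule with in-memory sets; `CreateNonRecursiveSketch`, `mkdir`, `parentOf`, and `tryCreate` are hypothetical names, and the real method takes the permission, flag, buffer, replication, block-size, and progress arguments shown in the signature above.

```java
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Set;

final class CreateNonRecursiveSketch {
    // In-memory stand-ins for directories and files; purely illustrative.
    private final Set<String> directories = new HashSet<>();
    private final Set<String> files = new HashSet<>();

    CreateNonRecursiveSketch() {
        directories.add("/"); // the root always exists
    }

    void mkdir(String dir) {
        directories.add(dir);
    }

    // Returns the parent of an absolute path, or "/" for top-level entries.
    static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    // Creates the file only if its parent directory is already present.
    void createNonRecursive(String path) throws FileNotFoundException {
        String parent = parentOf(path);
        if (!directories.contains(parent)) {
            throw new FileNotFoundException("Parent does not exist: " + parent);
        }
        files.add(path);
    }

    // Convenience wrapper that reports failure as false instead of throwing.
    boolean tryCreate(String path) {
        try {
            createNonRecursive(path);
            return true;
        } catch (FileNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        CreateNonRecursiveSketch fs = new CreateNonRecursiveSketch();
        fs.mkdir("/logs");
        System.out.println(fs.tryCreate("/logs/a.txt"));    // prints "true"
        System.out.println(fs.tryCreate("/missing/a.txt")); // prints "false"
    }
}
```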
Copyright © 2018. All rights reserved.