Class CompressionAlgorithm
- java.lang.Object
-
- org.apache.hadoop.conf.Configured
-
- org.apache.accumulo.core.file.rfile.bcfile.CompressionAlgorithm
-
- All Implemented Interfaces:
org.apache.hadoop.conf.Configurable
public class CompressionAlgorithm extends org.apache.hadoop.conf.Configured
There is a static initializer in Compression that finds all implementations of CompressionAlgorithmConfiguration and initializes a CompressionAlgorithm instance. This promotes a model in which initialization by the static initializer is followed by calls to getCodec(), createCompressionStream(OutputStream, Compressor, int), and createDecompressionStream(InputStream, Decompressor, int). In some cases, the compression and decompression calls will request a different buffer size for the stream. Note that if the compressed buffer size requested in these calls is zero, we will not set the buffer size for that algorithm; instead, we will use the default within the codec.
The buffer size is configured in the codec by way of a Hadoop Configuration reference. One approach would be to use a single shared Configuration object, but when createCompressionStream and createDecompressionStream are called with non-default buffer sizes, the configuration object must be changed. Concurrent calls to createCompressionStream and createDecompressionStream would then mutate the configuration object beneath each other, requiring synchronization to avoid undesirable behavior via co-modification. To avoid synchronization entirely, we create codecs with their own Configuration object and cache them for re-use. A default codec is created statically, as mentioned above, to ensure a codec is always available at loader initialization.
There is a Guava cache defined within Algorithm that allows us to cache codecs for re-use. Since each cached codec has its own Configuration object and thus never needs to be mutated, there is no concern about using them concurrently; the Guava cache exists to bound the maximum size of the cache and to provide efficient, concurrent read/write access to the cache itself.
To provide algorithm-specific details and to describe what is in the code:
- LZO will always have the default LZO codec because the buffer size is never overridden within it.
- LZ4 will always have the default LZ4 codec because the buffer size is never overridden within it.
- GZ will use the default GZ codec for the compression stream, but can potentially use a different codec instance for the decompression stream if the requested buffer size does not match the default GZ buffer size of 32k.
- Snappy will use the default Snappy codec with the default buffer size of 64k for the compression stream, but will use a cached codec if the buffer size differs from the default.
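To illustrate the stream-wrapping pattern these methods expose, here is a hedged, self-contained analogy using the JDK's GZIP streams in place of Hadoop's codec-created streams. The explicit buffer-size constructor arguments play the role of the `downStreamBufferSize` parameter; in the Accumulo class, a non-default size on the GZ decompression side is what triggers a separately configured cached codec.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzRoundTrip {
    public static void main(String[] args) throws Exception {
        byte[] original = "hello, rfile block".getBytes(StandardCharsets.UTF_8);

        // Compression stream wrapping a downstream sink with an explicit buffer size,
        // analogous to createCompressionStream(downStream, compressor, downStreamBufferSize).
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (GZIPOutputStream gzOut = new GZIPOutputStream(sink, 32 * 1024)) {
            gzOut.write(original);
        }
        byte[] compressed = sink.toByteArray();

        // Decompression stream with a different buffer size, analogous to
        // createDecompressionStream(downStream, decompressor, downStreamBufferSize).
        byte[] roundTripped;
        try (GZIPInputStream gzIn =
                new GZIPInputStream(new ByteArrayInputStream(compressed), 64 * 1024)) {
            roundTripped = gzIn.readAllBytes();
        }

        System.out.println(Arrays.equals(original, roundTripped)); // prints "true"
    }
}
```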
-
-
Nested Class Summary
- static class CompressionAlgorithm.FinishOnFlushCompressionStream
-
Field Summary
- protected static int DATA_IBUF_SIZE
- protected static int DATA_OBUF_SIZE
-
Constructor Summary
- CompressionAlgorithm(CompressionAlgorithmConfiguration algorithm, org.apache.hadoop.conf.Configuration conf)
-
Method Summary
All methods are instance methods and concrete methods.
- OutputStream createCompressionStream(OutputStream downStream, org.apache.hadoop.io.compress.Compressor compressor, int downStreamBufferSize)
- InputStream createDecompressionStream(InputStream downStream, org.apache.hadoop.io.compress.Decompressor decompressor, int downStreamBufferSize)
- org.apache.hadoop.io.compress.Compressor getCompressor()
- org.apache.hadoop.io.compress.Decompressor getDecompressor()
- String getName()
  Returns the name of the compression algorithm.
- void returnCompressor(org.apache.hadoop.io.compress.Compressor compressor)
- void returnDecompressor(org.apache.hadoop.io.compress.Decompressor decompressor)
  Returns the specified Decompressor to the codec cache if it is not null.
-
-
-
Field Detail
-
DATA_IBUF_SIZE
protected static final int DATA_IBUF_SIZE
- See Also:
- Constant Field Values
-
DATA_OBUF_SIZE
protected static final int DATA_OBUF_SIZE
- See Also:
- Constant Field Values
-
-
Constructor Detail
-
CompressionAlgorithm
public CompressionAlgorithm(CompressionAlgorithmConfiguration algorithm, org.apache.hadoop.conf.Configuration conf)
-
-
Method Detail
-
createDecompressionStream
public InputStream createDecompressionStream(InputStream downStream, org.apache.hadoop.io.compress.Decompressor decompressor, int downStreamBufferSize) throws IOException
- Throws:
IOException
-
createCompressionStream
public OutputStream createCompressionStream(OutputStream downStream, org.apache.hadoop.io.compress.Compressor compressor, int downStreamBufferSize) throws IOException
- Throws:
IOException
-
getCompressor
public org.apache.hadoop.io.compress.Compressor getCompressor()
-
returnCompressor
public void returnCompressor(org.apache.hadoop.io.compress.Compressor compressor)
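The getCompressor()/returnCompressor() pairing follows a borrow-and-return lifecycle. The sketch below is a hypothetical toy pool, not the actual implementation (Accumulo delegates to Hadoop's CodecPool); it uses the JDK's Deflater to show why a returned compressor is reset before reuse.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.zip.Deflater;

public class CompressorPoolSketch {
    // Toy pool illustrating the getCompressor()/returnCompressor() pairing.
    private final Deque<Deflater> pool = new ArrayDeque<>();

    public Deflater getCompressor() {
        Deflater pooled = pool.poll();
        // Hand out a pooled instance when one exists, otherwise create a fresh one.
        return pooled != null ? pooled : new Deflater();
    }

    public void returnCompressor(Deflater compressor) {
        // Mirror the documented null-check: only a non-null compressor is pooled.
        if (compressor != null) {
            compressor.reset(); // clear state so the next borrower starts fresh
            pool.push(compressor);
        }
    }

    public static void main(String[] args) {
        CompressorPoolSketch p = new CompressorPoolSketch();
        Deflater first = p.getCompressor();
        p.returnCompressor(first);
        p.returnCompressor(null); // safely ignored, matching the null-check above
        Deflater second = p.getCompressor();
        System.out.println(first == second); // prints "true": the pooled instance was reused
    }
}
```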
-
getDecompressor
public org.apache.hadoop.io.compress.Decompressor getDecompressor()
-
returnDecompressor
public void returnDecompressor(org.apache.hadoop.io.compress.Decompressor decompressor)
Returns the specified Decompressor to the codec cache if it is not null.
-
getName
public String getName()
Returns the name of the compression algorithm.
- Returns:
- the name
-
-