Class CodecFactory

  • All Implemented Interfaces:
    org.apache.parquet.compression.CompressionCodecFactory

    public class CodecFactory
    extends Object
    implements org.apache.parquet.compression.CompressionCodecFactory
    • Field Detail

      • CODEC_BY_NAME

        protected static final Map<String, org.apache.hadoop.io.compress.CompressionCodec> CODEC_BY_NAME
      • configuration

        protected final org.apache.hadoop.conf.Configuration configuration
      • pageSize

        protected final int pageSize
    • Constructor Detail

      • CodecFactory

        public CodecFactory(org.apache.hadoop.conf.Configuration configuration,
                            int pageSize)
        Create a new codec factory.
        Parameters:
        configuration - used to pass compression codec configuration information
        pageSize - the expected page size. This does not set a hard limit; it is currently used only to set the initial size of the output stream used when compressing a buffer. If this factory is used only to construct decompressors, this parameter has no effect on the function of the factory.
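As a minimal construction sketch (assuming parquet-hadoop and hadoop-common are on the classpath; the 64 KiB page size is an arbitrary illustrative value):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.hadoop.CodecFactory;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class CodecFactoryConstruction {
    public static void main(String[] args) {
        // The page size only seeds the initial output-buffer size for
        // compression; it is not a hard limit on buffer sizes.
        CodecFactory factory = new CodecFactory(new Configuration(), 64 * 1024);

        CodecFactory.BytesCompressor compressor =
            factory.getCompressor(CompressionCodecName.UNCOMPRESSED);
        System.out.println(compressor.getCodecName());

        factory.release();
    }
}
```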
    • Method Detail

      • createDirectCodecFactory

        public static CodecFactory createDirectCodecFactory(org.apache.hadoop.conf.Configuration config,
                                                            org.apache.parquet.bytes.ByteBufferAllocator allocator,
                                                            int pageSize)
        Create a codec factory that will provide compressors and decompressors that will work natively with ByteBuffers backed by direct memory.
        Parameters:
        config - configuration options for different compression codecs
        allocator - an allocator for creating result buffers during compression and decompression. It must provide buffers backed by direct memory and return true from the isDirect() method of the ByteBufferAllocator interface.
        pageSize - the default page size. This does not set a hard limit on the size of buffers that can be compressed, but performance may improve when it is close to the expected size of the buffers (in Parquet's case, pages) that will be compressed. It is unused when decompressing data, because Parquet always records the uncompressed size of a buffer. If this CodecFactory is used only for decompressors, this parameter has no effect on the function of the factory.
        Returns:
        a configured direct codec factory
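As a hedged usage sketch, DirectByteBufferAllocator from org.apache.parquet.bytes satisfies the isDirect() requirement described above (the page size is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.bytes.DirectByteBufferAllocator;
import org.apache.parquet.hadoop.CodecFactory;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class DirectCodecFactoryExample {
    public static void main(String[] args) {
        // DirectByteBufferAllocator allocates off-heap ByteBuffers and
        // returns true from isDirect(), as required by this factory method.
        CodecFactory factory = CodecFactory.createDirectCodecFactory(
            new Configuration(), new DirectByteBufferAllocator(), 64 * 1024);

        // Compressors and decompressors obtained here work natively with
        // ByteBuffers backed by direct memory.
        CodecFactory.BytesDecompressor decompressor =
            factory.getDecompressor(CompressionCodecName.UNCOMPRESSED);

        factory.release();
    }
}
```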
      • getCompressor

        public CodecFactory.BytesCompressor getCompressor(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName)
        Specified by:
        getCompressor in interface org.apache.parquet.compression.CompressionCodecFactory
      • getDecompressor

        public CodecFactory.BytesDecompressor getDecompressor(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName)
        Specified by:
        getDecompressor in interface org.apache.parquet.compression.CompressionCodecFactory
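A hypothetical round trip through the two methods above (GZIP is used because Hadoop's gzip codec needs no native library; the payload and page size are illustrative):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.bytes.BytesInput;
import org.apache.parquet.hadoop.CodecFactory;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class CodecRoundTrip {
    public static void main(String[] args) throws IOException {
        CodecFactory factory = new CodecFactory(new Configuration(), 64 * 1024);

        byte[] page = "example page bytes".getBytes(StandardCharsets.UTF_8);

        BytesInput compressed = factory
            .getCompressor(CompressionCodecName.GZIP)
            .compress(BytesInput.from(page));

        // Parquet records the uncompressed size alongside each page, so the
        // decompressor is told exactly how many bytes to produce.
        BytesInput restored = factory
            .getDecompressor(CompressionCodecName.GZIP)
            .decompress(compressed, page.length);

        System.out.println(Arrays.equals(page, restored.toByteArray()));

        factory.release();
    }
}
```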
      • createCompressor

        protected CodecFactory.BytesCompressor createCompressor(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName)
      • createDecompressor

        protected CodecFactory.BytesDecompressor createDecompressor(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName)
      • getCodec

        protected org.apache.hadoop.io.compress.CompressionCodec getCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName)
        Parameters:
        codecName - the requested codec
        Returns:
        the corresponding Hadoop codec, or null if codecName is UNCOMPRESSED
      • release

        public void release()
        Specified by:
        release in interface org.apache.parquet.compression.CompressionCodecFactory
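A sketch of the release pattern: the factory hands out pooled resources, so release() is best called once, in a finally block, after all work with its compressors and decompressors is done (the GZIP choice is illustrative):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.hadoop.CodecFactory;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class ReleaseExample {
    public static void main(String[] args) {
        CodecFactory factory = new CodecFactory(new Configuration(), 64 * 1024);
        try {
            CodecFactory.BytesDecompressor decompressor =
                factory.getDecompressor(CompressionCodecName.GZIP);
            // ... decompress pages with decompressor ...
        } finally {
            // Releases the factory's held resources; compressors and
            // decompressors obtained from it must not be used afterwards.
            factory.release();
        }
    }
}
```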