Class ParquetSerDe

    • Method Detail

      • blockSizeBytes

        public final Integer blockSizeBytes()

        The Hadoop Distributed File System (HDFS) block size. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. The default is 256 MiB and the minimum is 64 MiB. Kinesis Data Firehose uses this value for padding calculations.

        Returns:
        The Hadoop Distributed File System (HDFS) block size. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. The default is 256 MiB and the minimum is 64 MiB. Kinesis Data Firehose uses this value for padding calculations.
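        A minimal sketch of setting this value through the builder; the figure simply restates the documented default:

            import software.amazon.awssdk.services.firehose.model.ParquetSerDe;

            ParquetSerDe serDe = ParquetSerDe.builder()
                    .blockSizeBytes(256 * 1024 * 1024) // 256 MiB, the documented default
                    .build();                          // anything below 64 MiB is invalid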
      • pageSizeBytes

        public final Integer pageSizeBytes()

        The Parquet page size. Column chunks are divided into pages. A page is conceptually an indivisible unit (in terms of compression and encoding). The minimum value is 64 KiB and the default is 1 MiB.

        Returns:
        The Parquet page size. Column chunks are divided into pages. A page is conceptually an indivisible unit (in terms of compression and encoding). The minimum value is 64 KiB and the default is 1 MiB.
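        A sketch of reading the value back, reusing the serDe instance from the earlier sketch; the null guard reflects the SDK convention (assumed here) that unset optional members return null:

            Integer pageSize = serDe.pageSizeBytes();
            int effectivePageSize = (pageSize != null) ? pageSize : 1024 * 1024; // fall back to the 1 MiB default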
      • compression

        public final ParquetCompression compression()

        The compression codec to use over data blocks. The possible values are UNCOMPRESSED, SNAPPY, and GZIP, with the default being SNAPPY. Use SNAPPY for higher decompression speed. Use GZIP if the compression ratio is more important than speed.

        If the service returns an enum value that is not available in the current SDK version, compression will return ParquetCompression.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available from compressionAsString().

        Returns:
        The compression codec to use over data blocks. The possible values are UNCOMPRESSED, SNAPPY, and GZIP, with the default being SNAPPY. Use SNAPPY for higher decompression speed. Use GZIP if the compression ratio is more important than speed.
        See Also:
        ParquetCompression
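        A sketch of branching on the typed value; the UNKNOWN_TO_SDK_VERSION check covers codecs introduced after this SDK release:

            import software.amazon.awssdk.services.firehose.model.ParquetCompression;

            ParquetCompression codec = serDe.compression();
            if (codec == ParquetCompression.UNKNOWN_TO_SDK_VERSION) {
                // The enum cannot model the value; fall back to the raw string.
                System.out.println("Unrecognized codec: " + serDe.compressionAsString());
            } else if (codec == ParquetCompression.GZIP) {
                // Better compression ratio at the cost of decompression speed.
            }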
      • compressionAsString

        public final String compressionAsString()

        The compression codec to use over data blocks. The possible values are UNCOMPRESSED, SNAPPY, and GZIP, with the default being SNAPPY. Use SNAPPY for higher decompression speed. Use GZIP if the compression ratio is more important than speed.

        If the service returns an enum value that is not available in the current SDK version, compression will return ParquetCompression.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available from compressionAsString().

        Returns:
        The compression codec to use over data blocks. The possible values are UNCOMPRESSED, SNAPPY, and GZIP, with the default being SNAPPY. Use SNAPPY for higher decompression speed. Use GZIP if the compression ratio is more important than speed.
        See Also:
        ParquetCompression
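        When logging or persisting the setting, a sketch using the raw string keeps codec values added after this SDK release intact:

            String rawCodec = serDe.compressionAsString(); // e.g. "SNAPPY"; null if the member was never set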
      • enableDictionaryCompression

        public final Boolean enableDictionaryCompression()

        Indicates whether to enable dictionary compression.

        Returns:
        Indicates whether to enable dictionary compression.
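        A sketch of enabling the flag through the builder:

            ParquetSerDe withDictionary = ParquetSerDe.builder()
                    .enableDictionaryCompression(true)
                    .build();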
      • maxPaddingBytes

        public final Integer maxPaddingBytes()

        The maximum amount of padding to apply. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. The default is 0.

        Returns:
        The maximum amount of padding to apply. This is useful if you intend to copy the data from Amazon S3 to HDFS before querying. The default is 0.
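        A sketch pairing the padding limit with the block size it pads toward; the 4 MiB figure is an arbitrary illustration, not a recommended value:

            ParquetSerDe padded = ParquetSerDe.builder()
                    .blockSizeBytes(256 * 1024 * 1024)
                    .maxPaddingBytes(4 * 1024 * 1024) // pad row groups up to 4 MiB toward HDFS block boundaries
                    .build();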
      • writerVersionAsString

        public final String writerVersionAsString()

        Indicates the version of row format to output. The possible values are V1 and V2. The default is V1.

        If the service returns an enum value that is not available in the current SDK version, writerVersion will return ParquetWriterVersion.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available from writerVersionAsString().

        Returns:
        Indicates the version of row format to output. The possible values are V1 and V2. The default is V1.
        See Also:
        ParquetWriterVersion
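        A sketch of requesting the V2 row format; the typed builder setter is assumed to exist alongside this string accessor, following the SDK's usual enum-overload convention:

            import software.amazon.awssdk.services.firehose.model.ParquetWriterVersion;

            ParquetSerDe v2 = ParquetSerDe.builder()
                    .writerVersion(ParquetWriterVersion.V2)
                    .build();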
      • hashCode

        public final int hashCode()
        Overrides:
        hashCode in class Object
      • equals

        public final boolean equals(Object obj)
        Overrides:
        equals in class Object
      • toString

        public final String toString()
        Returns a string representation of this object. This is useful for testing and debugging. Sensitive data will be redacted from this string using a placeholder value.
        Overrides:
        toString in class Object
      • getValueForField

        public final <T> Optional<T> getValueForField(String fieldName,
                                                      Class<T> clazz)
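        A sketch of reflective access; the "Compression" field name and the String value type are assumptions mirroring the service model's member naming:

            import java.util.Optional;

            Optional<String> codec = serDe.getValueForField("Compression", String.class);
            codec.ifPresent(c -> System.out.println("codec = " + c));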