Package org.apache.parquet.hadoop
Class ParquetWriter.Builder<T,SELF extends ParquetWriter.Builder<T,SELF>>
- java.lang.Object
  - org.apache.parquet.hadoop.ParquetWriter.Builder<T,SELF>
- Type Parameters:
- T - the type of objects written by the constructed ParquetWriter.
- SELF - the type of this builder that is returned by builder methods.
- Direct Known Subclasses:
- ExampleParquetWriter.Builder
- Enclosing class:
- ParquetWriter<T>
public abstract static class ParquetWriter.Builder<T,SELF extends ParquetWriter.Builder<T,SELF>> extends Object
An abstract builder class for ParquetWriter instances. Object models should extend this builder to provide writer configuration options.
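The SELF type parameter is what allows setters declared on this abstract class to return the concrete subclass. The sketch below mirrors that shape with hypothetical classes (ToyWriter, ToyBuilder, StringRecordBuilder) instead of the Parquet API, and it leaves out getWriteSupport. It only shows why a base-class setter can be followed by a subclass-only method in the same chain.

```java
// Hypothetical stand-in for the object a builder produces; not a Parquet class.
final class ToyWriter<T> {
    final String path;
    final int pageSize;
    final String schema;

    ToyWriter(String path, int pageSize, String schema) {
        this.path = path;
        this.pageSize = pageSize;
        this.schema = schema;
    }
}

// Same shape as ParquetWriter.Builder<T,SELF>: shared options live here and
// every setter returns SELF, so chaining keeps the subclass type.
abstract class ToyBuilder<T, SELF extends ToyBuilder<T, SELF>> {
    protected final String path;
    protected int pageSize = 1024 * 1024;

    protected ToyBuilder(String path) {
        this.path = path;
    }

    // Subclasses return `this`, typed as SELF.
    protected abstract SELF self();

    public SELF withPageSize(int pageSize) {
        this.pageSize = pageSize;
        return self();
    }

    public abstract ToyWriter<T> build();
}

// Hypothetical object-model builder, in the role ExampleParquetWriter.Builder plays.
final class StringRecordBuilder extends ToyBuilder<String, StringRecordBuilder> {
    private String schema = "";

    StringRecordBuilder(String path) {
        super(path);
    }

    // Option specific to this object model; the base builder does not know it.
    StringRecordBuilder withSchema(String schema) {
        this.schema = schema;
        return this;
    }

    @Override
    protected StringRecordBuilder self() {
        return this;
    }

    @Override
    public ToyWriter<String> build() {
        return new ToyWriter<>(path, pageSize, schema);
    }
}

class SelfTypedBuilderDemo {
    public static void main(String[] args) {
        // withPageSize is declared on the base class, but it returns
        // StringRecordBuilder, so withSchema can follow without a cast.
        ToyWriter<String> writer = new StringRecordBuilder("/tmp/out.parquet")
                .withPageSize(64 * 1024)
                .withSchema("message m { required binary s; }")
                .build();
        System.out.println(writer.pageSize + " " + writer.schema);
    }
}
```

Without SELF, withPageSize would have to return the base builder type, and the following withSchema call would need a cast.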
Constructor Summary
- protected Builder(org.apache.hadoop.fs.Path path)
- protected Builder(OutputFile path)
Method Summary
- ParquetWriter<T> build(): Build a ParquetWriter with the accumulated configuration.
- SELF config(String property, String value): Set a property that will be available to the read path.
- SELF enableDictionaryEncoding(): Enables dictionary encoding for the constructed writer.
- SELF enablePageWriteChecksum(): Enables writing page-level checksums for the constructed writer.
- SELF enableValidation(): Enables validation for the constructed writer.
- protected abstract WriteSupport<T> getWriteSupport(org.apache.hadoop.conf.Configuration conf): Returns an appropriate WriteSupport for the object model.
- protected abstract SELF self(): Returns this as the correct subclass of ParquetWriter.Builder.
- SELF withBloomFilterEnabled(boolean enabled): Sets the default bloom filter state (enabled or disabled) for all columns.
- SELF withBloomFilterEnabled(String columnPath, boolean enabled): Sets the bloom filter enabled/disabled for the specified column.
- SELF withBloomFilterNDV(String columnPath, long ndv): Sets the NDV (number of distinct values) for the specified column.
- SELF withByteStreamSplitEncoding(boolean enableByteStreamSplit): Enable or disable BYTE_STREAM_SPLIT encoding for the constructed writer.
- SELF withCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName): Set the compression codec used by the constructed writer.
- SELF withConf(org.apache.hadoop.conf.Configuration conf): Set the Configuration used by the constructed writer.
- SELF withDictionaryEncoding(boolean enableDictionary): Enable or disable dictionary encoding for the constructed writer.
- SELF withDictionaryEncoding(String columnPath, boolean enableDictionary): Enable or disable dictionary encoding of the specified column for the constructed writer.
- SELF withDictionaryPageSize(int dictionaryPageSize): Set the Parquet format dictionary page size used by the constructed writer.
- SELF withEncryption(FileEncryptionProperties encryptionProperties): Set the file encryption properties used by the constructed writer.
- SELF withMaxPaddingSize(int maxPaddingSize): Set the maximum amount of padding, in bytes, that will be used to align row groups with blocks in the underlying filesystem.
- SELF withPageRowCountLimit(int rowCount): Sets the Parquet format page row count limit used by the constructed writer.
- SELF withPageSize(int pageSize): Set the Parquet format page size used by the constructed writer.
- SELF withPageWriteChecksumEnabled(boolean enablePageWriteChecksum): Enable or disable writing page-level checksums for the constructed writer.
- SELF withRowGroupSize(int rowGroupSize): Set the Parquet format row group size used by the constructed writer.
- SELF withValidation(boolean enableValidation): Enable or disable validation for the constructed writer.
- SELF withWriteMode(ParquetFileWriter.Mode mode): Set the write mode used when creating the backing file for this writer.
- SELF withWriterVersion(ParquetProperties.WriterVersion version): Set the format version used by the constructed writer.
Constructor Detail
-
Builder
protected Builder(org.apache.hadoop.fs.Path path)
-
Builder
protected Builder(OutputFile path)
-
Method Detail
-
self
protected abstract SELF self()
- Returns:
- this as the correct subclass of ParquetWriter.Builder.
-
getWriteSupport
protected abstract WriteSupport<T> getWriteSupport(org.apache.hadoop.conf.Configuration conf)
- Parameters:
- conf - a configuration
- Returns:
- an appropriate WriteSupport for the object model.
-
withConf
public SELF withConf(org.apache.hadoop.conf.Configuration conf)
Set the Configuration used by the constructed writer.
- Parameters:
- conf - a Configuration
- Returns:
- this builder for method chaining.
-
withWriteMode
public SELF withWriteMode(ParquetFileWriter.Mode mode)
Set the write mode used when creating the backing file for this writer.
- Parameters:
- mode - a ParquetFileWriter.Mode
- Returns:
- this builder for method chaining.
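ParquetFileWriter.Mode distinguishes creating a new file (CREATE) from replacing an existing one (OVERWRITE). The sketch below is only a local-file analogy built on java.nio, not the Parquet implementation; the WriteModeSketch class and its Mode enum are hypothetical. CREATE refuses to touch an existing file, while OVERWRITE truncates and rewrites it.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical java.nio analogue of a create-vs-overwrite write mode.
class WriteModeSketch {
    enum Mode { CREATE, OVERWRITE }

    static void write(Path file, byte[] data, Mode mode) throws IOException {
        if (mode == Mode.CREATE) {
            // Throws FileAlreadyExistsException if the file is already present.
            Files.write(file, data, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE);
        } else {
            // Creates the file if needed, otherwise discards its old content.
            Files.write(file, data, StandardOpenOption.CREATE,
                    StandardOpenOption.TRUNCATE_EXISTING, StandardOpenOption.WRITE);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("mode-demo", ".bin"); // file now exists
        try {
            write(file, "a".getBytes(StandardCharsets.UTF_8), Mode.CREATE);
        } catch (FileAlreadyExistsException e) {
            System.out.println("CREATE refused to replace " + e.getFile());
        }
        write(file, "b".getBytes(StandardCharsets.UTF_8), Mode.OVERWRITE);
        System.out.println(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        Files.delete(file);
    }
}
```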
-
withCompressionCodec
public SELF withCompressionCodec(org.apache.parquet.hadoop.metadata.CompressionCodecName codecName)
Set the compression codec used by the constructed writer.
- Parameters:
- codecName - a CompressionCodecName
- Returns:
- this builder for method chaining.
-
withEncryption
public SELF withEncryption(FileEncryptionProperties encryptionProperties)
Set the file encryption properties used by the constructed writer.
- Parameters:
- encryptionProperties - a FileEncryptionProperties
- Returns:
- this builder for method chaining.
-
withRowGroupSize
public SELF withRowGroupSize(int rowGroupSize)
Set the Parquet format row group size used by the constructed writer.
- Parameters:
- rowGroupSize - an integer size in bytes
- Returns:
- this builder for method chaining.
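Because this signature takes the row group size as an int number of bytes, the largest value it can express is Integer.MAX_VALUE, just under 2 GiB. The hypothetical helper below converts a size in MiB with Math.toIntExact, so an out-of-range value fails loudly instead of wrapping around.

```java
// Hypothetical helper for int-typed byte-size setters such as withRowGroupSize(int).
class RowGroupSizeDemo {
    static final long MIB = 1024L * 1024L;

    // Converts MiB to an int byte count, throwing ArithmeticException on overflow.
    static int mibToIntBytes(long mib) {
        return Math.toIntExact(mib * MIB);
    }

    public static void main(String[] args) {
        System.out.println(mibToIntBytes(256)); // 268435456
        System.out.println((int) (4096 * MIB)); // 0: a plain cast silently wraps 4 GiB
        try {
            mibToIntBytes(4096);
        } catch (ArithmeticException e) {
            System.out.println("4 GiB does not fit in an int");
        }
    }
}
```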
-
withPageSize
public SELF withPageSize(int pageSize)
Set the Parquet format page size used by the constructed writer.
- Parameters:
- pageSize - an integer size in bytes
- Returns:
- this builder for method chaining.
-
withPageRowCountLimit
public SELF withPageRowCountLimit(int rowCount)
Sets the Parquet format page row count limit used by the constructed writer.
- Parameters:
- rowCount - limit for the number of rows stored in a page
- Returns:
- this builder for method chaining.
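The page size and the page row count limit both cap a page, so a column chunk needs at least as many pages as the stricter of the two implies. The sketch below computes that lower bound under simplifying assumptions (a known encoded byte total, no compression, and no modeling of when the writer actually checks page sizes). The class and method names are hypothetical.

```java
// Rough lower bound on pages per column chunk; not the writer's real logic.
class PageCountEstimate {
    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    static long minPages(long rows, long encodedBytes, int pageRowLimit, int pageSizeBytes) {
        long byRows = ceilDiv(rows, pageRowLimit);           // pages forced by the row limit
        long byBytes = ceilDiv(encodedBytes, pageSizeBytes); // pages forced by the size target
        return Math.max(byRows, byBytes);
    }

    public static void main(String[] args) {
        // 100,000 eight-byte values (800,000 bytes), 1 MiB pages, 20,000-row limit:
        // the row limit dominates.
        System.out.println(minPages(100_000, 800_000, 20_000, 1024 * 1024)); // 5
    }
}
```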
-
withDictionaryPageSize
public SELF withDictionaryPageSize(int dictionaryPageSize)
Set the Parquet format dictionary page size used by the constructed writer.
- Parameters:
- dictionaryPageSize - an integer size in bytes
- Returns:
- this builder for method chaining.
-
withMaxPaddingSize
public SELF withMaxPaddingSize(int maxPaddingSize)
Set the maximum amount of padding, in bytes, that will be used to align row groups with blocks in the underlying filesystem. If the underlying filesystem is not a block filesystem like HDFS, this has no effect.
- Parameters:
- maxPaddingSize - an integer size in bytes
- Returns:
- this builder for method chaining.
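One way to picture the padding decision: before a new row group starts, compare the bytes left in the current filesystem block with the padding limit. If the gap is small enough, fill it with zeros so the row group begins on a block boundary; otherwise write no padding. This is a hedged sketch of that idea with hypothetical names and illustrative sizes; the exact check parquet-mr performs may differ.

```java
// Sketch of block-alignment padding for a block filesystem such as HDFS.
class PaddingSketch {
    // Number of zero bytes to write at file offset `pos` before a new row group.
    static long paddingBytes(long pos, long blockSize, long maxPaddingSize) {
        long remaining = blockSize - (pos % blockSize);
        if (remaining == blockSize) {
            return 0; // already on a block boundary
        }
        return remaining <= maxPaddingSize ? remaining : 0;
    }

    public static void main(String[] args) {
        long block = 128L * 1024 * 1024; // illustrative block size
        long maxPad = 8L * 1024 * 1024;  // illustrative maxPaddingSize
        System.out.println(paddingBytes(block - 1_000, block, maxPad));     // 1000
        System.out.println(paddingBytes(10L * 1024 * 1024, block, maxPad)); // 0
    }
}
```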
-
enableDictionaryEncoding
public SELF enableDictionaryEncoding()
Enables dictionary encoding for the constructed writer.
- Returns:
- this builder for method chaining.
-
withDictionaryEncoding
public SELF withDictionaryEncoding(boolean enableDictionary)
Enable or disable dictionary encoding for the constructed writer.
- Parameters:
- enableDictionary - whether dictionary encoding should be enabled
- Returns:
- this builder for method chaining.
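Dictionary encoding stores each distinct value once and replaces the data with small integer ids. The sketch below shows that idea for strings with hypothetical names. It assumes, for illustration only, that a dictionary growing past a byte limit (here standing in for withDictionaryPageSize) makes the writer give up on dictionary encoding; that case returns null. It approximates size as one byte per char and does not model Parquet's dictionary pages or the encoding of the ids.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of dictionary encoding with a size-based fallback; not Parquet's writer.
class DictionarySketch {
    final List<String> dictionary = new ArrayList<>();
    final List<Integer> ids = new ArrayList<>();

    static DictionarySketch encode(List<String> values, int maxDictionaryBytes) {
        DictionarySketch out = new DictionarySketch();
        Map<String, Integer> index = new HashMap<>();
        int dictBytes = 0;
        for (String v : values) {
            Integer id = index.get(v);
            if (id == null) {
                dictBytes += v.length(); // rough size: one byte per char
                if (dictBytes > maxDictionaryBytes) {
                    return null; // dictionary too large: fall back
                }
                id = out.dictionary.size();
                out.dictionary.add(v);
                index.put(v, id);
            }
            out.ids.add(id);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> col = Arrays.asList("red", "green", "red", "red", "blue", "green");
        DictionarySketch d = encode(col, 1024);
        System.out.println(d.dictionary + " " + d.ids); // [red, green, blue] [0, 1, 0, 0, 2, 1]
        System.out.println(encode(col, 5));             // null
    }
}
```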
-
withByteStreamSplitEncoding
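public SELF withByteStreamSplitEncoding(boolean enableByteStreamSplit)
Enable or disable BYTE_STREAM_SPLIT encoding for the constructed writer.
- Parameters:
- enableByteStreamSplit - whether BYTE_STREAM_SPLIT encoding should be enabled
- Returns:
- this builder for method chaining.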
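The BYTE_STREAM_SPLIT encoding in the Parquet format specification (originally defined for FLOAT and DOUBLE) writes byte k of every value into stream k and concatenates the streams. It does not shrink the data by itself; grouping similar bytes together tends to help the compression codec. The sketch below implements that transform for 4-byte floats in little-endian order, with hypothetical class and method names.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Sketch of the BYTE_STREAM_SPLIT transform for 4-byte floats.
class ByteStreamSplitSketch {
    static byte[] encode(float[] values) {
        int n = values.length;
        ByteBuffer plain = ByteBuffer.allocate(n * 4).order(ByteOrder.LITTLE_ENDIAN);
        for (float v : values) {
            plain.putFloat(v);
        }
        byte[] in = plain.array();
        byte[] out = new byte[n * 4];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < 4; k++) {
                out[k * n + i] = in[i * 4 + k]; // stream k occupies out[k*n .. (k+1)*n)
            }
        }
        return out;
    }

    static float[] decode(byte[] encoded) {
        int n = encoded.length / 4;
        byte[] plain = new byte[n * 4];
        for (int i = 0; i < n; i++) {
            for (int k = 0; k < 4; k++) {
                plain[i * 4 + k] = encoded[k * n + i]; // gather byte k back from stream k
            }
        }
        ByteBuffer buf = ByteBuffer.wrap(plain).order(ByteOrder.LITTLE_ENDIAN);
        float[] values = new float[n];
        for (int i = 0; i < n; i++) {
            values[i] = buf.getFloat();
        }
        return values;
    }

    public static void main(String[] args) {
        float[] values = {1.0f, 1.5f, 2.0f};
        System.out.println(Arrays.toString(decode(encode(values)))); // [1.0, 1.5, 2.0]
    }
}
```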
-
withDictionaryEncoding
public SELF withDictionaryEncoding(String columnPath, boolean enableDictionary)
Enable or disable dictionary encoding of the specified column for the constructed writer.
- Parameters:
- columnPath - the path of the column (dot-string)
- enableDictionary - whether dictionary encoding should be enabled
- Returns:
- this builder for method chaining.
-
enableValidation
public SELF enableValidation()
Enables validation for the constructed writer.
- Returns:
- this builder for method chaining.
-
withValidation
public SELF withValidation(boolean enableValidation)
Enable or disable validation for the constructed writer.
- Parameters:
- enableValidation - whether validation should be enabled
- Returns:
- this builder for method chaining.
-
withWriterVersion
public SELF withWriterVersion(ParquetProperties.WriterVersion version)
Set the format version used by the constructed writer.
- Parameters:
- version - a WriterVersion
- Returns:
- this builder for method chaining.
-
enablePageWriteChecksum
public SELF enablePageWriteChecksum()
Enables writing page-level checksums for the constructed writer.
- Returns:
- this builder for method chaining.
-
withPageWriteChecksumEnabled
public SELF withPageWriteChecksumEnabled(boolean enablePageWriteChecksum)
Enable or disable writing page-level checksums for the constructed writer.
- Parameters:
- enablePageWriteChecksum - whether page checksums should be written out
- Returns:
- this builder for method chaining.
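The Parquet format specification defines the optional page checksum as a standard CRC-32 over the page data (not the page header), stored as a 32-bit integer in the page header. A reader can recompute it and compare it with the stored value. The sketch below shows that round trip with java.util.zip.CRC32; the byte array stands in for real page data, and the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Sketch: compute and verify a CRC-32 page checksum.
class PageChecksumSketch {
    // CRC-32 of the page bytes, narrowed to a signed 32-bit int.
    static int pageCrc(byte[] pageBytes) {
        CRC32 crc = new CRC32();
        crc.update(pageBytes, 0, pageBytes.length);
        return (int) crc.getValue();
    }

    static boolean verify(byte[] pageBytes, int storedCrc) {
        return pageCrc(pageBytes) == storedCrc;
    }

    public static void main(String[] args) {
        byte[] page = "123456789".getBytes(StandardCharsets.US_ASCII);
        int stored = pageCrc(page);
        System.out.printf("crc=0x%08X%n", stored); // crc=0xCBF43926
        page[0] ^= 1; // flip one bit to simulate corruption
        System.out.println(verify(page, stored)); // false
    }
}
```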
-
withBloomFilterNDV
public SELF withBloomFilterNDV(String columnPath, long ndv)
Sets the NDV (number of distinct values) for the specified column.
- Parameters:
- columnPath - the path of the column (dot-string)
- ndv - the NDV of the column
- Returns:
- this builder for method chaining.
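The NDV matters because a Bloom filter's size grows with the number of distinct values it must hold. The sketch below uses the textbook Bloom filter sizing formula, m = -n ln(p) / (ln 2)^2 bits for n distinct values at false-positive rate p. This is a swapped-in illustration, not Parquet's exact formula: Parquet uses a split-block Bloom filter with its own sizing and rounding. The 1% rate and the class name are assumptions.

```java
// Textbook Bloom filter sizing; Parquet's split-block filter sizes itself differently.
class BloomSizingSketch {
    static long optimalBits(long ndv, double fpp) {
        if (ndv <= 0 || fpp <= 0.0 || fpp >= 1.0) {
            throw new IllegalArgumentException("need ndv > 0 and 0 < fpp < 1");
        }
        double ln2 = Math.log(2);
        return (long) Math.ceil(-ndv * Math.log(fpp) / (ln2 * ln2));
    }

    public static void main(String[] args) {
        for (long ndv : new long[] {10_000, 1_000_000}) {
            long bits = optimalBits(ndv, 0.01); // assumed 1% false-positive rate
            System.out.println(ndv + " distinct values -> about " + (bits + 7) / 8 + " bytes");
        }
    }
}
```

Doubling the NDV roughly doubles the filter size, so an NDV far above the real count wastes space, while one far below it raises the false-positive rate.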
-
withBloomFilterEnabled
public SELF withBloomFilterEnabled(boolean enabled)
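Sets the default bloom filter state (enabled or disabled) for all columns. A setting made for a specific column with withBloomFilterEnabled(String, boolean) takes precedence.
- Parameters:
- enabled - whether to write bloom filters
- Returns:
- this builder for method chaining.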
-
withBloomFilterEnabled
public SELF withBloomFilterEnabled(String columnPath, boolean enabled)
Sets the bloom filter enabled/disabled for the specified column. If no state is set for this column specifically, the default enabled/disabled state applies; see withBloomFilterEnabled(boolean).
- Parameters:
- columnPath - the path of the column (dot-string)
- enabled - whether to write a bloom filter for the column
- Returns:
- this builder for method chaining.
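The precedence described above works this way: a per-column value, keyed by the dot-string column path, overrides the writer-wide default, and columns without their own entry fall back to that default. The sketch below models only that lookup with a hypothetical class. In this sketch, the order of the two calls does not matter.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: a per-column boolean setting with a writer-wide default.
class ColumnSettingSketch {
    private boolean defaultEnabled;
    private final Map<String, Boolean> perColumn = new HashMap<>();

    ColumnSettingSketch withDefault(boolean enabled) {
        this.defaultEnabled = enabled;
        return this;
    }

    ColumnSettingSketch withColumn(String columnPath, boolean enabled) {
        perColumn.put(columnPath, enabled);
        return this;
    }

    // Explicit per-column value if present, otherwise the default.
    boolean isEnabled(String columnPath) {
        return perColumn.getOrDefault(columnPath, defaultEnabled);
    }

    public static void main(String[] args) {
        ColumnSettingSketch bloom = new ColumnSettingSketch()
                .withDefault(false)
                .withColumn("user.id", true);
        System.out.println(bloom.isEnabled("user.id"));   // true (explicit)
        System.out.println(bloom.isEnabled("user.name")); // false (default)
    }
}
```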
-
config
public SELF config(String property, String value)
Set a property that will be available to the read path. For writers that use a Hadoop configuration, this is the recommended way to add configuration values.
- Parameters:
- property - a String property name
- value - a String property value
- Returns:
- this builder for method chaining.
-
build
public ParquetWriter<T> build() throws IOException
Build a ParquetWriter with the accumulated configuration.
- Returns:
- a configured ParquetWriter instance.
- Throws:
- IOException - if there is an error while creating the writer