Package org.apache.parquet.hadoop
Class ParquetFileWriter
- java.lang.Object
-
- org.apache.parquet.hadoop.ParquetFileWriter
-
public class ParquetFileWriter extends Object
Internal implementation of the Parquet file writer as a block container
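A write proceeds through the start/startBlock/startColumn lifecycle documented below. The following sketch illustrates that sequence under stated assumptions: `file`, `schema`, `descriptor`, the page bytes, encodings and statistics are all prepared by the caller, and the numeric arguments are example values, not recommendations from this class.

```java
// Illustrative lifecycle sketch; all lowercase variables are assumed
// to be prepared elsewhere by the caller.
ParquetFileWriter writer = new ParquetFileWriter(
    file, schema, ParquetFileWriter.Mode.CREATE,
    128 * 1024 * 1024,  // rowGroupSize (example value)
    8 * 1024 * 1024,    // maxPaddingSize (example value)
    64,                 // columnIndexTruncateLength
    64,                 // statisticsTruncateLength
    true);              // pageWriteChecksumEnabled

writer.start();                      // writes the magic bytes
writer.startBlock(recordCount);      // one row group
writer.startColumn(descriptor, valueCount, codecName);
writer.writeDataPage(valueCount, uncompressedPageSize, compressedBytes,
    statistics, rowCount, rlEncoding, dlEncoding, valuesEncoding);
writer.endColumn();
writer.endBlock();
writer.end(Collections.emptyMap()); // writes the footer and closes the file
```

Each startX call must be balanced by its endX call before the next lifecycle step; multiple startBlock/endBlock pairs produce multiple row groups.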
-
-
Nested Class Summary
Nested Classes
static class ParquetFileWriter.Mode
-
Field Summary
Fields
static int CURRENT_VERSION
static String EF_MAGIC_STR
static byte[] EFMAGIC
static byte[] MAGIC
static String MAGIC_STR
protected PositionOutputStream out
static String PARQUET_COMMON_METADATA_FILE
static String PARQUET_METADATA_FILE
-
Constructor Summary
Constructors
ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, MessageType schema, org.apache.hadoop.fs.Path file)
Deprecated. Will be removed in 2.0.0.
ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, MessageType schema, org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode)
Deprecated. Will be removed in 2.0.0.
ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, MessageType schema, org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize)
Deprecated. Will be removed in 2.0.0.
ParquetFileWriter(OutputFile file, MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize)
Deprecated. Will be removed in 2.0.0.
ParquetFileWriter(OutputFile file, MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize, int columnIndexTruncateLength, int statisticsTruncateLength, boolean pageWriteChecksumEnabled)
ParquetFileWriter(OutputFile file, MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize, int columnIndexTruncateLength, int statisticsTruncateLength, boolean pageWriteChecksumEnabled, FileEncryptionProperties encryptionProperties)
-
Method Summary
All Methods | Static Methods | Instance Methods | Concrete Methods | Deprecated Methods

void appendColumnChunk(ColumnDescriptor descriptor, SeekableInputStream from, ColumnChunkMetaData chunk, BloomFilter bloomFilter, ColumnIndex columnIndex, OffsetIndex offsetIndex)
void appendFile(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path file)
Deprecated. Will be removed in 2.0.0; use appendFile(InputFile) instead.
void appendFile(InputFile file)
void appendRowGroup(org.apache.hadoop.fs.FSDataInputStream from, BlockMetaData rowGroup, boolean dropColumns)
Deprecated. Will be removed in 2.0.0; use appendRowGroup(SeekableInputStream, BlockMetaData, boolean) instead.
void appendRowGroup(SeekableInputStream from, BlockMetaData rowGroup, boolean dropColumns)
void appendRowGroups(org.apache.hadoop.fs.FSDataInputStream file, List<BlockMetaData> rowGroups, boolean dropColumns)
Deprecated. Will be removed in 2.0.0; use appendRowGroups(SeekableInputStream, List, boolean) instead.
void appendRowGroups(SeekableInputStream file, List<BlockMetaData> rowGroups, boolean dropColumns)
void end(Map<String,String> extraMetaData)
Ends a file once all blocks have been written.
void endBlock()
Ends a block once all column chunks have been written.
void endColumn()
Ends a column (once all repetition levels, definition levels and data have been written).
ParquetMetadata getFooter()
long getNextRowGroupSize()
long getPos()
static ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.conf.Configuration conf)
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
static ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.conf.Configuration conf, KeyValueMetadataMergeStrategy keyValueMetadataMergeStrategy)
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
void start()
Starts the file.
void startBlock(long recordCount)
Starts a block.
void startColumn(ColumnDescriptor descriptor, long valueCount, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName)
Starts a column inside a block.
void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, Encoding rlEncoding, Encoding dlEncoding, Encoding valuesEncoding)
Deprecated.
void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, Statistics statistics, long rowCount, Encoding rlEncoding, Encoding dlEncoding, Encoding valuesEncoding)
Writes a single page.
void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, Statistics statistics, Encoding rlEncoding, Encoding dlEncoding, Encoding valuesEncoding)
Deprecated. This method does not support writing column indexes; use writeDataPage(int, int, BytesInput, Statistics, long, Encoding, Encoding, Encoding) instead.
void writeDataPageV2(int rowCount, int nullCount, int valueCount, org.apache.parquet.bytes.BytesInput repetitionLevels, org.apache.parquet.bytes.BytesInput definitionLevels, Encoding dataEncoding, org.apache.parquet.bytes.BytesInput compressedData, int uncompressedDataSize, Statistics<?> statistics)
Writes a single v2 data page.
void writeDictionaryPage(DictionaryPage dictionaryPage)
Writes a dictionary page.
void writeDictionaryPage(DictionaryPage dictionaryPage, BlockCipher.Encryptor headerBlockEncryptor, byte[] AAD)
static void writeMergedMetadataFile(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.conf.Configuration conf)
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path outputPath, List<Footer> footers)
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path outputPath, List<Footer> footers, ParquetOutputFormat.JobSummaryLevel level)
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
-
-
-
Field Detail
-
PARQUET_METADATA_FILE
public static final String PARQUET_METADATA_FILE
- See Also:
- Constant Field Values
-
MAGIC_STR
public static final String MAGIC_STR
- See Also:
- Constant Field Values
-
MAGIC
public static final byte[] MAGIC
-
EF_MAGIC_STR
public static final String EF_MAGIC_STR
- See Also:
- Constant Field Values
-
EFMAGIC
public static final byte[] EFMAGIC
-
PARQUET_COMMON_METADATA_FILE
public static final String PARQUET_COMMON_METADATA_FILE
- See Also:
- Constant Field Values
-
CURRENT_VERSION
public static final int CURRENT_VERSION
- See Also:
- Constant Field Values
-
out
protected final PositionOutputStream out
-
-
Constructor Detail
-
ParquetFileWriter
@Deprecated public ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, MessageType schema, org.apache.hadoop.fs.Path file) throws IOException
Deprecated. Will be removed in 2.0.0.
- Parameters:
configuration - Hadoop configuration
schema - the schema of the data
file - the file to write to
- Throws:
IOException - if the file can not be created
-
ParquetFileWriter
@Deprecated public ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, MessageType schema, org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode) throws IOException
Deprecated. Will be removed in 2.0.0.
- Parameters:
configuration - Hadoop configuration
schema - the schema of the data
file - the file to write to
mode - file creation mode
- Throws:
IOException - if the file can not be created
-
ParquetFileWriter
@Deprecated public ParquetFileWriter(org.apache.hadoop.conf.Configuration configuration, MessageType schema, org.apache.hadoop.fs.Path file, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize) throws IOException
Deprecated. Will be removed in 2.0.0.
- Parameters:
configuration - Hadoop configuration
schema - the schema of the data
file - the file to write to
mode - file creation mode
rowGroupSize - the row group size
maxPaddingSize - the maximum padding
- Throws:
IOException - if the file can not be created
-
ParquetFileWriter
@Deprecated public ParquetFileWriter(OutputFile file, MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize) throws IOException
Deprecated. Will be removed in 2.0.0.
- Parameters:
file - OutputFile to create or overwrite
schema - the schema of the data
mode - file creation mode
rowGroupSize - the row group size
maxPaddingSize - the maximum padding
- Throws:
IOException - if the file can not be created
-
ParquetFileWriter
public ParquetFileWriter(OutputFile file, MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize, int columnIndexTruncateLength, int statisticsTruncateLength, boolean pageWriteChecksumEnabled) throws IOException
- Parameters:
file - OutputFile to create or overwrite
schema - the schema of the data
mode - file creation mode
rowGroupSize - the row group size
maxPaddingSize - the maximum padding
columnIndexTruncateLength - the length to which min/max values in column indexes are truncated
statisticsTruncateLength - the length to which min/max values in row-group statistics are truncated
pageWriteChecksumEnabled - whether to write out page-level checksums
- Throws:
IOException - if the file can not be created
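As a sketch, an OutputFile for this constructor can be obtained by wrapping a Hadoop Path with HadoopOutputFile (in org.apache.parquet.hadoop.util); the path and size arguments below are illustrative assumptions, not defaults taken from this class.

```java
// Hedged construction sketch; `schema` is assumed to be built elsewhere.
Configuration conf = new Configuration();
OutputFile out = HadoopOutputFile.fromPath(
    new org.apache.hadoop.fs.Path("/tmp/example.parquet"), conf);
ParquetFileWriter writer = new ParquetFileWriter(
    out, schema, ParquetFileWriter.Mode.OVERWRITE,
    128 * 1024 * 1024,  // rowGroupSize (example value)
    8 * 1024 * 1024,    // maxPaddingSize (example value)
    64, 64,             // column index / statistics truncate lengths
    true);              // pageWriteChecksumEnabled
```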
-
ParquetFileWriter
public ParquetFileWriter(OutputFile file, MessageType schema, ParquetFileWriter.Mode mode, long rowGroupSize, int maxPaddingSize, int columnIndexTruncateLength, int statisticsTruncateLength, boolean pageWriteChecksumEnabled, FileEncryptionProperties encryptionProperties) throws IOException
- Throws:
IOException
-
-
Method Detail
-
start
public void start() throws IOException
Starts the file.
- Throws:
IOException - if there is an error while writing
-
startBlock
public void startBlock(long recordCount) throws IOException
Starts a block.
- Parameters:
recordCount - the record count in this block
- Throws:
IOException - if there is an error while writing
-
startColumn
public void startColumn(ColumnDescriptor descriptor, long valueCount, org.apache.parquet.hadoop.metadata.CompressionCodecName compressionCodecName) throws IOException
Starts a column inside a block.
- Parameters:
descriptor - the column descriptor
valueCount - the value count in this column
compressionCodecName - a compression codec name
- Throws:
IOException - if there is an error while writing
-
writeDictionaryPage
public void writeDictionaryPage(DictionaryPage dictionaryPage) throws IOException
Writes a dictionary page.
- Parameters:
dictionaryPage - the dictionary page
- Throws:
IOException - if there is an error while writing
-
writeDictionaryPage
public void writeDictionaryPage(DictionaryPage dictionaryPage, BlockCipher.Encryptor headerBlockEncryptor, byte[] AAD) throws IOException
- Throws:
IOException
-
writeDataPage
@Deprecated public void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, Encoding rlEncoding, Encoding dlEncoding, Encoding valuesEncoding) throws IOException
Deprecated. Writes a single page.
- Parameters:
valueCount - count of values
uncompressedPageSize - the size of the data once uncompressed
bytes - the compressed data for the page, without the header
rlEncoding - encoding of the repetition level
dlEncoding - encoding of the definition level
valuesEncoding - encoding of values
- Throws:
IOException - if there is an error while writing
-
writeDataPage
@Deprecated public void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, Statistics statistics, Encoding rlEncoding, Encoding dlEncoding, Encoding valuesEncoding) throws IOException
Deprecated. This method does not support writing column indexes; use writeDataPage(int, int, BytesInput, Statistics, long, Encoding, Encoding, Encoding) instead.
Writes a single page.
- Parameters:
valueCount - count of values
uncompressedPageSize - the size of the data once uncompressed
bytes - the compressed data for the page, without the header
statistics - statistics for the page
rlEncoding - encoding of the repetition level
dlEncoding - encoding of the definition level
valuesEncoding - encoding of values
- Throws:
IOException - if there is an error while writing
-
writeDataPage
public void writeDataPage(int valueCount, int uncompressedPageSize, org.apache.parquet.bytes.BytesInput bytes, Statistics statistics, long rowCount, Encoding rlEncoding, Encoding dlEncoding, Encoding valuesEncoding) throws IOException
Writes a single page.
- Parameters:
valueCount - count of values
uncompressedPageSize - the size of the data once uncompressed
bytes - the compressed data for the page, without the header
statistics - the statistics of the page
rowCount - the number of rows in the page
rlEncoding - encoding of the repetition level
dlEncoding - encoding of the definition level
valuesEncoding - encoding of values
- Throws:
IOException - if any I/O error occurs during writing the file
-
writeDataPageV2
public void writeDataPageV2(int rowCount, int nullCount, int valueCount, org.apache.parquet.bytes.BytesInput repetitionLevels, org.apache.parquet.bytes.BytesInput definitionLevels, Encoding dataEncoding, org.apache.parquet.bytes.BytesInput compressedData, int uncompressedDataSize, Statistics<?> statistics) throws IOException
Writes a single v2 data page.
- Parameters:
rowCount - count of rows
nullCount - count of nulls
valueCount - count of values
repetitionLevels - repetition level bytes
definitionLevels - definition level bytes
dataEncoding - encoding for data
compressedData - compressed data bytes
uncompressedDataSize - the size of uncompressed data
statistics - the statistics of the page
- Throws:
IOException - if any I/O error occurs during writing the file
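Note that in the v2 page format the repetition and definition level bytes are stored uncompressed; only the values section is compressed. A call therefore groups its arguments roughly as follows (all variables are assumed to be prepared by the caller; this is a sketch, not a complete example):

```java
// Hedged sketch of the argument grouping for a v2 data page.
writer.writeDataPageV2(
    rowCount, nullCount, valueCount,
    repetitionLevels, definitionLevels,   // BytesInput, stored uncompressed
    dataEncoding,
    compressedData, uncompressedDataSize, // values section only
    statistics);
```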
-
endColumn
public void endColumn() throws IOException
Ends a column (once all repetition levels, definition levels and data have been written).
- Throws:
IOException - if there is an error while writing
-
endBlock
public void endBlock() throws IOException
Ends a block once all column chunks have been written.
- Throws:
IOException - if there is an error while writing
-
appendFile
@Deprecated public void appendFile(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.fs.Path file) throws IOException
Deprecated. Will be removed in 2.0.0; use appendFile(InputFile) instead.
- Parameters:
conf - a configuration
file - a file path whose contents will be appended to this file
- Throws:
IOException - if there is an error while reading or writing
-
appendFile
public void appendFile(InputFile file) throws IOException
- Throws:
IOException
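Migrating from the deprecated Configuration/Path overload is a one-line change in most callers; HadoopInputFile (in org.apache.parquet.hadoop.util) is one way to obtain an InputFile, shown here as an assumption:

```java
// Before (deprecated): writer.appendFile(conf, sourcePath);
// After, wrapping the Hadoop Path as an InputFile:
writer.appendFile(HadoopInputFile.fromPath(sourcePath, conf));
```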
-
appendRowGroups
@Deprecated public void appendRowGroups(org.apache.hadoop.fs.FSDataInputStream file, List<BlockMetaData> rowGroups, boolean dropColumns) throws IOException
Deprecated. Will be removed in 2.0.0; use appendRowGroups(SeekableInputStream, List, boolean) instead.
- Parameters:
file - a file stream to read from
rowGroups - row groups to copy
dropColumns - whether to drop columns from the source file that are not in this file's schema
- Throws:
IOException - if there is an error while reading or writing
-
appendRowGroups
public void appendRowGroups(SeekableInputStream file, List<BlockMetaData> rowGroups, boolean dropColumns) throws IOException
- Throws:
IOException
-
appendRowGroup
@Deprecated public void appendRowGroup(org.apache.hadoop.fs.FSDataInputStream from, BlockMetaData rowGroup, boolean dropColumns) throws IOException
Deprecated. Will be removed in 2.0.0; use appendRowGroup(SeekableInputStream, BlockMetaData, boolean) instead.
- Parameters:
from - a file stream to read from
rowGroup - row group to copy
dropColumns - whether to drop columns from the source file that are not in this file's schema
- Throws:
IOException - if there is an error while reading or writing
-
appendRowGroup
public void appendRowGroup(SeekableInputStream from, BlockMetaData rowGroup, boolean dropColumns) throws IOException
- Throws:
IOException
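A merge-style copy over all row groups of a source file might look like the following sketch; it assumes the source shares this writer's schema and that ParquetFileReader and InputFile are available as elsewhere in parquet-hadoop:

```java
// Hedged sketch: copy every row group of `inputFile` into this writer.
try (ParquetFileReader reader = ParquetFileReader.open(inputFile);
     SeekableInputStream from = inputFile.newStream()) {
  for (BlockMetaData block : reader.getFooter().getBlocks()) {
    writer.appendRowGroup(from, block, false); // keep all columns
  }
}
```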
-
appendColumnChunk
public void appendColumnChunk(ColumnDescriptor descriptor, SeekableInputStream from, ColumnChunkMetaData chunk, BloomFilter bloomFilter, ColumnIndex columnIndex, OffsetIndex offsetIndex) throws IOException
- Parameters:
descriptor - the descriptor for the target column
from - a file stream to read from
chunk - the column chunk to be copied
bloomFilter - the bloom filter for this chunk
columnIndex - the column index for this chunk
offsetIndex - the offset index for this chunk
- Throws:
IOException
-
end
public void end(Map<String,String> extraMetaData) throws IOException
Ends a file once all blocks have been written; closes the file.
- Parameters:
extraMetaData - the extra metadata to write in the footer
- Throws:
IOException - if there is an error while writing
-
getFooter
public ParquetMetadata getFooter()
-
mergeMetadataFiles
@Deprecated public static ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.conf.Configuration conf) throws IOException
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
Given a list of metadata files, merges them into a single ParquetMetadata. Requires that the schemas be compatible and the extraMetadata be exactly equal.
- Parameters:
files - a list of files to merge metadata from
conf - a configuration
- Returns:
- merged parquet metadata for the files
- Throws:
IOException - if there is an error while writing
-
mergeMetadataFiles
@Deprecated public static ParquetMetadata mergeMetadataFiles(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.conf.Configuration conf, KeyValueMetadataMergeStrategy keyValueMetadataMergeStrategy) throws IOException
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
Given a list of metadata files, merges them into a single ParquetMetadata. Requires that the schemas be compatible and the extraMetadata be exactly equal.
- Parameters:
files - a list of files to merge metadata from
conf - a configuration
keyValueMetadataMergeStrategy - strategy to merge values for the same key, if there are multiple
- Returns:
- merged parquet metadata for the files
- Throws:
IOException - if there is an error while writing
-
writeMergedMetadataFile
@Deprecated public static void writeMergedMetadataFile(List<org.apache.hadoop.fs.Path> files, org.apache.hadoop.fs.Path outputPath, org.apache.hadoop.conf.Configuration conf) throws IOException
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
Given a list of metadata files, merges them into a single metadata file. Requires that the schemas be compatible and the extraMetaData be exactly equal. This is useful when merging two directories of parquet files into a single directory, as long as both directories were written with compatible schemas and equal extraMetaData.
- Parameters:
files - a list of files to merge metadata from
outputPath - path to write merged metadata to
conf - a configuration
- Throws:
IOException - if there is an error while reading or writing
-
writeMetadataFile
@Deprecated public static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path outputPath, List<Footer> footers) throws IOException
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
Writes a _metadata and a _common_metadata file.
- Parameters:
configuration - the configuration to use to get the FileSystem
outputPath - the directory to write the _metadata file to
footers - the list of footers to merge
- Throws:
IOException - if there is an error while writing
-
writeMetadataFile
@Deprecated public static void writeMetadataFile(org.apache.hadoop.conf.Configuration configuration, org.apache.hadoop.fs.Path outputPath, List<Footer> footers, ParquetOutputFormat.JobSummaryLevel level) throws IOException
Deprecated. Metadata files are not recommended and will be removed in 2.0.0.
Writes a _common_metadata file, and optionally a _metadata file, depending on the ParquetOutputFormat.JobSummaryLevel provided.
- Parameters:
configuration - the configuration to use to get the FileSystem
outputPath - the directory to write the _metadata file to
footers - the list of footers to merge
level - level of summary to write
- Throws:
IOException - if there is an error while writing
-
getPos
public long getPos() throws IOException
- Returns:
- the current position in the underlying file
- Throws:
IOException
- if there is an error while getting the current stream's position
-
getNextRowGroupSize
public long getNextRowGroupSize() throws IOException
- Throws:
IOException
-
-