An output committer for writing Parquet files. Instead of writing to the _temporary folder
as parquet.hadoop.ParquetOutputCommitter does, this output committer writes data directly to the
destination folder. This can be useful for data stored in S3, where directory operations are
relatively expensive.
To enable this output committer, users may set the "spark.sql.parquet.output.committer.class"
property via Hadoop org.apache.hadoop.conf.Configuration. Note that this property overrides
"spark.sql.sources.outputCommitterClass".
*NOTE*
NEVER use DirectParquetOutputCommitter when appending data, because there is currently
no safe way to undo a failed append job (that's why both abortTask() and abortJob() are
left empty).
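The configuration step described above might look roughly like the following. This is a minimal sketch, not a definitive setup: the fully qualified class name assigned to `committerClass` is an assumption, since the package that contains DirectParquetOutputCommitter depends on the Spark version in use, and `enableDirectCommitter` is a hypothetical helper name introduced only for illustration.

```scala
import org.apache.hadoop.conf.Configuration

// Assumption: the package of DirectParquetOutputCommitter varies between Spark
// versions; check the fully qualified name against the release you are running.
val committerClass = "org.apache.spark.sql.parquet.DirectParquetOutputCommitter"

// Hypothetical helper: sets the Parquet-specific committer property, which
// overrides "spark.sql.sources.outputCommitterClass" when both are present.
def enableDirectCommitter(conf: Configuration): Configuration = {
  conf.set("spark.sql.parquet.output.committer.class", committerClass)
  conf
}

// Apply the setting to a fresh Hadoop configuration.
val hadoopConf = enableDirectCommitter(new Configuration())
```

In a running Spark application the same property would typically be set on the Hadoop configuration exposed by the SparkContext (`sc.hadoopConfiguration`). In line with the note above, this setting should only be used for jobs that do not append to existing data.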
Linear Supertypes
ParquetOutputCommitter, FileOutputCommitter, OutputCommitter, AnyRef, Any