public class AvroParquetInputFormat<T>
extends ParquetInputFormat<T>

Type Parameters:
    T - the Java type of objects produced by this InputFormat

An InputFormat for Parquet files.

Fields inherited from class org.apache.parquet.hadoop.ParquetInputFormat:
    BLOOM_FILTERING_ENABLED, COLUMN_INDEX_FILTERING_ENABLED, DICTIONARY_FILTERING_ENABLED, FILTER_PREDICATE, HADOOP_VECTORED_IO_DEFAULT, HADOOP_VECTORED_IO_ENABLED, OFF_HEAP_DECRYPT_BUFFER_ENABLED, PAGE_VERIFY_CHECKSUM_ENABLED, READ_SUPPORT_CLASS, RECORD_FILTERING_ENABLED, SPLIT_FILES, STATS_FILTERING_ENABLED, STRICT_TYPE_CHECKING, TASK_SIDE_METADATA, UNBOUND_RECORD_FILTER
Constructors:
    AvroParquetInputFormat()
Methods:
    static void setAvroDataSupplier(org.apache.hadoop.mapreduce.Job job, Class<? extends AvroDataSupplier> supplierClass)
        Uses an instance of the specified AvroDataSupplier class to control how the SpecificData instance used to find Avro specific records is created.
    static void setAvroReadSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema avroReadSchema)
        Override the Avro schema to use for reading.
    static void setRequestedProjection(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema requestedProjection)
        Set the subset of columns to read (projection pushdown).
Methods inherited from class org.apache.parquet.hadoop.ParquetInputFormat:
    createRecordReader, getFilter, getFilter, getFooters, getFooters, getFooters, getGlobalMetaData, getReadSupportClass, getReadSupportInstance, getSplits, getSplits, getUnboundRecordFilter, isSplitable, isTaskSideMetaData, listStatus, setFilterPredicate, setReadSupportClass, setReadSupportClass, setTaskSideMetaData, setUnboundRecordFilter
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
    addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
setRequestedProjection

public static void setRequestedProjection(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema requestedProjection)

Set the subset of columns to read (projection pushdown). This is useful if the full schema is large and you only want to read a few columns, since it saves time by not reading unused columns. If a requested projection is set, the Avro schema used for reading must be compatible with the projection: a column that is not included in the projection must either be absent from the read schema or be optional in it. Use setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema) to set a read schema, if needed.

Parameters:
    job - a job
    requestedProjection - the requested projection schema
See Also:
    setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema), AvroParquetOutputFormat.setSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)
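As an illustrative sketch, a job driver might request a projection like the one below. The "User" record and its "id" and "name" fields are hypothetical examples, not part of this API; only the columns named in the projection are read from disk.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ProjectionExample {

    // Hypothetical projection schema: only the "id" and "name" columns
    // of a wider "User" record stored in the Parquet files.
    static Schema buildProjection() {
        return SchemaBuilder.record("User").fields()
                .requiredLong("id")
                .requiredString("name")
                .endRecord();
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setInputFormatClass(AvroParquetInputFormat.class);
        // Columns not named in the projection are skipped entirely
        // when the input splits are read.
        AvroParquetInputFormat.setRequestedProjection(job, buildProjection());
    }
}
```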
setAvroReadSchema

public static void setAvroReadSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema avroReadSchema)

Override the Avro schema to use for reading. Differences between the read and write schemas are resolved using Avro's schema resolution rules.

Parameters:
    job - a job
    avroReadSchema - the requested schema
See Also:
    setRequestedProjection(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema), AvroParquetOutputFormat.setSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)
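A minimal sketch of overriding the read schema follows; the "User" record and its fields are hypothetical. Here the read schema adds an optional "email" field that older files may lack, and Avro's schema resolution supplies the null default when the field is missing on disk.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ReadSchemaExample {

    // Hypothetical read schema: "email" is an optional (nullable) field
    // with a null default, so files written without it still resolve.
    static Schema buildReadSchema() {
        return SchemaBuilder.record("User").fields()
                .requiredLong("id")
                .requiredString("name")
                .optionalString("email")
                .endRecord();
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setInputFormatClass(AvroParquetInputFormat.class);
        AvroParquetInputFormat.setAvroReadSchema(job, buildReadSchema());
    }
}
```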
setAvroDataSupplier

public static void setAvroDataSupplier(org.apache.hadoop.mapreduce.Job job, Class<? extends AvroDataSupplier> supplierClass)

Uses an instance of the specified AvroDataSupplier class to control how the SpecificData instance used to find Avro specific records is created.

Parameters:
    job - a job
    supplierClass - an avro data supplier class
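A sketch of a custom supplier follows; the supplier class name and the choice of class loader are hypothetical, meant only to show the shape of an AvroDataSupplier implementation.

```java
import org.apache.avro.specific.SpecificData;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroDataSupplier;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class SupplierExample {

    // Hypothetical supplier: returns a SpecificData bound to this class's
    // ClassLoader so generated specific record classes can be located.
    public static class MySpecificDataSupplier implements AvroDataSupplier {
        @Override
        public SpecificData get() {
            return new SpecificData(SupplierExample.class.getClassLoader());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        // The class (not an instance) is registered; the framework
        // instantiates it when building the read support.
        AvroParquetInputFormat.setAvroDataSupplier(job, MySpecificDataSupplier.class);
    }
}
```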
Copyright © 2023 The Apache Software Foundation. All rights reserved.