public class AvroParquetInputFormat<T>
extends ParquetInputFormat<T>

Type Parameters:
    T - the Java type of objects produced by this InputFormat

An InputFormat for Parquet files.

Fields inherited from class org.apache.parquet.hadoop.ParquetInputFormat:
    BLOOM_FILTERING_ENABLED, COLUMN_INDEX_FILTERING_ENABLED, DICTIONARY_FILTERING_ENABLED, FILTER_PREDICATE, HADOOP_VECTORED_IO_DEFAULT, HADOOP_VECTORED_IO_ENABLED, OFF_HEAP_DECRYPT_BUFFER_ENABLED, PAGE_VERIFY_CHECKSUM_ENABLED, READ_SUPPORT_CLASS, RECORD_FILTERING_ENABLED, SPLIT_FILES, STATS_FILTERING_ENABLED, STRICT_TYPE_CHECKING, TASK_SIDE_METADATA, UNBOUND_RECORD_FILTER
Constructors:
    AvroParquetInputFormat()
Methods:
    static void setAvroDataSupplier(org.apache.hadoop.mapreduce.Job job, Class<? extends AvroDataSupplier> supplierClass)
        Uses an instance of the specified AvroDataSupplier class to control how the SpecificData instance used to find Avro specific records is created.
    static void setAvroReadSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema avroReadSchema)
        Override the Avro schema to use for reading.
    static void setRequestedProjection(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema requestedProjection)
        Set the subset of columns to read (projection pushdown).
Methods inherited from class org.apache.parquet.hadoop.ParquetInputFormat:
    createRecordReader, getFilter, getFilter, getFooters, getFooters, getFooters, getGlobalMetaData, getReadSupportClass, getReadSupportInstance, getSplits, getSplits, getUnboundRecordFilter, isSplitable, isTaskSideMetaData, listStatus, setFilterPredicate, setReadSupportClass, setReadSupportClass, setTaskSideMetaData, setUnboundRecordFilter
Methods inherited from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat:
    addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getFormatMinSplitSize, getInputDirRecursive, getInputPathFilter, getInputPaths, getMaxSplitSize, getMinSplitSize, makeSplit, makeSplit, setInputDirRecursive, setInputPathFilter, setInputPaths, setInputPaths, setMaxInputSplitSize, setMinInputSplitSize
setRequestedProjection

public static void setRequestedProjection(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema requestedProjection)

Set the subset of columns to read (projection pushdown). This is useful if the full schema is large and you only want to read a few columns, since it saves time by not reading unused columns. If a requested projection is set, the Avro schema used for reading must be compatible with the projection: a column that is not included in the projection must either be absent from the read schema or be optional in it. Use setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema) to set a read schema, if needed.

Parameters:
    job - a job
    requestedProjection - the requested projection schema
See Also:
    setAvroReadSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema), AvroParquetOutputFormat.setSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)
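As an illustrative sketch, a job driver might request a projection like the one below. The "User" record and its "id" and "name" fields are hypothetical examples, not part of this API; only the columns named in the projection are read from disk.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ProjectionExample {

    // Hypothetical projection schema: only the "id" and "name" columns
    // of a wider "User" record stored in the Parquet files.
    static Schema buildProjection() {
        return SchemaBuilder.record("User").fields()
                .requiredLong("id")
                .requiredString("name")
                .endRecord();
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setInputFormatClass(AvroParquetInputFormat.class);
        // Columns not named in the projection are skipped entirely
        // when the input splits are read.
        AvroParquetInputFormat.setRequestedProjection(job, buildProjection());
    }
}
```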
setAvroReadSchema

public static void setAvroReadSchema(org.apache.hadoop.mapreduce.Job job, org.apache.avro.Schema avroReadSchema)

Override the Avro schema to use for reading. Differences between the read and write schemas are resolved using Avro's schema resolution rules.

Parameters:
    job - a job
    avroReadSchema - the requested schema
See Also:
    setRequestedProjection(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema), AvroParquetOutputFormat.setSchema(org.apache.hadoop.mapreduce.Job, org.apache.avro.Schema)
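A minimal sketch of overriding the read schema follows; the "User" record and its fields are hypothetical. Here the read schema adds an optional "email" field that older files may lack, and Avro's schema resolution supplies the null default when the field is missing on disk.

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaBuilder;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class ReadSchemaExample {

    // Hypothetical read schema: "email" is an optional (nullable) field
    // with a null default, so files written without it still resolve.
    static Schema buildReadSchema() {
        return SchemaBuilder.record("User").fields()
                .requiredLong("id")
                .requiredString("name")
                .optionalString("email")
                .endRecord();
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setInputFormatClass(AvroParquetInputFormat.class);
        AvroParquetInputFormat.setAvroReadSchema(job, buildReadSchema());
    }
}
```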
setAvroDataSupplier

public static void setAvroDataSupplier(org.apache.hadoop.mapreduce.Job job, Class<? extends AvroDataSupplier> supplierClass)

Uses an instance of the specified AvroDataSupplier class to control how the SpecificData instance used to find Avro specific records is created.

Parameters:
    job - a job
    supplierClass - an avro data supplier class
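A sketch of a custom supplier follows; the supplier class name and the choice of class loader are hypothetical, meant only to show the shape of an AvroDataSupplier implementation.

```java
import org.apache.avro.specific.SpecificData;
import org.apache.hadoop.mapreduce.Job;
import org.apache.parquet.avro.AvroDataSupplier;
import org.apache.parquet.avro.AvroParquetInputFormat;

public class SupplierExample {

    // Hypothetical supplier: returns a SpecificData bound to this class's
    // ClassLoader so generated specific record classes can be located.
    public static class MySpecificDataSupplier implements AvroDataSupplier {
        @Override
        public SpecificData get() {
            return new SpecificData(SupplierExample.class.getClassLoader());
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        // The class (not an instance) is registered; the framework
        // instantiates it when building the read support.
        AvroParquetInputFormat.setAvroDataSupplier(job, MySpecificDataSupplier.class);
    }
}
```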
Copyright © 2023 The Apache Software Foundation. All rights reserved.