Package org.apache.parquet.avro
Provides classes to store Avro data in Parquet files. Avro schemas are converted to Parquet schemas as follows. Only record schemas can be converted at the top level; attempting to convert any other top-level schema type results in an error. Avro types are converted to Parquet types using the following mapping:
Avro type | Parquet type |
---|---|
null | no type (the field is not encoded in Parquet), unless it is part of a null union |
boolean | boolean |
int | int32 |
long | int64 |
float | float |
double | double |
bytes | binary |
string | binary (with original type UTF8) |
record | group containing nested fields |
enum | binary (with original type ENUM) |
array | group (with original type LIST) containing one repeated group field |
map | group (with original type MAP) containing one repeated group field (with original type MAP_KEY_VALUE) of (key, value) |
fixed | fixed_len_byte_array |
union | an optional type if it is a null union; otherwise not supported |
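
The non-nested rows of the table above can be sketched as a plain lookup. This is a standalone illustration using only the JDK, not the conversion code in this package: the class name `AvroToParquetSketch`, its method names, and the string labels are hypothetical, and it assumes that a "null union" means a two-branch union of null and one other type.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative lookup only; hypothetical class, not the parquet-avro implementation. */
final class AvroToParquetSketch {
    // Non-nested rows of the mapping table, keyed by Avro type name.
    private static final Map<String, String> TYPES = new LinkedHashMap<>();
    static {
        TYPES.put("boolean", "boolean");
        TYPES.put("int", "int32");
        TYPES.put("long", "int64");
        TYPES.put("float", "float");
        TYPES.put("double", "double");
        TYPES.put("bytes", "binary");
        TYPES.put("string", "binary (UTF8)");
        TYPES.put("enum", "binary (ENUM)");
        TYPES.put("fixed", "fixed_len_byte_array");
    }

    /** Returns the Parquet type label for a non-nested Avro type name. */
    static String convert(String avroType) {
        String parquetType = TYPES.get(avroType);
        if (parquetType == null) {
            throw new IllegalArgumentException("No simple mapping for Avro type: " + avroType);
        }
        return parquetType;
    }

    /** Converts a union, given as its branch type names, following the table's rule. */
    static String convertUnion(List<String> branches) {
        // Assumption: a null union has exactly two branches, one of which is null.
        if (branches.size() == 2 && branches.contains("null")) {
            String other = "null".equals(branches.get(0)) ? branches.get(1) : branches.get(0);
            if (!"null".equals(other)) {
                return "optional " + convert(other);
            }
        }
        throw new UnsupportedOperationException("Only null unions are supported: " + branches);
    }

    public static void main(String[] args) {
        System.out.println(convert("string"));
        System.out.println(convertUnion(Arrays.asList("null", "long")));
    }
}
```

Nested types (record, array, map) are left out because they map to Parquet groups rather than to a single type label.
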
Parquet files that were not written with classes from this package have no Avro write schema stored in their file metadata. To read such files using classes from this package, either provide an Avro read schema or let a default Avro schema be derived using the following mapping:
Parquet type | Avro type |
---|---|
boolean | boolean |
int32 | int |
int64 | long |
int96 | not supported |
float | float |
double | double |
fixed_len_byte_array | fixed |
binary (with no original type) | bytes |
binary (with original type UTF8) | string |
binary (with original type ENUM) | string |
group (with original type LIST) containing one repeated group field | array |
group (with original type MAP) containing one repeated group field (with original type MAP_KEY_VALUE) of (key, value) | map |
Parquet fields that are optional are mapped to an Avro null union.
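
As a rough illustration of the reverse direction, the sketch below applies the non-group rows of the table and then wraps the result in a null union for optional fields. It is a JDK-only toy, not the schema derivation performed by this package: `ParquetToAvroSketch`, its method names, and the use of a `null` argument to mean "no original type" are all assumptions made for the example.

```java
/** Illustrative lookup only; hypothetical class, not the parquet-avro implementation. */
final class ParquetToAvroSketch {
    /**
     * Returns the Avro type name for a non-group Parquet type.
     * originalType is null when the column carries no original type.
     */
    static String baseType(String parquetType, String originalType) {
        switch (parquetType) {
            case "boolean": return "boolean";
            case "int32": return "int";
            case "int64": return "long";
            case "float": return "float";
            case "double": return "double";
            case "fixed_len_byte_array": return "fixed";
            case "binary":
                if (originalType == null) return "bytes";
                // UTF8 and ENUM both end up as an Avro string.
                if (originalType.equals("UTF8") || originalType.equals("ENUM")) return "string";
                throw new IllegalArgumentException("Not in the table: binary (" + originalType + ")");
            case "int96":
                throw new UnsupportedOperationException("int96 is not supported");
            default:
                throw new IllegalArgumentException("Not in the table: " + parquetType);
        }
    }

    /** Wraps the mapped type in a null union when the Parquet field is optional. */
    static String fieldType(String parquetType, String originalType, boolean optional) {
        String avroType = "\"" + baseType(parquetType, originalType) + "\"";
        return optional ? "[\"null\", " + avroType + "]" : avroType;
    }

    public static void main(String[] args) {
        System.out.println(fieldType("binary", "UTF8", false));
        System.out.println(fieldType("int64", null, true));
    }
}
```

The return values are written as Avro JSON schema fragments, so an optional `int64` column comes back as a two-branch union with null listed first.
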
Some conversions are lossy. Avro nulls are not represented in Parquet, so they are lost when converted back to Avro. Similarly, a Parquet enum does not store its set of allowed values, so it cannot be converted back to an Avro enum and is mapped to an Avro string instead. Type names for nested records, enums, and fixed types are lost in the conversion to Parquet, as are Avro aliases, default values, field ordering, and documentation strings. Parquet maps can have keys of any type, whereas Avro map keys are always strings.
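
The enum case can be made concrete by composing the two mapping tables for a handful of types and checking which ones survive a round trip. The sketch below is a JDK-only toy with hypothetical names (`LossyRoundTripSketch`, `roundTrip`, `isLossy`); it only compares type names and does not exercise the classes in this package.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative round trip over type names; hypothetical class, not the library code. */
final class LossyRoundTripSketch {
    private static final Map<String, String> AVRO_TO_PARQUET = new LinkedHashMap<>();
    private static final Map<String, String> PARQUET_TO_AVRO = new LinkedHashMap<>();
    static {
        // A few rows taken from the Avro-to-Parquet table.
        AVRO_TO_PARQUET.put("int", "int32");
        AVRO_TO_PARQUET.put("bytes", "binary");
        AVRO_TO_PARQUET.put("string", "binary (UTF8)");
        AVRO_TO_PARQUET.put("enum", "binary (ENUM)");
        // The matching rows from the Parquet-to-Avro table.
        PARQUET_TO_AVRO.put("int32", "int");
        PARQUET_TO_AVRO.put("binary", "bytes");
        PARQUET_TO_AVRO.put("binary (UTF8)", "string");
        PARQUET_TO_AVRO.put("binary (ENUM)", "string");
    }

    /** Maps an Avro type name to Parquet and back again. */
    static String roundTrip(String avroType) {
        String parquetType = AVRO_TO_PARQUET.get(avroType);
        if (parquetType == null) {
            throw new IllegalArgumentException("Type not covered by this sketch: " + avroType);
        }
        return PARQUET_TO_AVRO.get(parquetType);
    }

    /** True when the type name that comes back differs from the one that went in. */
    static boolean isLossy(String avroType) {
        return !avroType.equals(roundTrip(avroType));
    }

    public static void main(String[] args) {
        for (String avroType : AVRO_TO_PARQUET.keySet()) {
            System.out.println(avroType + " -> " + roundTrip(avroType));
        }
    }
}
```

Of the four types covered here, only enum comes back as a different type name; the other losses described above (type names, aliases, defaults, documentation) concern schema metadata that a name-level comparison like this one cannot show.
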
Interface Summary
Interface | Description |
---|---|
AvroDataSupplier | Allows clients to control how the classes associated with specific Avro records are managed and found, e.g., by creating an instance of GenericData that uses a particular ClassLoader. |
Cars | |
Cars.Callback | |

Class Summary
Class | Description |
---|---|
AvroConverters | |
AvroConverters.AvroGroupConverter | |
AvroParquetInputFormat<T> | A Hadoop InputFormat for Parquet files. |
AvroParquetOutputFormat<T> | A Hadoop OutputFormat for Parquet files. |
AvroParquetReader<T> | Read Avro records from a Parquet file. |
AvroParquetReader.Builder<T> | |
AvroParquetWriter<T> | Write Avro records to a Parquet file. |
AvroParquetWriter.Builder<T> | |
AvroReadSupport<T> | Avro implementation of ReadSupport for avro generic, specific, and reflect models. |
AvroSchemaConverter | Converts an Avro schema into a Parquet schema, or vice versa. |
AvroWriteSupport<T> | Avro implementation of WriteSupport for generic, specific, and reflect models. |
Car | |
Car.Builder | RecordBuilder for Car instances. |
Engine | |
Engine.Builder | RecordBuilder for Engine instances. |
GenericDataSupplier | |
LeatherTrim | |
LeatherTrim.Builder | RecordBuilder for LeatherTrim instances. |
NewCar | |
NewCar.Builder | RecordBuilder for NewCar instances. |
ReflectDataSupplier | |
Service | |
Service.Builder | RecordBuilder for Service instances. |
ShortCar | |
ShortCar.Builder | RecordBuilder for ShortCar instances. |
SpecificDataSupplier | |
Stereo | |
Stereo.Builder | RecordBuilder for Stereo instances. |
StringBehaviorTest | |
StringBehaviorTest.Builder | RecordBuilder for StringBehaviorTest instances. |
Vin | |

Enum Summary
Enum | Description |
---|---|
EngineType | |