Package org.apache.parquet.hadoop.api
Class ReadSupport<T>

java.lang.Object
  org.apache.parquet.hadoop.api.ReadSupport<T>

Type Parameters:
T - the type of the materialized record

Direct Known Subclasses:
DelegatingReadSupport, GroupReadSupport

public abstract class ReadSupport<T> extends Object

Abstraction used by the ParquetInputFormat to materialize records.
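A typical subclass overrides init(InitContext) to decide which schema to read and prepareForRead(...) to supply the RecordMaterializer that builds records. A minimal sketch, mirroring the bundled GroupReadSupport subclass and assuming the parquet-hadoop and parquet-column (example data) artifacts are on the classpath:

```java
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
import org.apache.parquet.hadoop.api.InitContext;
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.io.api.RecordMaterializer;
import org.apache.parquet.schema.MessageType;

// Minimal ReadSupport that materializes each record as an example Group.
public class SimpleGroupReadSupport extends ReadSupport<Group> {

  @Override
  public ReadContext init(InitContext context) {
    // Read the full file schema; a real implementation could return a
    // projection of it instead (see getSchemaForRead).
    return new ReadContext(context.getFileSchema());
  }

  @Override
  public RecordMaterializer<Group> prepareForRead(
      Configuration configuration,
      Map<String, String> keyValueMetaData,
      MessageType fileSchema,
      ReadContext readContext) {
    // GroupRecordConverter assembles Group records for the requested schema.
    return new GroupRecordConverter(readContext.getRequestedSchema());
  }
}
```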
Nested Class Summary

static class  ReadSupport.ReadContext
    information to read the file
Field Summary

static String  PARQUET_READ_SCHEMA
    configuration key for a parquet read projection schema
Constructor Summary

ReadSupport()
Method Summary

static MessageType  getSchemaForRead(MessageType fileMessageType, String partialReadSchemaString)
    attempts to validate and construct a MessageType from a read projection schema

static MessageType  getSchemaForRead(MessageType fileMessageType, MessageType projectedMessageType)

ReadSupport.ReadContext  init(org.apache.hadoop.conf.Configuration configuration, Map<String,String> keyValueMetaData, MessageType fileSchema)
    Deprecated. override init(InitContext) instead

ReadSupport.ReadContext  init(InitContext context)
    called in InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext) in the front end

abstract RecordMaterializer<T>  prepareForRead(org.apache.hadoop.conf.Configuration configuration, Map<String,String> keyValueMetaData, MessageType fileSchema, ReadSupport.ReadContext readContext)
    called in RecordReader.initialize(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext) in the back end; the returned RecordMaterializer will materialize the records and add them to the destination
Field Detail

PARQUET_READ_SCHEMA

public static final String PARQUET_READ_SCHEMA

configuration key for a parquet read projection schema

See Also:
Constant Field Values
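In a job configuration this key carries the projection as a schema string in Parquet's message syntax. A short sketch (the column names are illustrative, not part of this API, and must exist in the file schema being read):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.hadoop.api.ReadSupport;

public class ProjectionConfig {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Ask the reader to materialize only these two columns.
    String projection =
        "message projected {\n"
        + "  required int64 id;\n"
        + "  optional binary name (UTF8);\n"
        + "}";
    conf.set(ReadSupport.PARQUET_READ_SCHEMA, projection);
    System.out.println(conf.get(ReadSupport.PARQUET_READ_SCHEMA));
  }
}
```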
Method Detail

getSchemaForRead

public static MessageType getSchemaForRead(MessageType fileMessageType, String partialReadSchemaString)

attempts to validate and construct a MessageType from a read projection schema

Parameters:
fileMessageType - the typed schema of the source
partialReadSchemaString - the requested projection schema
Returns:
the typed schema that should be used to read
getSchemaForRead

public static MessageType getSchemaForRead(MessageType fileMessageType, MessageType projectedMessageType)
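For example, projecting two columns out of a three-column file schema might look like the following sketch (both schemas are illustrative; MessageTypeParser is assumed available from parquet-column):

```java
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class GetSchemaForReadExample {
  public static void main(String[] args) {
    // Hypothetical schema of the file on disk.
    MessageType fileSchema = MessageTypeParser.parseMessageType(
        "message file {\n"
        + "  required int64 id;\n"
        + "  optional binary name (UTF8);\n"
        + "  optional double score;\n"
        + "}");
    // Validate the projection against the file schema and obtain the
    // schema to use for the read; a projection that is not contained
    // in the file schema is rejected.
    MessageType readSchema = ReadSupport.getSchemaForRead(
        fileSchema,
        "message projected { required int64 id; optional binary name (UTF8); }");
    System.out.println(readSchema);
  }
}
```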
init

@Deprecated
public ReadSupport.ReadContext init(org.apache.hadoop.conf.Configuration configuration, Map<String,String> keyValueMetaData, MessageType fileSchema)

Deprecated. override init(InitContext) instead

called in InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext) in the front end

Parameters:
configuration - the job configuration
keyValueMetaData - the app specific metadata from the file
fileSchema - the schema of the file
Returns:
the readContext that defines how to read the file
init

public ReadSupport.ReadContext init(InitContext context)

called in InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext) in the front end

Parameters:
context - the initialisation context
Returns:
the readContext that defines how to read the file
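An init override commonly pulls the projection from the configuration under PARQUET_READ_SCHEMA, the pattern the GroupReadSupport subclass follows. A sketch, assuming getSchemaForRead falls back to the full file schema when the key is unset:

```java
import org.apache.parquet.hadoop.api.InitContext;
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.schema.MessageType;

// Honor a projection configured under PARQUET_READ_SCHEMA.
// Subclasses still implement prepareForRead.
public abstract class ProjectingReadSupport<T> extends ReadSupport<T> {

  @Override
  public ReadContext init(InitContext context) {
    // null when no projection was configured; getSchemaForRead is
    // assumed to return the full file schema in that case.
    String partialSchema =
        context.getConfiguration().get(ReadSupport.PARQUET_READ_SCHEMA);
    MessageType requested =
        getSchemaForRead(context.getFileSchema(), partialSchema);
    return new ReadContext(requested);
  }
}
```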
prepareForRead

public abstract RecordMaterializer<T> prepareForRead(org.apache.hadoop.conf.Configuration configuration, Map<String,String> keyValueMetaData, MessageType fileSchema, ReadSupport.ReadContext readContext)

called in RecordReader.initialize(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext) in the back end; the returned RecordMaterializer will materialize the records and add them to the destination

Parameters:
configuration - the job configuration
keyValueMetaData - the app specific metadata from the file
fileSchema - the schema of the file
readContext - returned by the init method
Returns:
the recordMaterializer that will materialize the records