Class ReadSupport<T>

    • Field Detail

      • PARQUET_READ_SCHEMA

        public static final String PARQUET_READ_SCHEMA
        Configuration key for a Parquet read projection schema.
        See Also:
        Constant Field Values
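A minimal sketch of supplying a projection through this key. It assumes parquet-hadoop and hadoop-common are on the classpath; the class name ReadSchemaConfigExample and the schema string are hypothetical. Whether the key is honored depends on the concrete ReadSupport, which has to read it itself in init (for example by passing it to getSchemaForRead).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.hadoop.api.ReadSupport;

// Hypothetical helper: stores a two-column projection under PARQUET_READ_SCHEMA.
public class ReadSchemaConfigExample {
    // The projection is written in Parquet's textual message-type syntax.
    static final String PROJECTION =
        "message example { required int64 id; optional binary name; }";

    static Configuration withProjection() {
        Configuration conf = new Configuration();
        conf.set(ReadSupport.PARQUET_READ_SCHEMA, PROJECTION);
        return conf;
    }

    public static void main(String[] args) {
        // Prints the projection string stored under the configuration key.
        System.out.println(withProjection().get(ReadSupport.PARQUET_READ_SCHEMA));
    }
}
```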
    • Constructor Detail

      • ReadSupport

        public ReadSupport()
    • Method Detail

      • getSchemaForRead

        public static org.apache.parquet.schema.MessageType getSchemaForRead(org.apache.parquet.schema.MessageType fileMessageType,
                                                                             String partialReadSchemaString)
        Attempts to construct a MessageType from a read projection schema and validate it against the file schema.
        Parameters:
        fileMessageType - the typed schema of the source
        partialReadSchemaString - the requested projection schema
        Returns:
        the typed schema that should be used to read
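A minimal sketch of calling this overload, assuming parquet-hadoop is on the classpath. The class StringProjectionExample and both schemas are hypothetical. The sketch assumes that a projection naming a column the file lacks is rejected with a runtime exception, and that an unparseable string also fails with a runtime exception.

```java
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

// Hypothetical example class; the schemas below are illustrative only.
public class StringProjectionExample {
    static final MessageType FILE_SCHEMA = MessageTypeParser.parseMessageType(
        "message example { required int64 id; optional binary name; optional double score; }");

    // Parses the projection string and checks it against the file schema.
    static MessageType project(String projection) {
        return ReadSupport.getSchemaForRead(FILE_SCHEMA, projection);
    }

    // Returns true if the projection is accepted, false if parsing or validation throws.
    static boolean isValid(String projection) {
        try {
            project(projection);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A subset of the file's columns with matching types and repetition.
        System.out.println(project("message example { required int64 id; optional double score; }"));
        // A column the file does not contain.
        System.out.println(isValid("message example { required int64 missing; }"));
    }
}
```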
      • getSchemaForRead

        public static org.apache.parquet.schema.MessageType getSchemaForRead(org.apache.parquet.schema.MessageType fileMessageType,
                                                                             org.apache.parquet.schema.MessageType projectedMessageType)
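This overload takes an already-built MessageType, so no string parsing is involved. The sketch below builds both schemas with the Types builder and assumes that, as with the string overload, the projection is validated against the file schema, so a column declared with a different physical type than in the file is expected to be rejected with a runtime exception. TypedProjectionExample and the column names are hypothetical.

```java
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName;
import org.apache.parquet.schema.Types;

// Hypothetical example class: builds both schemas programmatically.
public class TypedProjectionExample {
    static final MessageType FILE_SCHEMA = Types.buildMessage()
        .required(PrimitiveTypeName.INT64).named("id")
        .optional(PrimitiveTypeName.BINARY).named("name")
        .named("example");

    // Requests only the "name" column, declared with the given physical type.
    static MessageType projectName(PrimitiveTypeName type) {
        MessageType projection = Types.buildMessage()
            .optional(type).named("name")
            .named("example");
        return ReadSupport.getSchemaForRead(FILE_SCHEMA, projection);
    }

    // Returns true if the typed projection passes validation.
    static boolean accepts(PrimitiveTypeName type) {
        try {
            projectName(type);
            return true;
        } catch (RuntimeException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts(PrimitiveTypeName.BINARY)); // same type as in the file
        System.out.println(accepts(PrimitiveTypeName.INT32));  // type differs from the file
    }
}
```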
      • init

        @Deprecated
        public ReadSupport.ReadContext init(org.apache.hadoop.conf.Configuration configuration,
                                            Map<String,String> keyValueMetaData,
                                            org.apache.parquet.schema.MessageType fileSchema)
        Deprecated.
        Override init(InitContext) instead.
        Called in InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext) in the front end.
        Parameters:
        configuration - the job configuration
        keyValueMetaData - the application-specific metadata from the file
        fileSchema - the schema of the file
        Returns:
        the readContext that defines how to read the file
      • init

        public ReadSupport.ReadContext init(InitContext context)
        Called in InputFormat.getSplits(org.apache.hadoop.mapreduce.JobContext) in the front end.
        Parameters:
        context - the initialisation context
        Returns:
        the readContext that defines how to read the file
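A minimal sketch of overriding init(InitContext) so that it honors PARQUET_READ_SCHEMA, falling back to the full file schema when the key is unset. It assumes parquet-hadoop and parquet-column are on the classpath and reuses the example Group model from parquet-column (Group, GroupRecordConverter); the class name ProjectingGroupReadSupport is hypothetical. prepareForRead is included only because it is abstract and every subclass must implement it.

```java
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.convert.GroupRecordConverter;
import org.apache.parquet.hadoop.api.InitContext;
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.io.api.RecordMaterializer;
import org.apache.parquet.schema.MessageType;

// Hypothetical ReadSupport that honors PARQUET_READ_SCHEMA when it is set.
public class ProjectingGroupReadSupport extends ReadSupport<Group> {

    @Override
    public ReadContext init(InitContext context) {
        MessageType fileSchema = context.getFileSchema();
        String projection = context.getConfiguration().get(PARQUET_READ_SCHEMA);
        // No projection configured: read every column in the file.
        MessageType requested = (projection == null)
            ? fileSchema
            : getSchemaForRead(fileSchema, projection);
        return new ReadContext(requested);
    }

    @Override
    public RecordMaterializer<Group> prepareForRead(Configuration configuration,
                                                    Map<String, String> keyValueMetaData,
                                                    MessageType fileSchema,
                                                    ReadContext readContext) {
        // Materialize records using the schema chosen in init.
        return new GroupRecordConverter(readContext.getRequestedSchema());
    }
}
```

Putting the projection logic in init means every task sees the same requested schema through the ReadContext, rather than re-deriving it in prepareForRead.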
      • prepareForRead

        public abstract org.apache.parquet.io.api.RecordMaterializer<T> prepareForRead(org.apache.hadoop.conf.Configuration configuration,
                                                                                       Map<String,String> keyValueMetaData,
                                                                                       org.apache.parquet.schema.MessageType fileSchema,
                                                                                       ReadSupport.ReadContext readContext)
        Called in RecordReader.initialize(org.apache.hadoop.mapreduce.InputSplit, org.apache.hadoop.mapreduce.TaskAttemptContext) in the back end. The returned RecordMaterializer will materialize the records and add them to the destination.
        Parameters:
        configuration - the job configuration
        keyValueMetaData - the application-specific metadata from the file
        fileSchema - the schema of the file
        readContext - returned by the init method
        Returns:
        the recordMaterializer that will materialize the records
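prepareForRead returns the object that turns converter callbacks from the column readers into records of type T. The sketch below assumes only the parquet-column converter API (RecordMaterializer, GroupConverter, PrimitiveConverter) and materializes a single required int64 column named id as a Long. IdReadSupport, IdMaterializer, and the projection are hypothetical names chosen for illustration.

```java
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.hadoop.api.InitContext;
import org.apache.parquet.hadoop.api.ReadSupport;
import org.apache.parquet.io.api.Converter;
import org.apache.parquet.io.api.GroupConverter;
import org.apache.parquet.io.api.PrimitiveConverter;
import org.apache.parquet.io.api.RecordMaterializer;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

// Hypothetical ReadSupport that materializes only the "id" column as a Long.
public class IdReadSupport extends ReadSupport<Long> {
    static final String ID_PROJECTION = "message example { required int64 id; }";

    @Override
    public ReadContext init(InitContext context) {
        // Fails fast if the file has no compatible "id" column.
        return new ReadContext(getSchemaForRead(context.getFileSchema(), ID_PROJECTION));
    }

    @Override
    public RecordMaterializer<Long> prepareForRead(Configuration configuration,
                                                   Map<String, String> keyValueMetaData,
                                                   MessageType fileSchema,
                                                   ReadContext readContext) {
        return new IdMaterializer();
    }

    // Receives one addLong call per record and exposes it as the current record.
    static class IdMaterializer extends RecordMaterializer<Long> {
        private long current;

        private final PrimitiveConverter idConverter = new PrimitiveConverter() {
            @Override
            public void addLong(long value) { current = value; }
        };

        // The requested schema has a single field, so index 0 is always "id".
        private final GroupConverter root = new GroupConverter() {
            @Override
            public Converter getConverter(int fieldIndex) { return idConverter; }
            @Override
            public void start() { current = 0L; }
            @Override
            public void end() { }
        };

        @Override
        public Long getCurrentRecord() { return current; }

        @Override
        public GroupConverter getRootConverter() { return root; }
    }
}
```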