Class OrcColumnarRowInputFormat<BatchT, SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>

  • All Implemented Interfaces:
Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>, org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData, SplitT>, org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat

    public class OrcColumnarRowInputFormat<BatchT, SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
    extends AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData, BatchT, SplitT>
    implements org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
    An ORC reader that produces a stream of ColumnarRowData records.

    This class can append extra fields to each batch through the ColumnBatchFactory, for example partition fields extracted from the file path. Consequently, getProducedType() may differ from the physical ORC schema and must include the types of those extra fields.
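As a sketch of the paragraph above (the column names, types, and the partition key "dt" are invented for illustration, not part of the API), the produced type can append a partition field to the physical ORC schema:

```java
// Sketch with assumed names: the physical ORC schema holds only (id, name),
// while the produced type appends a partition column "dt" taken from the path.
RowType physicalType = RowType.of(
        new LogicalType[] {new BigIntType(), new VarCharType(VarCharType.MAX_LENGTH)},
        new String[] {"id", "name"});
RowType producedType = RowType.of(
        new LogicalType[] {new BigIntType(), new VarCharType(VarCharType.MAX_LENGTH),
                           new VarCharType(VarCharType.MAX_LENGTH)},
        new String[] {"id", "name", "dt"});
// InternalTypeInfo.of(RowType) yields the TypeInformation<RowData> that
// getProducedType() would report for the widened row.
TypeInformation<RowData> producedTypeInfo = InternalTypeInfo.of(producedType);
```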

    See Also:
    Serialized Form
    • Constructor Detail

      • OrcColumnarRowInputFormat

        public OrcColumnarRowInputFormat(OrcShim<BatchT> shim,
                                         org.apache.hadoop.conf.Configuration hadoopConfig,
                                         org.apache.orc.TypeDescription schema,
                                         int[] selectedFields,
                                         List<OrcFilters.Predicate> conjunctPredicates,
                                         int batchSize,
                                         ColumnBatchFactory<BatchT, SplitT> batchFactory,
                                         org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo)
    • Method Detail

      • getProducedType

        public org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
        Description copied from class: AbstractOrcFileInputFormat
        Gets the type produced by this format.
        Specified by:
        getProducedType in interface org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData, SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
        Specified by:
        getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>
        Specified by:
        getProducedType in class AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData, BatchT, SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
      • reportStatistics

        public org.apache.flink.table.plan.stats.TableStats reportStatistics​(List<org.apache.flink.core.fs.Path> files,
                                                                             org.apache.flink.table.types.DataType producedDataType)
        Specified by:
        reportStatistics in interface org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
      • createPartitionedFormat

        public static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch, SplitT> createPartitionedFormat(
                OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim,
                org.apache.hadoop.conf.Configuration hadoopConfig,
                org.apache.flink.table.types.logical.RowType tableType,
                List<String> partitionKeys,
                org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor,
                int[] selectedFields,
                List<OrcFilters.Predicate> conjunctPredicates,
                int batchSize,
                Function<org.apache.flink.table.types.logical.RowType, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData>> rowTypeInfoFactory)
        Creates a partitioned OrcColumnarRowInputFormat whose partition columns are generated from the split rather than read from the ORC file.
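A minimal usage sketch, assuming a table with columns (id, name) and a single partition key "dt"; the batch size and default partition name below are illustrative values, not defaults of the API, and the helper calls (OrcShim.defaultShim(), PartitionFieldExtractor.forFileSystem(), InternalTypeInfo.of()) are the factories these Flink modules are assumed to provide:

```java
// Assumed table shape: two physical columns plus one partition column "dt".
RowType tableType = RowType.of(
        new LogicalType[] {new BigIntType(), new VarCharType(VarCharType.MAX_LENGTH),
                           new VarCharType(VarCharType.MAX_LENGTH)},
        new String[] {"id", "name", "dt"});

OrcColumnarRowInputFormat<VectorizedRowBatch, FileSourceSplit> format =
        OrcColumnarRowInputFormat.createPartitionedFormat(
                OrcShim.defaultShim(),                      // shim for the bundled ORC version
                new org.apache.hadoop.conf.Configuration(), // Hadoop config for the ORC reader
                tableType,
                Collections.singletonList("dt"),            // partition keys
                PartitionFieldExtractor.forFileSystem("__DEFAULT_PARTITION__"),
                new int[] {0, 1, 2},                        // project all three columns
                Collections.emptyList(),                    // no pushed-down predicates
                2048,                                       // rows per vectorized batch
                InternalTypeInfo::of);                      // RowType -> TypeInformation<RowData>
```

The resulting format can then be handed to a FileSource builder; the "dt" values are produced per split by the extractor instead of being read from the ORC data.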