Package org.apache.flink.orc
Class OrcColumnarRowInputFormat<BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
- java.lang.Object
  - org.apache.flink.orc.AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT>
    - org.apache.flink.orc.OrcColumnarRowInputFormat<BatchT,SplitT>
- All Implemented Interfaces:
Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>, org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT>, org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
public class OrcColumnarRowInputFormat<BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> extends AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT> implements org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
An ORC reader that produces a stream of ColumnarRowData records.
This class can add extra fields through a ColumnBatchFactory, for example partition fields extracted from the file path. As a result, getProducedType() may differ from the physical ORC schema, and the types of the extra fields need to be added.
- See Also:
- Serialized Form
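As an illustration of the extension point described above, a batch factory can wrap the ORC batch's column vectors and append one extra column derived from the split. The sketch below is not the library's code: the helpers `toFlinkVector`, `constantStringVector`, and `partitionValueFromPath` are hypothetical placeholders for vector-wrapping logic, and the exact shape of the `ColumnBatchFactory` functional interface should be verified against your flink-orc version.

```java
// Sketch only: a ColumnBatchFactory that wraps two physical ORC columns and
// appends a constant "dt" partition column parsed from the split's file path.
ColumnBatchFactory<VectorizedRowBatch, FileSourceSplit> factory =
        (split, orcBatch) -> {
            ColumnVector[] vectors = new ColumnVector[3];
            vectors[0] = toFlinkVector(orcBatch.cols[0]);   // hypothetical wrapper helper
            vectors[1] = toFlinkVector(orcBatch.cols[1]);   // hypothetical wrapper helper
            vectors[2] = constantStringVector(              // hypothetical: constant-value column
                    partitionValueFromPath(split.path(), "dt"),
                    orcBatch.getMaxSize());
            return new VectorizedColumnBatch(vectors);
        };
```

The factory is handed to the constructor's `batchFactory` parameter; the produced-type information passed alongside it must then include the appended "dt" field.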
-
-
Nested Class Summary
-
Nested classes/interfaces inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat
AbstractOrcFileInputFormat.OrcReaderBatch<T,BatchT>, AbstractOrcFileInputFormat.OrcVectorizedReader<T,BatchT>
-
-
Field Summary
-
Fields inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat
batchSize, conjunctPredicates, hadoopConfigWrapper, schema, selectedFields, shim
-
-
Constructor Summary
Constructors
Constructor Description
OrcColumnarRowInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, ColumnBatchFactory<BatchT,SplitT> batchFactory, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo)
-
Method Summary
All Methods  Static Methods  Instance Methods  Concrete Methods

Modifier and Type / Method / Description
- static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT>
  createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType tableType, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, Function<org.apache.flink.table.types.logical.RowType,org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData>> rowTypeInfoFactory)
  Creates a partitioned OrcColumnarRowInputFormat whose partition columns can be generated from the split.
- AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT>
  createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT>> recycler, int batchSize)
  Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures of the batch (column vectors, row arrays, ...) and for the batch conversion from the ORC representation to the result format.
- org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData>
  getProducedType()
  Gets the type produced by this format.
- org.apache.flink.table.plan.stats.TableStats
  reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)
Methods inherited from class org.apache.flink.orc.AbstractOrcFileInputFormat
createReader, isSplittable, restoreReader
-
-
-
-
Constructor Detail
-
OrcColumnarRowInputFormat
public OrcColumnarRowInputFormat(OrcShim<BatchT> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.orc.TypeDescription schema, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, ColumnBatchFactory<BatchT,SplitT> batchFactory, org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> producedTypeInfo)
-
-
Method Detail
-
createReaderBatch
public AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT> createReaderBatch(SplitT split, OrcVectorizedBatchWrapper<BatchT> orcBatch, org.apache.flink.connector.file.src.util.Pool.Recycler<AbstractOrcFileInputFormat.OrcReaderBatch<org.apache.flink.table.data.RowData,BatchT>> recycler, int batchSize)
Description copied from class: AbstractOrcFileInputFormat
Creates the AbstractOrcFileInputFormat.OrcReaderBatch structure, which is responsible for holding the data structures of the batch (column vectors, row arrays, ...) and for the batch conversion from the ORC representation to the result format.
- Specified by:
createReaderBatch in class AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
-
getProducedType
public org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData> getProducedType()
Description copied from class: AbstractOrcFileInputFormat
Gets the type produced by this format.
- Specified by:
getProducedType in interface org.apache.flink.connector.file.src.reader.BulkFormat<org.apache.flink.table.data.RowData,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
- Specified by:
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.table.data.RowData>
- Specified by:
getProducedType in class AbstractOrcFileInputFormat<org.apache.flink.table.data.RowData,BatchT,SplitT extends org.apache.flink.connector.file.src.FileSourceSplit>
-
reportStatistics
public org.apache.flink.table.plan.stats.TableStats reportStatistics(List<org.apache.flink.core.fs.Path> files, org.apache.flink.table.types.DataType producedDataType)
- Specified by:
reportStatistics in interface org.apache.flink.table.connector.format.FileBasedStatisticsReportableInputFormat
-
createPartitionedFormat
public static <SplitT extends org.apache.flink.connector.file.src.FileSourceSplit> OrcColumnarRowInputFormat<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch,SplitT> createPartitionedFormat(OrcShim<org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch> shim, org.apache.hadoop.conf.Configuration hadoopConfig, org.apache.flink.table.types.logical.RowType tableType, List<String> partitionKeys, org.apache.flink.connector.file.table.PartitionFieldExtractor<SplitT> extractor, int[] selectedFields, List<OrcFilters.Predicate> conjunctPredicates, int batchSize, Function<org.apache.flink.table.types.logical.RowType,org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.table.data.RowData>> rowTypeInfoFactory)
Creates a partitioned OrcColumnarRowInputFormat whose partition columns can be generated from the split.
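A usage sketch for a partitioned table might look as follows. This is an assumption-laden example, not canonical usage: the factory methods `OrcShim.defaultShim()`, `PartitionFieldExtractor.forFileSystem(...)`, and `InternalTypeInfo.of(...)` exist in recent Flink versions but their exact signatures, and the table path `/warehouse/my_table`, should be checked against your setup.

```java
// Sketch only: table with columns (id INT, name STRING) partitioned by dt STRING.
RowType tableType = RowType.of(
        new LogicalType[] {new IntType(), new VarCharType(), new VarCharType()},
        new String[] {"id", "name", "dt"});

OrcColumnarRowInputFormat<VectorizedRowBatch, FileSourceSplit> format =
        OrcColumnarRowInputFormat.createPartitionedFormat(
                OrcShim.defaultShim(),                           // assumed shim factory
                new Configuration(),                             // Hadoop configuration
                tableType,
                Collections.singletonList("dt"),                 // partition keys
                PartitionFieldExtractor.forFileSystem("__DEFAULT_PARTITION__"),
                new int[] {0, 1, 2},                             // project all fields, incl. "dt"
                Collections.emptyList(),                         // no pushed-down predicates
                1024,                                            // rows per vectorized batch
                InternalTypeInfo::of);                           // rowTypeInfoFactory

// The resulting BulkFormat can then back a FileSource of RowData:
FileSource<RowData> source =
        FileSource.forBulkFileFormat(format, new Path("/warehouse/my_table")).build();
```

Note that the partition column "dt" is listed in `tableType` and in `selectedFields` even though it is not stored in the ORC files; its values are supplied per split by the `PartitionFieldExtractor`.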
-
-