public class OrcRowInputFormat
extends org.apache.flink.api.common.io.FileInputFormat<org.apache.flink.types.Row>
implements org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.types.Row>
Modifier and Type | Class and Description |
---|---|
static class | OrcRowInputFormat.Between: A BETWEEN predicate that can be evaluated by the OrcRowInputFormat. |
static class | OrcRowInputFormat.Equals: An EQUALS predicate that can be evaluated by the OrcRowInputFormat. |
static class | OrcRowInputFormat.In: An IN predicate that can be evaluated by the OrcRowInputFormat. |
static class | OrcRowInputFormat.IsNull: An IS_NULL predicate that can be evaluated by the OrcRowInputFormat. |
static class | OrcRowInputFormat.LessThan: A LESS_THAN predicate that can be evaluated by the OrcRowInputFormat. |
static class | OrcRowInputFormat.LessThanEquals: A LESS_THAN_EQUALS predicate that can be evaluated by the OrcRowInputFormat. |
static class | OrcRowInputFormat.Not: A NOT predicate that negates another predicate and can be evaluated by the OrcRowInputFormat. |
static class | OrcRowInputFormat.NullSafeEquals: A null-safe EQUALS predicate that can be evaluated by the OrcRowInputFormat. |
static class | OrcRowInputFormat.Or: An OR predicate that can be evaluated by the OrcRowInputFormat. |
static class | OrcRowInputFormat.Predicate: A filter predicate that can be evaluated by the OrcRowInputFormat. |
Constructor and Description |
---|
OrcRowInputFormat(String path, String schemaString, org.apache.hadoop.conf.Configuration orcConfig): Creates an OrcRowInputFormat. |
OrcRowInputFormat(String path, String schemaString, org.apache.hadoop.conf.Configuration orcConfig, int batchSize): Creates an OrcRowInputFormat. |
OrcRowInputFormat(String path, org.apache.orc.TypeDescription orcSchema, org.apache.hadoop.conf.Configuration orcConfig, int batchSize): Creates an OrcRowInputFormat. |
Modifier and Type | Method and Description |
---|---|
void | addPredicate(OrcRowInputFormat.Predicate predicate): Adds a filter predicate to reduce the number of rows to be returned by the input format. |
void | close() |
void | closeInputFormat() |
org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.types.Row> | getProducedType() |
org.apache.flink.types.Row | nextRecord(org.apache.flink.types.Row reuse) |
void | open(org.apache.flink.core.fs.FileInputSplit fileSplit) |
void | openInputFormat() |
boolean | reachedEnd() |
void | selectFields(int... selectedFields): Selects the fields from the ORC schema that are returned by the InputFormat. |
boolean | supportsMultiPaths() |
Methods inherited from class org.apache.flink.api.common.io.FileInputFormat: acceptFile, configure, createInputSplits, decorateInputStream, extractFileExtension, getFilePath, getFilePaths, getFileStats, getFileStats, getFileStatus, getInflaterInputStreamFactory, getInputSplitAssigner, getMinSplitSize, getNestedFileEnumeration, getNumSplits, getOpenTimeout, getSplitLength, getSplitProviderThreadPoolSize, getSplitStart, getStatistics, registerInflaterInputStreamFactory, setFilePath, setFilePath, setFilePaths, setFilePaths, setFilesFilter, setMinSplitSize, setNestedFileEnumeration, setNumSplits, setOpenTimeout, setSplitProviderThreadPoolSize, testForUnsplittable, toString
public OrcRowInputFormat(String path, String schemaString, org.apache.hadoop.conf.Configuration orcConfig)
path - The path to read ORC files from.
schemaString - The schema of the ORC files as a String.
orcConfig - The configuration to read the ORC files with.
public OrcRowInputFormat(String path, String schemaString, org.apache.hadoop.conf.Configuration orcConfig, int batchSize)
path - The path to read ORC files from.
schemaString - The schema of the ORC files as a String.
orcConfig - The configuration to read the ORC files with.
batchSize - The number of Row objects to read in a batch.
public OrcRowInputFormat(String path, org.apache.orc.TypeDescription orcSchema, org.apache.hadoop.conf.Configuration orcConfig, int batchSize)
path - The path to read ORC files from.
orcSchema - The schema of the ORC files as an ORC TypeDescription.
orcConfig - The configuration to read the ORC files with.
batchSize - The number of Row objects to read in a batch.
public void addPredicate(OrcRowInputFormat.Predicate predicate)
Note: Predicates can significantly reduce the amount of data that is read. However, the OrcRowInputFormat does not guarantee that all returned rows satisfy the predicates. Moreover, a predicate is only applied if the field it references is among the selected fields.
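The note above can be made concrete with a small, self-contained model. It does not use any Flink or ORC classes: `RowGroup`, `readWithPushdown`, and `filterExact` are hypothetical stand-ins, and the sketch assumes that the predicate is checked against per-group minimum statistics rather than against each row. Under that assumption a group is skipped only when none of its rows can match, so a surviving group may still carry non-matching rows, and a caller that needs exact results re-applies the filter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PushdownSketch {

    /** Hypothetical stand-in for a block of rows with a minimum statistic on one int column. */
    static final class RowGroup {
        final int[] values;
        final int min;

        RowGroup(int... values) {
            this.values = values;
            int lo = Integer.MAX_VALUE;
            for (int v : values) {
                lo = Math.min(lo, v);
            }
            this.min = lo;
        }
    }

    /** Models a best-effort "value < bound" predicate: a group is skipped only if its minimum rules out every row. */
    static List<Integer> readWithPushdown(List<RowGroup> groups, int bound) {
        List<Integer> out = new ArrayList<>();
        for (RowGroup g : groups) {
            if (g.min >= bound) {
                continue; // no row in this group can satisfy value < bound
            }
            for (int v : g.values) {
                out.add(v); // the whole group is returned, including rows >= bound
            }
        }
        return out;
    }

    /** Exact filter that the caller applies on top of the best-effort read. */
    static List<Integer> filterExact(List<Integer> rows, int bound) {
        return rows.stream().filter(v -> v < bound).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<RowGroup> groups = Arrays.asList(
                new RowGroup(1, 5, 9),    // min 1: kept, although 9 does not qualify at bound 6
                new RowGroup(10, 12, 15), // min 10: skipped entirely
                new RowGroup(3, 20));     // min 3: kept, although 20 does not qualify

        List<Integer> returned = readWithPushdown(groups, 6);
        System.out.println("returned by reader: " + returned);
        System.out.println("after exact filter: " + filterExact(returned, 6));
    }
}
```

In a real job the equivalent step would be an explicit filter applied to the rows produced by the input format; this sketch only models the reasoning and does not show that wiring.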
predicate - The filter predicate.
public void selectFields(int... selectedFields)
selectedFields - The indices of the fields of the ORC schema that are returned by the InputFormat.
public void openInputFormat() throws IOException
Overrides: openInputFormat in class org.apache.flink.api.common.io.RichInputFormat<org.apache.flink.types.Row,org.apache.flink.core.fs.FileInputSplit>
Throws: IOException
public void open(org.apache.flink.core.fs.FileInputSplit fileSplit) throws IOException
Specified by: open in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row,org.apache.flink.core.fs.FileInputSplit>
Overrides: open in class org.apache.flink.api.common.io.FileInputFormat<org.apache.flink.types.Row>
Throws: IOException
public void close() throws IOException
Specified by: close in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row,org.apache.flink.core.fs.FileInputSplit>
Overrides: close in class org.apache.flink.api.common.io.FileInputFormat<org.apache.flink.types.Row>
Throws: IOException
public void closeInputFormat() throws IOException
Overrides: closeInputFormat in class org.apache.flink.api.common.io.RichInputFormat<org.apache.flink.types.Row,org.apache.flink.core.fs.FileInputSplit>
Throws: IOException
public boolean reachedEnd() throws IOException
Specified by: reachedEnd in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row,org.apache.flink.core.fs.FileInputSplit>
Throws: IOException
public org.apache.flink.types.Row nextRecord(org.apache.flink.types.Row reuse) throws IOException
Specified by: nextRecord in interface org.apache.flink.api.common.io.InputFormat<org.apache.flink.types.Row,org.apache.flink.core.fs.FileInputSplit>
Throws: IOException
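The lifecycle methods documented above are normally driven by the runtime in a fixed order: openInputFormat() once, then for each split open(split), a loop over reachedEnd() and nextRecord(reuse), and close(), and finally closeInputFormat(). The sketch below models that calling order in plain Java so that it runs without Flink on the classpath. `SketchFormat`, `ListFormat`, and `readAll` are hypothetical stand-ins rather than Flink types, and the "splits" are simply in-memory lists.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class ReadLoopSketch {

    /** Hypothetical stand-in that mirrors the method names documented above; not a Flink interface. */
    interface SketchFormat<T, S> {
        void openInputFormat() throws IOException;
        void open(S split) throws IOException;
        boolean reachedEnd() throws IOException;
        T nextRecord(T reuse) throws IOException;
        void close() throws IOException;
        void closeInputFormat() throws IOException;
    }

    /** Toy format whose splits are in-memory lists of strings. */
    static final class ListFormat implements SketchFormat<String, List<String>> {
        final List<String> calls = new ArrayList<>(); // records lifecycle calls for inspection
        private Iterator<String> current;

        public void openInputFormat() { calls.add("openInputFormat"); }
        public void open(List<String> split) { calls.add("open"); current = split.iterator(); }
        public boolean reachedEnd() { return !current.hasNext(); }
        public String nextRecord(String reuse) { return current.next(); } // reuse is ignored here
        public void close() { calls.add("close"); current = null; }
        public void closeInputFormat() { calls.add("closeInputFormat"); }
    }

    /** Drives a format through the assumed lifecycle and collects every record. */
    static <T, S> List<T> readAll(SketchFormat<T, S> format, List<S> splits) throws IOException {
        List<T> out = new ArrayList<>();
        format.openInputFormat();
        try {
            for (S split : splits) {
                format.open(split);
                try {
                    T reuse = null; // no reusable object in this toy model
                    while (!format.reachedEnd()) {
                        T record = format.nextRecord(reuse);
                        if (record != null) { // skip null records defensively
                            out.add(record);
                        }
                    }
                } finally {
                    format.close(); // always release the split
                }
            }
        } finally {
            format.closeInputFormat(); // always release format-wide resources
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        ListFormat format = new ListFormat();
        List<List<String>> splits = Arrays.asList(Arrays.asList("a", "b"), Arrays.asList("c"));
        System.out.println(readAll(format, splits));
        System.out.println(format.calls);
    }
}
```

The try/finally nesting is a design choice of the sketch: it keeps close() paired with open() and closeInputFormat() paired with openInputFormat() even when reading a split fails.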
public org.apache.flink.api.common.typeinfo.TypeInformation<org.apache.flink.types.Row> getProducedType()
Specified by: getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<org.apache.flink.types.Row>
public boolean supportsMultiPaths()
Overrides: supportsMultiPaths in class org.apache.flink.api.common.io.FileInputFormat<org.apache.flink.types.Row>
Copyright © 2014–2019 The Apache Software Foundation. All rights reserved.