Package org.apache.flink.formats.csv
Class CsvReaderFormat<T>
- java.lang.Object
  - org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>
    - org.apache.flink.formats.csv.CsvReaderFormat<T>
- Type Parameters:
T - The type of the returned elements.
- All Implemented Interfaces:
Serializable, org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>, org.apache.flink.connector.file.src.reader.StreamFormat<T>
@PublicEvolving public class CsvReaderFormat<T> extends org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>

A StreamFormat for reading CSV files.

The following example shows how to create a CsvReaderFormat where the schema for CSV parsing is automatically derived from the fields of a POJO class.

    CsvReaderFormat<SomePojo> csvFormat = CsvReaderFormat.forPojo(SomePojo.class);
    FileSource<SomePojo> source =
            FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(filesPath)).build();

Note: you might need to add the @JsonPropertyOrder({"field1", "field2", ...}) annotation from the Jackson library to your class definition, with the field order exactly matching that of the CSV file columns.

If you need more fine-grained control over the CSV schema or the parsing options, use the lower-level forSchema static factory method, which is based on the Jackson library utilities:

    // The generator must be a SerializableFunction (see the forSchema signature).
    SerializableFunction<CsvMapper, CsvSchema> schemaGenerator =
            mapper -> mapper.schemaFor(SomePojo.class).withColumnSeparator('|');
    CsvReaderFormat<SomePojo> csvFormat =
            CsvReaderFormat.forSchema(() -> new CsvMapper(), schemaGenerator, TypeInformation.of(SomePojo.class));
    FileSource<SomePojo> source =
            FileSource.forRecordStreamFormat(csvFormat, Path.fromLocalFile(filesPath)).build();

- See Also:
- Serialized Form
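The note about @JsonPropertyOrder matters because CSV rows carry no field names: values are matched to fields purely by position. The following plain-Java sketch illustrates that idea without using Flink or Jackson. The names ColumnOrderSketch and parseRow are hypothetical, and the naive split ignores quoting and escaping, which a real CSV parser handles.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

final class ColumnOrderSketch {

    /** Maps the values of one CSV row to column names by position only. */
    static Map<String, String> parseRow(String row, char separator, List<String> columns) {
        // Quote the separator so characters such as '|' are not read as regex syntax.
        String[] values = row.split(Pattern.quote(String.valueOf(separator)), -1);
        if (values.length != columns.size()) {
            throw new IllegalArgumentException(
                    "Expected " + columns.size() + " values but found " + values.length);
        }
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < values.length; i++) {
            result.put(columns.get(i), values[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String row = "42|Alice";
        // Declared order matches the file: prints {id=42, name=Alice}
        System.out.println(parseRow(row, '|', Arrays.asList("id", "name")));
        // Declared order does not match the file: prints {name=42, id=Alice}
        System.out.println(parseRow(row, '|', Arrays.asList("name", "id")));
    }
}
```

The second call shows the failure mode: nothing fails, the values simply land in the wrong fields, so a mismatched field order can go unnoticed.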
Method Summary
Modifier and Type | Method | Description
org.apache.flink.connector.file.src.reader.StreamFormat.Reader<T> | createReader(org.apache.flink.configuration.Configuration config, org.apache.flink.core.fs.FSDataInputStream stream) |
static <T> CsvReaderFormat<T> | forPojo(Class<T> pojoType) | Builds a new CsvReaderFormat for reading CSV files mapped to the provided POJO class definition.
static <T> CsvReaderFormat<T> | forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema, org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation) | Builds a new CsvReaderFormat using a CsvSchema.
static <T> CsvReaderFormat<T> | forSchema(org.apache.flink.util.function.SerializableSupplier<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper> mapperFactory, org.apache.flink.util.function.SerializableFunction<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper,org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema> schemaGenerator, org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation) | Builds a new CsvReaderFormat using a CsvSchema generator and CsvMapper factory.
org.apache.flink.api.common.typeinfo.TypeInformation<T> | getProducedType() |
CsvReaderFormat<T> | withIgnoreParseErrors() | Returns a new CsvReaderFormat configured to ignore all parsing errors.
Method Detail
-
forSchema
public static <T> CsvReaderFormat<T> forSchema(org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema schema, org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation)
Builds a new CsvReaderFormat using a CsvSchema.
- Type Parameters:
T - The type of the returned elements.
- Parameters:
schema - The Jackson CSV schema configured for parsing specific CSV files.
typeInformation - The Flink type descriptor of the returned elements.
-
forSchema
public static <T> CsvReaderFormat<T> forSchema(org.apache.flink.util.function.SerializableSupplier<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper> mapperFactory, org.apache.flink.util.function.SerializableFunction<org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvMapper,org.apache.flink.shaded.jackson2.com.fasterxml.jackson.dataformat.csv.CsvSchema> schemaGenerator, org.apache.flink.api.common.typeinfo.TypeInformation<T> typeInformation)
Builds a new CsvReaderFormat using a CsvSchema generator and CsvMapper factory.
- Type Parameters:
T - The type of the returned elements.
- Parameters:
mapperFactory - The factory creating the CsvMapper.
schemaGenerator - A generator that creates and configures the Jackson CSV schema for parsing specific CSV files, from a mapper created by the mapper factory.
typeInformation - The Flink type descriptor of the returned elements.
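Since CsvReaderFormat is listed as Serializable, whatever it holds has to be serializable as well, which is presumably why this overload takes a SerializableSupplier and a SerializableFunction rather than ready-made objects. The sketch below shows the general Java pattern with standard serialization only. SerSupplier, SerFunction and Mapper are hypothetical stand-ins, not Flink or Jackson types, and the "schema" is just a string.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;
import java.util.function.Supplier;

final class SerializableFactorySketch {

    /** Hypothetical stand-ins for serializable functional interfaces. */
    interface SerSupplier<T> extends Supplier<T>, Serializable {}

    interface SerFunction<A, B> extends Function<A, B>, Serializable {}

    /** Hypothetical mapper stand-in; deliberately NOT Serializable. */
    static final class Mapper {
        String schemaFor(String typeName) {
            return "schema(" + typeName + ")";
        }
    }

    /** Serializes and deserializes a value with standard Java serialization. */
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (T) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    /** Ships only the factory and the generator, then builds the schema afterwards. */
    static String buildSchemaAfterShipping() {
        SerSupplier<Mapper> mapperFactory = () -> new Mapper();
        SerFunction<Mapper, String> schemaGenerator =
                mapper -> mapper.schemaFor("SomePojo") + " sep='|'";

        // The lambdas survive serialization although Mapper itself could not.
        SerSupplier<Mapper> shippedFactory = roundTrip(mapperFactory);
        SerFunction<Mapper, String> shippedGenerator = roundTrip(schemaGenerator);

        return shippedGenerator.apply(shippedFactory.get());
    }

    public static void main(String[] args) {
        System.out.println(buildSchemaAfterShipping());
    }
}
```

A practical consequence of the signature is that a schema generator declared as a plain java.util.function.Function variable cannot be passed to this overload; it has to be typed as the serializable variant or written inline as a lambda.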
-
forPojo
public static <T> CsvReaderFormat<T> forPojo(Class<T> pojoType)
Builds a new CsvReaderFormat for reading CSV files mapped to the provided POJO class definition. The produced reader uses default mapper and schema settings; use forSchema if you need customizations.
- Type Parameters:
T - The type of the returned elements.
- Parameters:
pojoType - The type class of the POJO.
-
withIgnoreParseErrors
public CsvReaderFormat<T> withIgnoreParseErrors()
Returns a new CsvReaderFormat configured to ignore all parsing errors. All other options are carried over unchanged from the instance on which the method is called.
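The description implies a copy-on-configure style: the call returns a new format and leaves the original untouched. A minimal plain-Java sketch of that pattern follows, assuming that "ignoring" a parse error means dropping the malformed row and continuing. IgnoreErrorsSketch is a hypothetical class that parses rows of integers; it is not Flink's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

final class IgnoreErrorsSketch {

    private final String separator; // stands in for "all other options"
    private final boolean ignoreParseErrors;

    IgnoreErrorsSketch(String separator) {
        this(separator, false);
    }

    private IgnoreErrorsSketch(String separator, boolean ignoreParseErrors) {
        this.separator = separator;
        this.ignoreParseErrors = ignoreParseErrors;
    }

    /** Returns a copy with error skipping enabled; this instance stays unchanged. */
    IgnoreErrorsSketch withIgnoreParseErrors() {
        return new IgnoreErrorsSketch(separator, true);
    }

    boolean ignoresParseErrors() {
        return ignoreParseErrors;
    }

    /** Parses every row into integers, skipping or rethrowing on malformed rows. */
    List<List<Integer>> readAll(List<String> rows) {
        List<List<Integer>> records = new ArrayList<>();
        for (String row : rows) {
            try {
                List<Integer> record = new ArrayList<>();
                for (String value : row.split(Pattern.quote(separator), -1)) {
                    record.add(Integer.parseInt(value.trim()));
                }
                records.add(record);
            } catch (NumberFormatException e) {
                if (!ignoreParseErrors) {
                    throw e;
                }
                // Lenient mode: drop the malformed row and move on to the next one.
            }
        }
        return records;
    }

    public static void main(String[] args) {
        IgnoreErrorsSketch strict = new IgnoreErrorsSketch(";");
        IgnoreErrorsSketch lenient = strict.withIgnoreParseErrors();
        List<String> rows = Arrays.asList("1;2", "x;3", "4;5");
        System.out.println(lenient.readAll(rows)); // prints [[1, 2], [4, 5]]
        System.out.println(strict.ignoresParseErrors()); // prints false
    }
}
```

Silently dropped rows are easy to miss, so a lenient format is usually worth pairing with some way of counting or logging what was skipped.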
-
createReader
public org.apache.flink.connector.file.src.reader.StreamFormat.Reader<T> createReader(org.apache.flink.configuration.Configuration config, org.apache.flink.core.fs.FSDataInputStream stream) throws IOException
- Specified by:
createReader in class org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>
- Throws:
IOException
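For orientation, a reader created from an input stream is typically consumed in a pull loop. The sketch below is plain Java and does not use Flink types. It assumes that the reader signals the end of input by returning null, which should be confirmed against the StreamFormat.Reader documentation. PullReaderSketch and firstColumns are hypothetical names, and the comma split ignores quoting.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class PullReaderSketch implements Closeable {

    private final BufferedReader lines;

    PullReaderSketch(InputStream stream) {
        this.lines = new BufferedReader(new InputStreamReader(stream, StandardCharsets.UTF_8));
    }

    /** Returns the next record, or null once the stream is exhausted (assumed contract). */
    String[] read() throws IOException {
        String line = lines.readLine();
        return line == null ? null : line.split(",", -1);
    }

    @Override
    public void close() throws IOException {
        lines.close();
    }

    /** Drains a reader over the given CSV text and collects the first column of each row. */
    static List<String> firstColumns(String csv) {
        List<String> result = new ArrayList<>();
        InputStream stream = new ByteArrayInputStream(csv.getBytes(StandardCharsets.UTF_8));
        try (PullReaderSketch reader = new PullReaderSketch(stream)) {
            String[] record;
            while ((record = reader.read()) != null) {
                result.add(record[0]);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(firstColumns("a,1\nb,2\n")); // prints [a, b]
    }
}
```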
-
getProducedType
public org.apache.flink.api.common.typeinfo.TypeInformation<T> getProducedType()
- Specified by:
getProducedType in interface org.apache.flink.api.java.typeutils.ResultTypeQueryable<T>
- Specified by:
getProducedType in interface org.apache.flink.connector.file.src.reader.StreamFormat<T>
- Specified by:
getProducedType in class org.apache.flink.connector.file.src.reader.SimpleStreamFormat<T>