Record format
If true, the input data file encoding is EBCDIC; otherwise it is ASCII
If true, line ending characters (LF / CRLF) will be used as the record separator
Specifies what code page to use for EBCDIC to ASCII/Unicode conversions
An optional custom code page conversion class provided by the user
A charset for ASCII data
Specifies a mapping between a field name and the code page
If true, UTF-16 strings are considered big-endian.
A format of floating-point numbers
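The encoding and character set options above are typically set when loading data through the reader. A minimal sketch, assuming a Cobrix-style Spark data source where `spark` is an existing SparkSession; the option names, code page, and paths here are illustrative assumptions and should be verified against the library's documentation:

```scala
// Illustrative sketch only: option names and paths are assumptions.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("encoding", "ebcdic")         // input is EBCDIC rather than ASCII
  .option("ebcdic_code_page", "cp037")  // code page for EBCDIC-to-Unicode conversion
  .load("/path/to/data")
```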
If true, OCCURS DEPENDING ON data size will depend on the number of elements
Specifies the length of the record, disregarding the copybook record size. Implies the file has a fixed record length.
Minimum record length for which the record is considered valid.
Maximum record length for which the record is considered valid.
The name of a field that contains the record length. Optional; if not set, the copybook record length is used.
Whether input files have 4-byte record length (RDW) headers
Block descriptor word (if specified), for FB and VB record formats
Whether the RDW counts itself as part of the record length
Controls how a mismatch between the RDW and the record length is handled
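The record format and RDW options above typically come into play when reading variable-length record files. A hedged sketch under the same assumptions as before (option names are illustrative, not confirmed by this document):

```scala
// Illustrative sketch only: option names are assumptions.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("record_format", "V")        // variable-length records with 4-byte RDW headers
  .option("is_rdw_big_endian", "true") // endianness of the RDW
  .option("rdw_adjustment", "-4")      // compensate if the RDW counts itself
  .load("/path/to/data")
```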
Whether indexing the input file before processing is requested
The number of records to include in each partition. Note that mainframe records may have variable size; inputSplitSizeMB is the recommended option.
The partition size to target. In certain circumstances the actual size may differ, but the library makes a best effort to hit the target.
Default HDFS block size for the HDFS filesystem used. This value is used as the default split size if inputSplitSizeMB is not specified
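The indexing and partitioning options above control how the input is split across Spark partitions. A sketch with assumed option names; only one of the two split options would normally be set:

```scala
// Illustrative sketch only: option names are assumptions.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("input_split_size_mb", "100")        // target partition size; recommended for variable-size records
  // .option("input_split_records", "50000")   // alternative: fixed number of records per partition
  .load("/path/to/data")
```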
An offset to the start of the record in each binary data block.
An offset from the end of the record to the end of the binary data block.
A number of bytes to skip at the beginning of each file
A number of bytes to skip at the end of each file
If true, a record id field will be prepended to each record.
If true, generate a 'record_bytes' field containing the raw bytes of the original record
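The offset and generated-field options above can be sketched as follows; the option names and offsets are illustrative assumptions to check against the library docs:

```scala
// Illustrative sketch only: option names are assumptions.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("file_start_offset", "100")    // bytes to skip at the beginning of each file
  .option("file_end_offset", "20")       // bytes to skip at the end of each file
  .option("generate_record_id", "true")  // prepend record id fields to each record
  .load("/path/to/data")
```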
Specifies a policy to transform the input schema. The default policy is to keep the schema exactly as it is in the copybook.
Specifies if and how strings should be trimmed when parsed.
If true, partial ASCII records can be parsed (for example, when the trailing LF character is missing)
Parameters specific to reading multisegment files
A comment truncation policy
If true, string values that contain only zero bytes (0x0) will be considered null.
Decode binary fields as HEX strings
If true, the parser will drop all FILLER fields, even GROUP FILLERs that contain non-FILLER nested fields
If true, the parser will drop all value FILLER fields
Specifies the strategy of renaming FILLER names to make them unique
A list of non-terminals (GROUPS) to combine and parse as primitive fields
Specifies whether debugging fields should be added and what they should contain (false, hex, raw).
A parser used to parse data field record headers
An optional additional option string passed to a custom record header parser
A column name to add to the dataframe. The column will contain the input file name for each record, similar to the 'input_file_name()' function.
Specifies the policy for metadata fields to be added to the Spark schema
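Finally, the schema-shaping options above (retention policy, string trimming, FILLER handling, debugging fields, input file name column) might be combined as in this sketch; again, the option names and values are assumptions to be verified against the library's documentation:

```scala
// Illustrative sketch only: option names and values are assumptions.
val df = spark.read
  .format("cobol")
  .option("copybook", "/path/to/copybook.cpy")
  .option("schema_retention_policy", "keep_original")  // keep the copybook structure as-is
  .option("string_trimming_policy", "both")            // trim strings on both sides
  .option("drop_group_fillers", "true")                // drop FILLER groups from the schema
  .option("debug", "hex")                              // add debug fields with hex values
  .option("with_input_file_name_col", "file_name")     // add an input file name column
  .load("/path/to/data")
```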
These are properties for customizing the mainframe binary data reader.