io.smartdatalake.workflow.dataobject
Options passed to org.apache.spark.sql.DataFrameReader and org.apache.spark.sql.DataFrameWriter for reading and writing Microsoft Excel files. Excel support is provided by the spark-excel project (see link below).
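As a rough illustration of how such settings could reach Spark, here is a hypothetical Scala sketch. ExcelOptionsSketch and its method names are invented for this example and are not Smart Data Lake Builder's code. It collects a few of the settings listed in this section into the String-to-String map that DataFrameReader.options accepts. That includes a spark-excel data address in which the row limit is counted after numLinesToSkip. The option keys dataAddress, header, inferSchema and maxRowsInMemory are assumed from the spark-excel README and may differ between spark-excel versions.

```scala
// Hypothetical sketch, not the Smart Data Lake Builder implementation.
// Option keys are assumed from the spark-excel README and may vary by version.
object ExcelOptionsSketch {

  // Builds a spark-excel style data address such as 'Sales'!B3:F12.
  // Assumption: the row limit is counted from the first row after the skipped ones,
  // and an end cell is only added when both an end column and a row limit are given.
  def dataAddress(
      sheetName: Option[String],
      numLinesToSkip: Option[Int],
      startColumn: Option[String],
      endColumn: Option[String],
      rowLimit: Option[Int]
  ): String = {
    val firstRow = numLinesToSkip.getOrElse(0) + 1 // Excel rows are 1-based
    val start = s"${startColumn.getOrElse("A")}$firstRow"
    val end = for {
      col   <- endColumn
      limit <- rowLimit
    } yield s"$col${firstRow + limit - 1}"
    val range = end.fold(start)(e => s"$start:$e")
    sheetName.fold(range)(sheet => s"'$sheet'!$range")
  }

  // Collects reader options as plain strings; maxRowsInMemory is only added when set.
  def readerOptions(
      address: String,
      useHeader: Boolean,
      inferSchema: Boolean,
      maxRowsInMemory: Option[Int]
  ): Map[String, String] =
    Map(
      "dataAddress" -> address,
      "header"      -> useHeader.toString,
      "inferSchema" -> inferSchema.toString
    ) ++ maxRowsInMemory.map(n => "maxRowsInMemory" -> n.toString)
}
```

For example, dataAddress(Some("Sales"), Some(2), Some("B"), Some("F"), Some(10)) yields 'Sales'!B3:F12. Two rows are skipped, so reading starts at row 3, and ten rows from there end at row 12. The resulting map could then be passed to spark.read.format("com.crealytics.spark.excel").options(...). That format name is also taken from the spark-excel README and is an assumption here.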
Optional name of the Excel sheet to read from/write to.
Optional number of rows in the Excel spreadsheet to skip before any data is read. This option must not be set for writing.
Optional first column in the specified Excel sheet to read from (as string, e.g. B). This option must not be set for writing.
Optional last column in the specified Excel sheet to read from (as string, e.g. F).
Optional limit of the number of rows being returned on read. This is applied after numLinesToSkip.
If true, the first row of the Excel sheet specifies the column names (default: true).
Empty cells are parsed as null values (default: true).
Infer the schema of the Excel sheet automatically (default: true).
A format string specifying the format to use when writing timestamps (default: dd-MM-yyyy HH:mm:ss).
A format string specifying the format to use when writing dates.
The number of rows that are stored in memory. If set, a streaming reader is used, which can help when reading large files.
Sample size (number of rows) used for schema inference.
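To make the default timestamp pattern concrete, the sketch below renders a value with dd-MM-yyyy HH:mm:ss using java.time's DateTimeFormatter. It only illustrates what the pattern letters mean; spark-excel may format timestamps through a different API, and TimestampPatternDemo is a hypothetical name.

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter

// Illustration only (hypothetical helper): formats a timestamp with the documented
// default pattern to show what each pattern letter produces.
object TimestampPatternDemo {
  // Documented default of the timestamp format option.
  val DefaultPattern: String = "dd-MM-yyyy HH:mm:ss"

  // Formats the timestamp with the given pattern (the documented default if omitted).
  def render(ts: LocalDateTime, pattern: String = DefaultPattern): String =
    DateTimeFormatter.ofPattern(pattern).format(ts)
}
```

TimestampPatternDemo.render(LocalDateTime.of(2023, 1, 5, 14, 3, 9)) returns "05-01-2023 14:03:09": day first, then month, a four-digit year, and a 24-hour clock.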
https://github.com/crealytics/spark-excel