A schema declares the fields expected in a CSV stream: their names and types. Fields with optional values have to be declared as Options:
val schema = CSVSchema()
  .add[String]("name")
  .add[Option[LocalDate]]("birthday")
Optional fields still have to exist in the source data; only their values may be empty. Not all fields have to be declared by the schema. Any subset of them is sufficient.
When headers are mapped through CSVConfig.mapHeader, the names provided in the schema are the final ones, after mapping.
Additionally, it is possible to specify per-field validators, posing additional requirements on CSV data:
val schema = CSVSchema()
  .add[String]("code", RegexValidator("[A-Z][A-Z0-9]+"))
  .add[BigDecimal]("price", MinValidator(0.01))
For more information on the available built-in validators, or on how to create additional ones, see Validator.
A CSV schema is verified through its validate method. For each record it yields either an InvalidRecord, containing the validation errors together with the original Record data, or a TypedRecord, containing the selected, strongly typed data. In both cases the result is wrapped in cats.data.Validated.
Type parameters
T
tuple encoding the schema
Value parameters
columns
the tuple containing typed columns with optional validators
A field definition consists of a field name and its type. A set of field definitions constitutes a schema definition. A collection of additional Validators may be added to a field. When the schema is validated, validators are checked after field type verification and receive the already parsed value of the type declared for the field.
To obtain a value of the proper type from a field, an implicit StringParser is required. Parsers for basic types and formats are available through the StringParser object. Additional ones may be provided by implementing the StringParser trait.
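The role of the implicit parser can be illustrated with a small, library-independent type class. This is only a sketch of the pattern: FieldParser, parseField and Percent are hypothetical names introduced here, and the actual StringParser trait in spata may have a different signature and error handling.

```scala
import java.time.LocalDate
import scala.util.Try

// Hypothetical stand-in for a string-parsing type class (NOT spata's StringParser).
trait FieldParser[A] {
  def parse(s: String): Either[String, A]
}

object FieldParser {
  // Parsers for a few basic types, found through implicit scope.
  implicit val stringParser: FieldParser[String] = new FieldParser[String] {
    def parse(s: String): Either[String, String] = Right(s)
  }
  implicit val intParser: FieldParser[Int] = new FieldParser[Int] {
    def parse(s: String): Either[String, Int] =
      Try(s.trim.toInt).toOption.toRight(s"'$s' is not an integer")
  }
  implicit val dateParser: FieldParser[LocalDate] = new FieldParser[LocalDate] {
    def parse(s: String): Either[String, LocalDate] =
      Try(LocalDate.parse(s.trim)).toOption.toRight(s"'$s' is not an ISO date")
  }
}

// The requested type selects the parser, much like the type given to add selects one for a field.
def parseField[A](s: String)(implicit p: FieldParser[A]): Either[String, A] = p.parse(s)

// An additional parser supplied by user code for a custom (hypothetical) type.
final case class Percent(value: Double)
object Percent {
  implicit val parser: FieldParser[Percent] = new FieldParser[Percent] {
    def parse(s: String): Either[String, Percent] =
      Try(s.trim.stripSuffix("%").toDouble).toOption
        .map(d => Percent(d))
        .toRight(s"'$s' is not a percentage")
  }
}
```

With these definitions, parseField[LocalDate]("2020-02-29") resolves dateParser from the companion object, while parseField[Percent]("12.5%") picks up the user-defined parser without any change to the built-in ones.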
Optional values should be denoted by providing Option[A] as the field type. Note that even optional fields have to be present in the source data; only their values may be missing (empty).
The same validators that are used to validate plain values may be used to verify optional values. A missing value (None) is assumed to be correct in such a case.
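The treatment of optional values can be sketched without the library. The names below (Check, parseOptional, forOption, positive, parseInt) are hypothetical and only model the behaviour described above, assuming a validator can be represented as a function returning an optional error message; they are not spata's implementation.

```scala
// Hypothetical validator model: an error message for an invalid value, None otherwise.
type Check[A] = A => Option[String]

// Empty source text becomes None; anything else has to parse as A.
def parseOptional[A](s: String)(parse: String => Either[String, A]): Either[String, Option[A]] =
  if (s.trim.isEmpty) Right(None) else parse(s).map(a => Some(a))

// Lift a validator for A to Option[A]: a missing value passes without being checked.
def forOption[A](check: Check[A]): Check[Option[A]] = {
  case Some(a) => check(a)
  case None    => None
}

// Sample validator and parser used to exercise the functions above.
val positive: Check[Int] = n => if (n > 0) None else Some(s"$n is not positive")
val parseInt: String => Either[String, Int] =
  s => s.trim.toIntOption.toRight(s"'$s' is not an integer")
```

Here parseOptional("")(parseInt) succeeds with None, and forOption(positive) accepts that None while still rejecting Some(-1).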
This is a chaining method which allows starting with an empty schema and extending it through subsequent calls to add:
val schema = CSVSchema()
.add[Double]("latitude", RangeValidator(-90.0, 90.0))
.add[Double]("longitude", RangeValidator(-180.0, 180.0))
Type parameters
V
field value type
Value parameters
ev
evidence that the key is unique - it is not present in the schema yet
key
unique field name - a singleton string
validators
optional validators to check that field values comply with additional rules
Attributes
Returns
new schema definition with column (field definition) added to it
Parses all fields defined by the schema to their declared types and, if successful, runs the provided validators on the parsed values. The process succeeds and creates a TypedRecord if the values of all fields defined in the schema are correctly parsed and positively validated. If any of these operations fails, an InvalidRecord is yielded.
If multiple validators are defined for a single field, validation of that field stops at the first invalid result. Validation is nonetheless executed for all fields, and errors are collected from all of them.
CSV values that are not declared in the schema are omitted. In the extreme case, an empty schema always proves valid, although it yields empty typed records.
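The validation rules described above (parse first, stop at the first failing validator of a field, collect errors across fields, ignore undeclared columns) can be modelled in a few lines of plain Scala. This is a simplified sketch under stated assumptions: Field, validateRecord and demoSchema are hypothetical names, records are represented as a Map, and Either stands in for cats.data.Validated. It does not reproduce spata's typed, tuple-based implementation.

```scala
// Hypothetical model of a schema field: a name, a parser and an ordered list of checks.
final case class Field[A](
  name: String,
  parse: String => Either[String, A],
  checks: List[A => Option[String]] = Nil
) {
  // Parse first; then run the checks in order, stopping at the first failure.
  def validate(raw: String): Either[String, A] =
    parse(raw).flatMap { value =>
      checks.iterator.flatMap(check => check(value)).nextOption() match {
        case Some(error) => Left(error)
        case None        => Right(value)
      }
    }
}

// Validate one record: every declared field is examined and its error collected;
// columns that are not declared in the schema are ignored.
def validateRecord(
  schema: List[Field[_]],
  record: Map[String, String]
): Either[List[(String, String)], Map[String, Any]] = {
  val results = schema.map { field =>
    val outcome: Either[String, Any] = record.get(field.name) match {
      case Some(raw) => field.validate(raw)
      case None      => Left("missing field") // assumption: an absent column is an error
    }
    field.name -> outcome
  }
  val errors = results.collect { case (name, Left(error)) => name -> error }
  if (errors.nonEmpty) Left(errors)
  else Right(results.collect { case (name, Right(value)) => name -> value }.toMap)
}

// Sample checks and schema used to exercise the model.
val startsWithLetter: String => Option[String] =
  s => if (s.headOption.exists(_.isLetter)) None else Some("must start with a letter")
val atLeast3: String => Option[String] =
  s => if (s.length >= 3) None else Some("must have at least 3 characters")

val demoSchema: List[Field[_]] = List(
  Field[String]("code", s => Right(s), List(startsWithLetter, atLeast3)),
  Field[Int]("qty", s => s.trim.toIntOption.toRight("not an integer"))
)
```

For a record with an empty code and a non-numeric qty, this model reports one error per field: only the first failing check of code is recorded, yet the qty error is still collected. Passing an empty schema returns an empty result for any record.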
Type parameters
F
the effect type, with a type class providing support for logging (provided internally by spata)
Value parameters
enforcer
given instance that performs the validation recursively, provided by spata