Packages

package spata

Spata's primary package.


Package Members

  1. package converter
  2. package error
  3. package io
  4. package schema

    Schema validation package.

  5. package text

    Text parsing package.

  6. package util

    Package with utility classes.

Type Members

  1. case class CSVConfig extends Product with Serializable

    CSV configuration used for creating CSVParser.

    This config may be used as a builder to create a parser:

    val parser = CSVConfig().fieldSizeLimit(1000).noHeader().get[IO]()

    Field delimiter is ',' by default.

    Record delimiter is '\n' by default. When the delimiter is line feed ('\n', ASCII 10) and it is preceded by carriage return ('\r', ASCII 13), they are treated as a single character.

    Quotation mark is '"' by default. Special characters - field and record delimiters - have to be wrapped in quotation marks. A quotation mark in the content may appear only inside a quoted field and has to be doubled to be interpreted as part of the actual data, not as a control character. If a field starts or ends with a white character, it has to be wrapped in quotation marks; otherwise the white characters are stripped.

    If the source has a header, which is the default, it is used as keys to the actual values and is not included in the data. If there is no header, number-based keys are created (starting from "_1").

    If CSV records are converted to case classes, header values are used as class fields and may require remapping. This can be achieved through mapHeader:

    config.mapHeader(Map("first name" -> "firstName", "last name" -> "lastName")))

    or if there is no header line:

    config.mapHeader(Map("_1" -> "firstName", "_2" -> "lastName")))

    Header mapping may also be position-based, which is especially handy when there are duplicates in the header and name-based remapping cannot resolve them (because it remaps all occurrences):

    config.mapHeader(Map(0 -> "firstName", 1 -> "lastName"))

    Field size limit is used to stop processing input when a field is significantly larger than expected, to avoid an OutOfMemoryError. This might happen if the source structure is invalid, e.g. a closing quotation mark is missing. There is no limit by default.
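
    The options above may be combined through the builder. Below is a minimal, hypothetical sketch; it assumes the builder methods mentioned here (fieldDelimiter, mapHeader, fieldSizeLimit, noHeader) chain as shown and that the package is imported as info.fingo.spata:

    import cats.effect.IO
    import info.fingo.spata.{ CSVConfig, CSVParser }

    // a sketch only: build a parser with a custom field delimiter,
    // a remapped header value and a field size limit
    val parser: CSVParser[IO] = CSVConfig()
      .fieldDelimiter(';')
      .mapHeader(Map("first name" -> "firstName"))
      .fieldSizeLimit(1000)
      .get[IO]()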

  2. class CSVParser[F[_]] extends AnyRef

    A utility for parsing comma-separated values (CSV) sources. The source is assumed to conform to RFC 4180, although some aspects of its format are configurable.

    The parser may be created by providing a full configuration with CSVConfig or through the helper CSVParser.config function from the companion object, e.g.:

    val parser = CSVParser.config.fieldDelimiter(';').get[IO]()

    Actual parsing is done through one of three groups of methods:

    • parse to transform a stream of characters into records and process data in a functional way, which is the recommended approach,
    • get to fetch whole source data at once into a list,
    • process to deal with individual records through a callback function.

    This parser is normally used with a stream fetching data from some external source, so its computations are wrapped for deferred evaluation into an effect F, e.g. cats.effect.IO. Basic parsing does not impose any special requirements on F, except support for suspended execution, which requires an implicit instance of cats.effect.Sync.

    To trigger evaluation, one of the unsafe operations on F has to be called. Their exact form depends on the actual effect in use (e.g. cats.effect.IO.unsafeRunSync).

    No method in this class performs a context (thread) shift; by default all methods execute synchronously on the current thread. Concurrency or asynchronous execution may be introduced through various fs2.Stream methods. There is also a supporting class, CSVParser#Async, which provides methods for asynchronous callbacks.

    F

    the effect type, with a type class providing support for suspended execution (typically cats.effect.IO) and logging (provided internally by spata)
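
    For illustration, here is a minimal sketch of the recommended parse approach. It assumes the character stream is already available (e.g. obtained through the io package), that parse can be used as an fs2 pipe from characters to records, and that the package is imported as info.fingo.spata:

    import cats.effect.IO
    import fs2.Stream
    import info.fingo.spata.CSVParser

    // a sketch only: count the records of an already available character stream
    def countRecords(chars: Stream[IO, Char]): IO[Long] = {
      val parser = CSVParser.config.fieldDelimiter(';').get[IO]()
      chars.through(parser.parse).compile.count
    }

    Evaluation would then be triggered with one of the unsafe operations on the effect, e.g. countRecords(chars).unsafeRunSync().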

  3. type Decoded[A] = Either[ContentError, A]

    Convenience type representing the result of retrieving record data.
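
    Being a plain Either, a Decoded value may be handled with the standard combinators. A small sketch, assuming a Record value named record which exposes its "price" field through a get method returning Decoded:

    val price: Decoded[Double] = record.get[Double]("price")
    val priceOrZero: Double = price.getOrElse(0.0)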

  4. sealed trait HeaderMap extends AnyRef

    Trait representing header remapping methods. It is not used directly but through conversion of an S2S or I2S partial function to one of its implementation classes.

    See also

    CSVConfig for sample usage.
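
    As an illustration, a sketch of the partial function forms, assuming the implicit conversions provided by the HeaderMap companion object are applied when the functions are passed to CSVConfig.mapHeader:

    // name-based remapping - non-matching header values stay unchanged
    val byName: S2S = {
      case "first name" => "firstName"
      case "last name" => "lastName"
    }
    // position-based remapping
    val byIndex: I2S = {
      case 0 => "firstName"
      case 1 => "lastName"
    }
    val config = CSVConfig().mapHeader(byName)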

  5. type I2S = PartialFunction[Int, String]

    Convenience type for integer-to-string partial functions, used for position-based header remapping.

  6. class Record extends AnyRef

    CSV record representation. A record is basically a map from string to string. Values are indexed by the header row if present, or by a tuple-style header: "_1", "_2" etc.

    lineNum is the last line in the source file whose content is part of this record - in other words, it is the number of lines consumed so far to load this record. It starts with 1 and includes the header line, so the first data record typically has line number 2. There may be many lines per record when some fields contain line breaks. A new line is interpreted independently from the CSV record delimiter, as the standard platform EOL character sequence.

    rowNum is a record counter. It starts with 1 for data, with the header row having number 0. It differs from lineNum for sources with a header or with fields containing line breaks.
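
    A small sketch of accessing this position information together with record data, assuming a Record value named record parsed from a source with a "name" header field:

    val name: Decoded[String] = record.get[String]("name")
    println(s"row ${record.rowNum}, ending at source line ${record.lineNum}: $name")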

  7. type S2S = PartialFunction[String, String]

    Convenience type for string-to-string partial functions, used for name-based header remapping.

Value Members

  1. object CSVParser

    CSVParser companion object with type definitions and convenience methods to create parsers.

  2. object HeaderMap

    Implicit conversions for the HeaderMap trait.

  3. object Record

    Record helper object, used to create and convert records.
