net.tixxit.delimited

DelimitedParser

trait DelimitedParser extends AnyRef

An immutable parser for delimited files. It operates on chunks of input via the parseChunk method. After parsing a chunk, parseChunk returns a new DelimitedParser together with all of the complete rows parsed from that chunk. A row that is only partially contained in the chunk is held back and returned by a later call to parseChunk, either on the returned DelimitedParser or on one further along the chain of calls.
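Because the parser is immutable, driving parseChunk by hand means threading each returned parser into the next call. A minimal sketch, assuming the input has already been split into a List[String]; the ChunkedParsing object and parseChunks helper are hypothetical, while parseChunk, DelimitedParser(...) and DelimitedFormat.CSV are the names used on this page.

```scala
import net.tixxit.delimited._

object ChunkedParsing {
  type Result = Either[DelimitedError, Row]

  // Hypothetical helper: threads the immutable parser through each chunk,
  // collecting the complete rows that each call returns.
  def parseChunks(parser: DelimitedParser, chunks: List[String]): Vector[Result] = {
    val (lastParser, rows) =
      chunks.foldLeft((parser, Vector.empty[Result])) { case ((p, acc), chunk) =>
        val (next, parsed) = p.parseChunk(Some(chunk))
        (next, acc ++ parsed)
      }
    // None signals end of input, so any buffered partial row is emitted now.
    val (_, remaining) = lastParser.parseChunk(None)
    rows ++ remaining
  }

  def main(args: Array[String]): Unit = {
    val parser = DelimitedParser(DelimitedFormat.CSV)
    // The row "Alice,30" is split across the first two chunks, so it can
    // only be returned once the rest of the row has arrived.
    val rows = parseChunks(parser, List("name,age\nAl", "ice,30\nBob,4", "1"))
    rows.foreach(println)
  }
}
```

For an Iterator[String] of chunks, the parseAll convenience method described below should give the same result without the manual bookkeeping.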

There are also convenience methods for parsing Files, Strings, InputStreams, Readers, etc.

To get a DelimitedParser for a CSV, TSV, or other delimited file, use something like:

import net.tixxit.delimited._

val parser = DelimitedParser(DelimitedFormat.CSV)
val rows: Vector[Either[DelimitedError, Row]] =
  parser.parseFile(new java.io.File("some.csv"))

If you don't know the format of your delimited file ahead of time, not much changes; pass DelimitedFormat.Guess and the format is inferred from the input:

val parser = DelimitedParser(DelimitedFormat.Guess)
val rows: Vector[Either[DelimitedError, Row]] =
  parser.parseFile(new java.io.File("some.csv"))
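Both examples produce a Vector in which each element is either a parse error or a row. A minimal sketch of separating the two with standard collection methods; the SplitResults object, split helper and sample input are illustrative, and parseString is used in place of parseFile so no file on disk is needed.

```scala
import net.tixxit.delimited._

object SplitResults {
  // Separates parse errors from successfully parsed rows, keeping each
  // group in input order.
  def split(results: Vector[Either[DelimitedError, Row]]): (Vector[DelimitedError], Vector[Row]) = {
    val errors = results.collect { case Left(e) => e }
    val rows = results.collect { case Right(r) => r }
    (errors, rows)
  }

  def main(args: Array[String]): Unit = {
    val parser = DelimitedParser(DelimitedFormat.CSV)
    // parseString stands in for parseFile so the sketch is self-contained.
    val (errors, rows) = split(parser.parseString("a,b\n1,2"))
    println(s"${rows.size} rows, ${errors.size} errors")
  }
}
```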

Abstract Value Members

  1. abstract def format: Option[DelimitedFormat]

    The DelimitedFormat being used to parse this delimited file, or None if a format has not yet been inferred (in which case, no rows have yet been returned by parseChunk).

  2. abstract def parseChunk(chunk: Option[String]): (DelimitedParser, Vector[Either[DelimitedError, Row]])

    Parse a chunk of the input if there is any left. A chunk of None tells the parser that there will be no further input; in that case, all remaining buffered input is consumed and returned as rows (or errors).

    This returns a new DelimitedParser to use to parse the next chunk, as well as a Vector of all complete rows parsed from chunk.

    chunk

    the next chunk of data as a String, or None at end of input

  3. abstract def reset: (String, DelimitedParser)

    Returns all unparsed data and a DelimitedParser whose state is completely reset.
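A sketch of how format and reset fit around parseChunk. Beyond what this page states, it assumes that the string returned by reset is exactly the input not yet emitted as rows, so re-feeding it to the fresh parser should recover the remaining rows. FormatAndReset and parseThenReset are hypothetical names; the format check relies only on the documented rule that format is None only while no rows have been returned.

```scala
import net.tixxit.delimited._

object FormatAndReset {
  type Result = Either[DelimitedError, Row]

  // Hypothetical helper: parses one chunk, calls reset, then feeds the text
  // that reset reports as unparsed to the fresh parser and ends the input.
  def parseThenReset(parser: DelimitedParser, input: String): (Vector[Result], String, Vector[Result]) = {
    val (p1, before) = parser.parseChunk(Some(input))
    val (unparsed, fresh) = p1.reset
    val (p2, mid) = fresh.parseChunk(Some(unparsed))
    val (_, end) = p2.parseChunk(None)
    (before, unparsed, mid ++ end)
  }

  def main(args: Array[String]): Unit = {
    // With Guess, format may stay None until a format has been inferred;
    // if any rows were returned, a format must already be known.
    val guess = DelimitedParser(DelimitedFormat.Guess)
    val (next, rows) = guess.parseChunk(Some("a;b;c\n1;2;3\n"))
    println(s"format before: ${guess.format}, after one chunk: ${next.format}, rows: ${rows.size}")

    val (before, unparsed, after) =
      parseThenReset(DelimitedParser(DelimitedFormat.CSV), "a,b\n1,2\n3,")
    println(s"rows before reset: ${before.size}, unparsed: '$unparsed', rows after: ${after.size}")
  }
}
```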

Concrete Value Members

  1. def parseAll(chunks: Iterator[String]): Iterator[Either[DelimitedError, Row]]

    Parses all chunks in the given iterator consecutively, treating the last chunk in chunks as the final input, and returns all rows (or errors) from the input.

  2. def parseFile(file: File, charset: Charset = StandardCharsets.UTF_8): Vector[Either[DelimitedError, Row]]

    Completely parses file and returns all of its rows in a Vector.

    file

    the delimited file on disk

    charset

    the character set the file was encoded in

  3. def parseInputStream(is: InputStream, charset: Charset = StandardCharsets.UTF_8): Iterator[Either[DelimitedError, Row]]

    Returns an iterator that parses rows from is as elements are consumed.

    charset

    the character set to decode the bytes as

  4. def parseReader(reader: Reader): Iterator[Either[DelimitedError, Row]]

    Returns an iterator that parses rows from reader as elements are consumed.

  5. def parseString(input: String): Vector[Either[DelimitedError, Row]]

    Parses an entire delimited file given as a string.
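A minimal sketch that runs the same small CSV input through each convenience method; the ConvenienceMethods object and parseAllWays helper are hypothetical, and the temporary file exists only so parseFile has something to read. Since every method sees the same bytes, one would expect the same number of rows from each; the iterators are forced with toVector so the results can be compared.

```scala
import java.io.{ByteArrayInputStream, StringReader}
import java.nio.charset.StandardCharsets
import java.nio.file.Files
import net.tixxit.delimited._

object ConvenienceMethods {
  type Result = Either[DelimitedError, Row]

  // Parses the same input through each convenience method.
  def parseAllWays(parser: DelimitedParser, input: String): Seq[Vector[Result]] = {
    val bytes = input.getBytes(StandardCharsets.UTF_8)

    // Eager: the whole string is parsed into a Vector.
    val fromString = parser.parseString(input)

    // Lazy: rows are parsed only as the returned Iterator is consumed.
    val fromReader = parser.parseReader(new StringReader(input)).toVector
    val fromStream = parser.parseInputStream(new ByteArrayInputStream(bytes)).toVector

    // parseAll: the caller supplies the chunks (here, 5-character slices);
    // the last chunk is treated as the final input.
    val fromChunks = parser.parseAll(input.grouped(5)).toVector

    // parseFile: round-trips through a temporary file on disk.
    val tmp = Files.createTempFile("delimited-example", ".csv")
    try {
      Files.write(tmp, bytes)
      val fromFile = parser.parseFile(tmp.toFile)
      Seq(fromString, fromReader, fromStream, fromChunks, fromFile)
    } finally Files.delete(tmp)
  }

  def main(args: Array[String]): Unit = {
    val parser = DelimitedParser(DelimitedFormat.CSV)
    val results = parseAllWays(parser, "id,name\n1,Alice\n2,Bob")
    println(results.map(_.size))
  }
}
```

parseString and parseFile materialize every row in a Vector, whereas parseReader, parseInputStream and parseAll return an Iterator, which is likely the better fit for inputs too large to hold in memory.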
