package spac
SPaC (short for "Streaming Parser Combinators") is a library for building stream consumers in a declarative style, specialized for tree-like data types like XML and JSON.
Many utilities for handling XML and JSON data involve parsing the entire "document" to some DOM model, then inspecting and transforming that model to extract information. The downside to these utilities is that when the document is very large, the DOM may not fit in memory. The workaround for this type of problem is to treat the document as a stream of "events", e.g. "StartElement" and "EndElement" for XML, or "StartObject" and "EndObject" for JSON. The downside to this workaround is that writing code to handle these streams can be complicated and error-prone, especially when the DOM is complicated.
SPaC's goal is to drastically simplify the process of creating code to handle these streams.
This package contains the "core" SPaC traits: Parser, Transformer, Splitter, and ContextMatcher.
See the xml and json subpackages (provided by the xml-spac and json-spac libraries respectively) for specific utilities related to handling XML and JSON event streams.
Type Members
- case class CallerPos(filename: String, line: Int) extends Product with Serializable
Represents a location in code that called a method. An implicit instance of this class will be automatically derived by a macro on-demand. CallerPos's ultimate purpose is to be present in certain SpacTraceElement classes, helping to point to specific splitters or parse calls in the event of a parsing error.
- final case class ChunkSize(i: Int) extends AnyVal with Product with Serializable
Used implicitly when creating certain Parsable instances to determine the "chunkSize" argument for Stream.fromBlockingIterator.
No implicit ChunkSize is available by default, but the implicit derivations that expect one will default to ChunkSize.default, which uses a chunk size of 32. You can define a local implicit val chunkSize = ChunkSize(n) to override this default.
- i
The chunk size as an integer
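The "default unless a local implicit is defined" behavior described above can be sketched in plain Scala. This is a minimal sketch, not spac's source: the ChunkSize below is a hypothetical stand-in (declared without AnyVal so the snippet also works in scripts), and chunked is an illustrative helper that assumes the fallback is implemented with a default value on an implicit parameter.

```scala
// Hypothetical stand-in for spac's ChunkSize, for illustration only.
final case class ChunkSize(i: Int)
object ChunkSize {
  // Deliberately NOT implicit: no ChunkSize is available implicitly by default.
  val default: ChunkSize = ChunkSize(32)
}

object ChunkSizeDemo {
  // Illustrative helper (an assumption, not a spac method).
  // The default argument is used only when no implicit ChunkSize is in scope.
  def chunked[A](items: Iterator[A])(implicit chunkSize: ChunkSize = ChunkSize.default): List[List[A]] =
    items.grouped(chunkSize.i).map(_.toList).toList

  // No implicit in scope: falls back to ChunkSize.default (32 per chunk).
  val withDefault: List[List[Int]] = chunked((1 to 70).iterator)

  // A local implicit overrides the default for calls inside this block.
  val withOverride: List[List[Int]] = {
    implicit val chunkSize: ChunkSize = ChunkSize(50)
    chunked((1 to 70).iterator)
  }
}
```

Here withDefault holds chunks of 32, 32, and 6 items, while withOverride holds chunks of 50 and 20 items.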
- sealed trait ContextChange[+In, +C] extends AnyRef
Represents either entering (ContextPush) or exiting (ContextPop) some matched context within a stream of inputs.
ContextChanges will generally be used to designate "sub-stream" boundaries, i.e. a selection of XML elements from within a stream, but may be used more generally to attach a stack-like state to stream transformers.
- In
The value type of the elements in the stream being inspected
- C
The type of the matched context
- trait ContextLocation extends AnyRef
A map-like representation of some location in a stream, used like stack trace elements for reporting errors in stream processing.
- trait ContextMatcher[Elem, +A] extends AnyRef
An object responsible for inspecting a stack of StartElement events and determining if they correspond to some "context" value of type A.
ContextMatchers play a primary role in splitting an XML event stream into "substreams", i.e. each substream is defined as the series of consecutive events during which the XML tag stack matches a context.
ContextMatchers are intended to be transformed and combined with each other in order to build up more complex matching functionality. See also: SingleItemContextMatcher, which contains additional combination methods and some specialized transformation methods.
- A
The type of the matched context.
- case class ContextPush[+In, +C](location: ContextTrace[In], context: C) extends ContextChange[In, C] with Product with Serializable
- case class ContextTrace[+A](elems: Chain[(ContextLocation, A)]) extends Product with Serializable
- trait HasLocation extends AnyRef
Marker trait used by SpacTraceElement.InInput to extract location information from inputs that cause parsing exceptions.
- trait LowPriorityTypeReduceImplicits extends AnyRef
- trait Parsable[F[_], -S, +In] extends AnyRef
Typeclass used to provide functionality to Parser#parse and Parser#parseF.
Implementations of Parsable are responsible for calling a parser's newHandler method, then feeding events to that handler until either the handler finishes on its own, or there are no more events.
Extra instances of Parsable are made available via separate spac "support" libraries, e.g. via JavaxSupport to allow Files etc. to be parsed as XmlEvents.
- F
The "effect" type, or cats.Id to indicate the parse operation is "blocking". Note that the use of cats.Id in this context doesn't mean it is not allowed to evaluate side-effects; it's just a way of escaping from having everything wrapped in F.
- S
The "source" type. Typically an Iterable[In], or a java.io.File
- In
The "event" type, i.e. the parser's input type
- trait Parser[-In, +Out] extends AnyRef
Primary "spac" abstraction which represents a sink for data events.
Parsers are responsible for interpreting a stream of In events as a single result of type Out. The actual interpretation is performed by a Parser.Handler which the Parser is responsible for constructing. Handlers may be internally-mutable, and so they are generally only constructed by the parse helper methods or by other handlers. Parsers themselves are immutable, acting as "handler factories", and so they may be freely reused.
A parser differs from typical "fold" operations in that it may choose to abort early with a result, leaving the remainder of the data stream untouched.
- In
event/input type
- Out
result type
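The "handler factory" and "abort early" ideas described for Parser can be modeled in a few lines of plain Scala. This is a simplified sketch under stated assumptions: MiniParser, MiniHandler, step, finish, and sumUntil are hypothetical names invented for illustration, and only newHandler and parse echo method names mentioned in this page. The real Parser.Handler signatures may differ.

```scala
// Simplified, hypothetical model of the Parser/Handler relationship.
// These traits are NOT spac's API; they only illustrate the concept.
trait MiniHandler[-In, +Out] {
  // Feed one input; Some(result) means "finished early, send no more inputs".
  def step(in: In): Option[Out]
  // Called when the input runs out before the handler finished on its own.
  def finish(): Out
}

trait MiniParser[-In, +Out] {
  // Each call returns a fresh (possibly mutable) handler, so the parser itself is reusable.
  def newHandler(): MiniHandler[In, Out]

  // Drive a handler over the source, stopping as soon as it yields a result.
  def parse(source: Iterator[In]): Out = {
    val handler = newHandler()
    var result: Option[Out] = None
    while (result.isEmpty && source.hasNext) result = handler.step(source.next())
    result.getOrElse(handler.finish())
  }
}

object MiniParserDemo {
  // Sums inputs, aborting with the running total once it reaches `limit`.
  def sumUntil(limit: Int): MiniParser[Int, Int] = new MiniParser[Int, Int] {
    def newHandler(): MiniHandler[Int, Int] = new MiniHandler[Int, Int] {
      private var total = 0 // mutable state lives in the handler, not the parser
      def step(in: Int): Option[Int] = {
        total += in
        if (total >= limit) Some(total) else None
      }
      def finish(): Int = total
    }
  }

  val parser = sumUntil(10)
  val source = Iterator(4, 5, 6, 7, 8)
  val early = parser.parse(source)             // stops after consuming 4, 5, 6
  val leftover = source.toList                 // inputs the parser never touched
  val exhausted = parser.parse(Iterator(1, 2)) // same parser reused; input ends first
}
```

In this model, early is 15 and leftover is List(7, 8), which shows the difference from a fold: the remaining inputs are never consumed. The second parse call gets its own handler, so exhausted is 3 and is unaffected by the first run.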
- class ParserApplyWithBoundInput[In] extends AnyRef
Convenience version of the Parser companion object, which provides parser constructors with the In type already specified. Integrations for XML and JSON will generally create implicit classes to add methods to this class for In = XmlEvent and In = JsonEvent respectively.
- sealed trait Signal extends AnyRef
Value used by Transformer.Handler to indicate to its upstream producer whether or not the handler wants to continue receiving values.
- trait SingleItemContextMatcher[Item, +A] extends ContextMatcher[Item, A]
Specialization of ContextMatcher which only checks the first element in the stack for matching operations. Transformation operations on single-item matchers will yield other single-item matchers (rather than the base ContextMatcher type). Combination operations involving other single-item matchers will also yield single-item matchers. SingleItemContextMatchers form the building blocks of more complex matchers.
- A
The type of the matched context.
- abstract class SpacException[Self <: SpacException[Self]] extends Exception with NoStackTrace
Base class for all exceptions thrown by Spac parsers. A SpacException holds a spacTrace, which is similar to a *stack* trace, but uses a specialized element type to hold helpful debug information about the cause and context of the exception, and the input that caused it.
SpacException uses NoStackTrace to suppress the usual stack trace, since exceptions thrown by a Parser will not have useful stack trace information for end users of the Spac framework.
- Self
Self-type used in the type signature of withSpacTrace
- trait SpacTraceElement extends AnyRef
A play on words vs StackTraceElement, a *Spac* trace element represents some contextual location inside the logic of a spac Parser, or the location of an input to that parser. SpacTraceElements are used by SpacException to provide useful debugging information for when a Parser fails.
- trait Splitter[In, +C] extends AnyRef
Primary "spac" abstraction that acts as a selector for sub-streams within a single input stream.
A "sub-stream" is some series of consecutive values from the original stream, identified by a "context" value. Sub-streams do not overlap with each other.
For example, when handling a stream of XML events, you might want to create a Splitter that identifies the events representing elements at a specific location within the XML; something like an XPath that operates on streams. When using xml-spac, you might construct a splitter like Splitter.xml("rootElem" \ "things" \ "thing"). This would identify a new sub-stream for each <thing> element that appears inside a <things> element, inside the <rootElem> element. An example sub-stream for a <thing> element might be ElemStart("thing"), Text("hello"), ElemEnd("thing").
A Splitter's general goal is to attach a Parser or Transformer to each sub-stream, passing the contents of that sub-stream through the attached Parser or Transformer in order to get an interpretation of that sub-stream (i.e. the Parser's result, or some emitted outputs from a Transformer). With the <thing> example above, you might attach a parser that concatenates the content of all Text events it sees, i.e. XmlParser.forText. Since a separate parser handler will run for each sub-stream, this becomes something like "A stream of Strings which each represent the concatenated text from an individual <thing> element".
- In
Data event type for the input stream
- C
Context type used to identify each sub-stream
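The sub-stream idea described for Splitter can be modeled in plain Scala without the library. This is a minimal sketch under stated assumptions: the Event case classes are hypothetical stand-ins that borrow the ElemStart, Text, and ElemEnd names from the example above, and textAt is an illustrative helper that hard-codes the "concatenate the text" interpretation instead of attaching an arbitrary Parser or Transformer.

```scala
// Hypothetical event model, for illustration only (not xml-spac's XmlEvent).
sealed trait Event
final case class ElemStart(name: String) extends Event
final case class Text(value: String) extends Event
final case class ElemEnd(name: String) extends Event

object SplitDemo {
  // Returns the concatenated text of each sub-stream, where a sub-stream spans
  // from the ElemStart that makes the tag stack equal `path` to its matching ElemEnd.
  def textAt(path: List[String], events: Iterator[Event]): List[String] = {
    var stack = List.empty[String]            // open tags, outermost first
    var current = Option.empty[StringBuilder] // Some(...) while inside a matched sub-stream
    val results = List.newBuilder[String]
    events.foreach {
      case ElemStart(name) =>
        stack = stack :+ name
        if (current.isEmpty && stack == path) current = Some(new StringBuilder)
      case Text(value) =>
        current.foreach(_.append(value))
      case ElemEnd(_) =>
        if (stack == path) {
          // Sub-stream ended: emit its interpretation and reset for the next one.
          current.foreach(sb => results += sb.toString)
          current = None
        }
        stack = stack.dropRight(1)
    }
    results.result()
  }

  val events: List[Event] = List(
    ElemStart("rootElem"), ElemStart("things"),
    ElemStart("thing"), Text("hello"), ElemEnd("thing"),
    ElemStart("thing"), Text("wor"), Text("ld"), ElemEnd("thing"),
    ElemEnd("things"),
    ElemStart("other"), Text("ignored"), ElemEnd("other"),
    ElemEnd("rootElem")
  )

  val texts: List[String] = textAt(List("rootElem", "things", "thing"), events.iterator)
}
```

In this model, texts is List("hello", "world"): one string per <thing> sub-stream, with the text under <other> skipped because its tag stack never matches the path.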
- class SplitterApplyWithBoundInput[In] extends AnyRef
- sealed trait StackInterpretation[+In, +Elem] extends AnyRef
Outcome of a StackLike[In, Elem], indicating whether a given input was a stack push/pop, and whether that push/pop should be treated as happening before or after the input that caused it.
- trait StackLike[In, +Elem] extends AnyRef
Typeclass that perceives a subset of In values as either "stack push" or "stack pop" events. For example, with XML, an ElemStart event can be perceived as a "stack push", and a corresponding ElemEnd event can be perceived as a "stack pop".
- trait Transformer[-In, +Out] extends AnyRef
Primary "spac" abstraction which represents a transformation stage for a stream of data events.
Transformers effectively transform a stream of In events into a stream of Out events. The actual stream handling logic is defined by a Transformer.Handler, which a Transformer is responsible for constructing. Handlers may be internally-mutable, and so they are generally only constructed by other handlers. Transformers themselves are immutable, acting as "handler factories", and so they may be freely reused.
A transformer may choose to abort in response to any input event, as well as emit any number of outputs in response to an input event or the EOF signal.
- In
The incoming event type
- Out
The outgoing event type
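The Transformer behavior described above (emit any number of outputs per input, emit at EOF, or abort) can be sketched with a small stand-alone model. Everything here is a hypothetical illustration: MiniTransformer, MiniTransformerHandler, MiniSignal, push, finish, and batches are invented names, and the real Transformer.Handler and Signal types may be shaped differently.

```scala
// Hypothetical model of a transformer stage; NOT spac's actual API.
sealed trait MiniSignal
case object Continue extends MiniSignal
case object Stop extends MiniSignal

trait MiniTransformerHandler[-In, +Out] {
  // Outputs emitted for this input, plus whether the handler wants more inputs.
  def push(in: In): (List[Out], MiniSignal)
  // Outputs emitted in response to end-of-input.
  def finish(): List[Out]
}

trait MiniTransformer[-In, +Out] {
  // Immutable factory: each run gets its own (possibly mutable) handler.
  def newHandler(): MiniTransformerHandler[In, Out]

  def transform(source: Iterator[In]): List[Out] = {
    val handler = newHandler()
    val out = List.newBuilder[Out]
    var signal: MiniSignal = Continue
    while (signal == Continue && source.hasNext) {
      val (emitted, next) = handler.push(source.next())
      out ++= emitted
      signal = next
    }
    // EOF is only signaled if the handler did not abort first.
    if (signal == Continue) out ++= handler.finish()
    out.result()
  }
}

object MiniTransformerDemo {
  // Emits full batches of `size`, flushes a partial batch at EOF,
  // and aborts (dropping any partial batch) on a negative input.
  def batches(size: Int): MiniTransformer[Int, List[Int]] = new MiniTransformer[Int, List[Int]] {
    def newHandler(): MiniTransformerHandler[Int, List[Int]] = new MiniTransformerHandler[Int, List[Int]] {
      private var buffer = Vector.empty[Int]
      def push(in: Int): (List[List[Int]], MiniSignal) =
        if (in < 0) (Nil, Stop)
        else {
          buffer :+= in
          if (buffer.size == size) {
            val full = buffer.toList
            buffer = Vector.empty
            (List(full), Continue)
          } else (Nil, Continue)
        }
      def finish(): List[List[Int]] = if (buffer.isEmpty) Nil else List(buffer.toList)
    }
  }

  val flushed = batches(2).transform(Iterator(1, 2, 3))
  val aborted = batches(2).transform(Iterator(1, 2, 3, -1, 4))
}
```

In this model, flushed is List(List(1, 2), List(3)) because the partial batch is emitted at EOF, while aborted is List(List(1, 2)) because the handler stops at -1 before any EOF signal is sent.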
- class TransformerApplyWithBoundInput[In] extends AnyRef
Convenience version of the Transformer companion object, which provides transformer constructors with the In type already specified.
- trait TypeReduce[-In1, -In2] extends AnyRef
Type-level tuple reduction function that treats Unit as an Identity. For example:
TypeReduce[(Unit, Unit)]{ type Out = Unit }
TypeReduce[(T, Unit)]{ type Out = T }
TypeReduce[(Unit, T)]{ type Out = T }
TypeReduce[(L, R)]{ type Out = (L, R) }
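One common way to get this kind of type-level reduction is a typeclass with a dependent Out type and prioritized implicits. The sketch below is an assumption about the general technique, not TypeReduce's actual definition: Reduce, Reduce.Aux, the priority traits, and combine are hypothetical names, and the type parameters are invariant here for simplicity.

```scala
// Hypothetical re-creation of the idea behind TypeReduce; not spac's source.
trait Reduce[A, B] {
  type Out
  def apply(a: A, b: B): Out
}

trait LowPriorityReduce {
  // Fallback: neither side is Unit, so keep both values as a tuple.
  implicit def tupled[L, R]: Reduce.Aux[L, R, (L, R)] = new Reduce[L, R] {
    type Out = (L, R)
    def apply(l: L, r: R): (L, R) = (l, r)
  }
}

trait MidPriorityReduce extends LowPriorityReduce {
  // Unit on the left is dropped.
  implicit def leftUnit[T]: Reduce.Aux[Unit, T, T] = new Reduce[Unit, T] {
    type Out = T
    def apply(u: Unit, t: T): T = t
  }
  // Unit on the right is dropped.
  implicit def rightUnit[T]: Reduce.Aux[T, Unit, T] = new Reduce[T, Unit] {
    type Out = T
    def apply(t: T, u: Unit): T = t
  }
}

object Reduce extends MidPriorityReduce {
  type Aux[A, B, O] = Reduce[A, B] { type Out = O }
  // Highest priority: resolves the (Unit, Unit) case, where both rules above would apply.
  implicit val bothUnit: Aux[Unit, Unit, Unit] = new Reduce[Unit, Unit] {
    type Out = Unit
    def apply(a: Unit, b: Unit): Unit = ()
  }
}

object ReduceDemo {
  // The result type depends on which instance the compiler selects.
  def combine[A, B](a: A, b: B)(implicit r: Reduce[A, B]): r.Out = r(a, b)

  val u: Unit = combine((), ())
  val left: Int = combine(1, ())
  val right: String = combine((), "x")
  val both: (Int, String) = combine(1, "x")
}
```

The type annotations in ReduceDemo only compile if the selected instance reduces the pair as listed in the four cases above, so they double as a compile-time check of the sketch.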
- trait Unconsable[C[_]] extends AnyRef
Typeclass for collections that can be efficiently split into a head element and a tail collection as long as they are not empty.
Value Members
- object CallerPos extends Serializable
- object ChunkSize extends Serializable
- object ContextLocation
- object ContextMatcher
- case object ContextPop extends ContextChange[Nothing, Nothing] with Product with Serializable
- object ContextTrace extends Serializable
- object Parsable
- object Parser
- object Signal
- object SingleItemContextMatcher
- object SpacException extends Serializable
- object SpacTraceElement
- object Splitter
- object StackInterpretation
- object Transformer
- object TypeReduce extends LowPriorityTypeReduceImplicits
- object Unconsable
Main Concepts
All event consumers in SPaC are defined in terms of Parser, Splitter, and Transformer.
These three classes are interrelated, and share the eventual goal of producing one or more interpreted values given an incoming stream of event data.
Error Handling
SPaC parsers should only ever throw SpacException from their parse and parseF methods.
SpacException is a specialized exception type which uses "Spac Trace" elements instead of the usual "Stack Trace"; these provide more useful information, such as which part of the parser failed and some contextual information about which event caused the parser to fail.
Capturing Context Data
When dealing with tree-like documents, it is often important to be able to express a relative location in that data, or to produce some value based on the current location within the tree. SPaC refers to these locations as "context".
Utility and Supporting Classes
Most of these classes and traits are typeclasses that the primary types operate in terms of. Generally you don't directly interact with these.