Yaidom is yet another Scala immutable DOM-like XML API. The best known Scala immutable DOM-like API is
the standard scala.xml API. It:
attempts to offer an XPath-like querying experience, thus somewhat blurring the distinction between nodes and node collections
lacks first-class support for XML namespaces
has limited (functional) update support
Yaidom takes a different approach, avoiding XPath-like query support in its query API, and offering good namespace and decent (functional)
update support. Yaidom is also characterized by almost mathematical precision and clarity. Still, the API remains practical and
pragmatic. In particular, the API user has much configuration control over parsing and serialization, because yaidom exposes
the underlying JAXP parsers and serializers, which can be configured by the library user.
Yaidom chooses its battles. For example, given that DTDs do not know about namespaces, yaidom offers good namespace
support, but ignores DTDs entirely. Of course the underlying XML parser may still validate XML against a DTD, if so desired.
As another example, yaidom tries to leave the handling of the gory details of XML processing (such as whitespace handling)
as much as possible to JAXP (and JAXP parser/serializer configuration). As yet another example, yaidom knows nothing about
(XML Schema) types of elements and attributes.
As mentioned above, yaidom tries to treat basic XML processing with almost mathematical precision, even if this is "incorrect".
At the same time, yaidom tries to be useful in practice. For example, yaidom compromises "correctness" in the following ways:
Yaidom does not generally consider documents to be nodes (called "document information items" in the XML Infoset),
thus introducing fewer constraints on DOM-like node construction
Yaidom does not consider attributes to be (non-child) nodes (called "attribute information items" in the XML Infoset),
thus introducing fewer constraints on DOM-like node construction
Yaidom does not consider namespace declarations to be attributes, thus facilitating a clear theory of namespaces
Yaidom tries to keep the order of the attributes (for better round-tripping), although attribute order is irrelevant
according to the XML Infoset
Very importantly, yaidom clearly distinguishes between qualified names (QNames) and expanded names (ENames),
which is essential in facilitating a clear theory of namespaces
Qualified names occur in XML, whereas expanded names do not. Yet qualified names have no meaning on their own. They need
to be resolved to expanded names, via the in-scope namespaces. Note that the term "qualified name" is often used for what
yaidom (and the Namespaces specification) calls "expanded name", and that most XML APIs do not distinguish between the
2 kinds of names. Yaidom has to clearly make this distinction, in order to model namespaces correctly.
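To resolve qualified names to expanded names, yaidom distinguishes between namespace declarations and in-scope namespaces. They are represented by immutable classes eu.cdevreeze.yaidom.core.Declarations and eu.cdevreeze.yaidom.core.Scope, respectively.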
Namespace declarations occur in XML, whereas in-scope namespaces do not. The latter are the accumulated effect of the
namespace declarations of the element itself, if any, and those in ancestor elements.
Note: in the code examples below, we assume the following import:
import eu.cdevreeze.yaidom.core._
To see the resolution of qualified names in action, consider the following sample XML:
Consider the last element with qualified name QName("book:Book"). To resolve this qualified name as expanded name, we need to
know the namespaces in scope at that element. To compute the in-scope namespaces, we need to accumulate the namespace
declarations of the last book:Book element and of its ancestor element(s), starting with the root element.
The start Scope is "parent scope" Scope.Empty. Then, in the root element we find namespace declarations:
We find no other namespace declarations in the last book:Book element or its ancestor(s), so the computed scope is also the scope
of the last book:Book element.
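The resolution steps above can be sketched in a few lines of self-contained Scala. This is only an illustration of the idea: the types ExpandedName, QualifiedName and InScope and the helper functions are hypothetical stand-ins, not yaidom's own QName, EName, Declarations and Scope classes, and the prefix-to-URI bindings in the usage note are assumed for the sake of the example.

```scala
// Simplified stand-ins for the concepts in the text; these are NOT yaidom's own classes.
object NameResolutionSketch {
  // An expanded name: optional namespace URI plus local part.
  final case class ExpandedName(namespaceUriOption: Option[String], localPart: String)

  // A qualified name: optional prefix plus local part, as written in the XML.
  final case class QualifiedName(prefixOption: Option[String], localPart: String)

  // Naive parser: exactly one colon means "prefix:local", anything else is taken as unprefixed.
  def parseQName(s: String): QualifiedName = s.split(':') match {
    case Array(prefix, local) => QualifiedName(Some(prefix), local)
    case _                    => QualifiedName(None, s)
  }

  // In-scope namespaces: prefix -> namespace URI, with "" as the default namespace prefix.
  type InScope = Map[String, String]

  // Accumulates declarations from the root element down to the element of interest.
  // Later (more deeply nested) declarations override earlier ones.
  def inScopeNamespaces(declarationsFromRootDown: Seq[Map[String, String]]): InScope =
    declarationsFromRootDown.foldLeft(Map.empty[String, String])(_ ++ _)

  // Resolves an element name; yields None if the prefix is not in scope.
  def resolveElementName(qname: QualifiedName, scope: InScope): Option[ExpandedName] =
    qname.prefixOption match {
      case Some(prefix) => scope.get(prefix).map(uri => ExpandedName(Some(uri), qname.localPart))
      case None         => Some(ExpandedName(scope.get(""), qname.localPart))
    }
}
```

Assuming the root element binds prefix book to http://bookstore/book, resolving parseQName("book:Book") against the accumulated scope yields the expanded name with that namespace URI and local part Book, while an unknown prefix yields None instead of a silently wrong name.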
This namespace support in yaidom has mathematical rigor. The immutable classes QName, EName, Declarations and Scope have
precise definitions, reflected in their implementations, and they obey some interesting properties. For example, if we correctly
define Scope operation relativize (along with resolve), we get:
This may not sound like much, but by getting the basics right, yaidom succeeds in offering first-class support for XML
namespaces, without the magic and namespace-related bugs often found in other XML libraries.
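To make the kind of property meant here concrete, the following sketch models scopes and declarations as plain maps and checks a round-trip law between relativize and resolve. It is an assumption-laden simplification, not yaidom's implementation: the type aliases and both functions are hypothetical, an empty URI is taken to mean "undeclare this prefix", and scopes are assumed to contain no empty URIs.

```scala
// Hypothetical, map-based model of scopes and declarations; not yaidom's own classes.
object ScopeAlgebraSketch {
  // prefix -> namespace URI; "" is the default namespace prefix.
  type Scope = Map[String, String]
  // Like Scope, but an empty URI means "undeclare this prefix".
  type Declarations = Map[String, String]

  // Applies declarations to a scope: add or override declared prefixes, drop undeclared ones.
  def resolve(scope: Scope, decls: Declarations): Scope = {
    val (undeclared, declared) = decls.partition { case (_, uri) => uri.isEmpty }
    (scope ++ declared) -- undeclared.keySet
  }

  // Computes the declarations that turn scope `from` into scope `to`.
  def relativize(from: Scope, to: Scope): Declarations = {
    val changedOrNew = to.filter { case (prefix, uri) => !from.get(prefix).contains(uri) }
    val removed = (from.keySet -- to.keySet).map(prefix => prefix -> "").toMap
    changedOrNew ++ removed
  }
}
```

In this model, resolve(s1, relativize(s1, s2)) == s2 holds for all scopes s1 and s2, because every prefix of s2 is either kept unchanged or overridden, and every prefix missing from s2 is undeclared.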
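There are 2 other basic concepts in this package, representing paths to elements: path builders and paths. They are represented by immutable classes eu.cdevreeze.yaidom.core.PathBuilder and eu.cdevreeze.yaidom.core.Path, respectively. Path builders are like canonical XPath expressions, yet they do not contain the root element itself, and indexing starts with 0 instead of 1.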
Using the Scope mentioned earlier, the latter path builder resolves to the path given before that, by
invoking method PathBuilder.build(scope). In order for this to work, the Scope must be invertible. That is,
there must be a one-to-one correspondence between prefixes ("" for the default namespace) and namespace URIs, because
otherwise the index numbers may differ. Also note that the prefixes book and auth in the path builder are
arbitrary, and need not match with the prefixes used in the XML tree itself.
Uniform query API traits
Yaidom provides a relatively small query API, to query an individual element for collections of child elements,
descendant elements or descendant-or-self elements. The resulting collections are immutable Scala
collections, that can further be manipulated using the Scala Collections API.
This query API is uniform, in that different element implementations share (most of) the same query API. It is also
element-centric (unlike standard Scala XML).
For example, consider the XML example given earlier, as a Scala XML literal named bookstore. We can wrap this Scala
XML Elem into a yaidom wrapper of type ScalaXmlElem, named bookstoreElem. Then we can query
for all books, that is, all descendant-or-self elements with resolved (or expanded) name EName("{http://bookstore/book}Book"),
as follows:
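The result would be an immutable IndexedSeq of ScalaXmlElem instances, holding 2 book elements. We could instead have written:
bookstoreElem.filterElemsOrSelf(EName("{http://bookstore/book}Book"))
with the same result, due to an implicit conversion from expanded names to element predicates.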
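Instead of searching for appropriate descendant-or-self elements, we could have searched for descendant elements only,
without altering the result in this case:
bookstoreElem.filterElems(EName("{http://bookstore/book}Book"))
We could even have searched for appropriate child elements only, without altering the result in this case:
bookstoreElem.filterChildElems(EName("{http://bookstore/book}Book"))
We could find all authors of the Scala book, using operator notation, as follows: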
for {
  bookElem <- bookstoreElem \ EName("{http://bookstore/book}Book")
  if (bookElem \@ EName("ISBN")).contains("978-0981531649")
  authorElem <- bookElem \\ EName("{http://bookstore/author}Author")
} yield authorElem
where \\ stands for filterElemsOrSelf.
There is no explicit support for filtering on the "self" element itself. In the example above, we might want to check if
the root element has the expected EName, for instance. That is easy to express using a simple idiom, however. The last
example then becomes:
for {
  bookstoreElem <- Vector(bookstoreElem)
  if bookstoreElem.resolvedName == EName("{http://bookstore/book}Bookstore")
  bookElem <- bookstoreElem \ EName("{http://bookstore/book}Book")
  if (bookElem \@ EName("ISBN")).contains("978-0981531649")
  authorElem <- bookElem \\ EName("{http://bookstore/author}Author")
} yield authorElem
Now suppose the same XML is stored in a (org.w3c.dom) DOM tree, wrapped in a DomElem named bookstoreElem.
Then the same queries would use exactly the same code as above! The result would be a collection of DomElem instances
instead of ScalaXmlElem instances, however. There are many more element implementations in yaidom, and they share
(most of) the same query API. Therefore this query API is called a uniform query API.
The last example, using operator notation, looks a bit more "XPath-like". It is more verbose than queries in Scala XML, however,
partly because in yaidom these operators cannot be chained. Yet this is with good reason. Yaidom does not blur the
distinction between elements and element collections, and therefore does not offer any XPath experience. The small price
paid in verbosity is made up for by precision. The yaidom query API traits have very precise definitions of their
operations, as can be seen in the corresponding documentation.
The uniform query API traits turn minimal APIs into richer APIs, where each richer API is defined very precisely in terms
of the minimal API. The most important (partly concrete) query API trait is eu.cdevreeze.yaidom.queryapi.ElemLike. It needs to be given
a method implementation to query for child elements (not child nodes in general, but just child elements!), and it offers methods to query
for some or all child elements, descendant elements, and descendant-or-self elements. That is, the minimal API consists
of abstract method findAllChildElems, and it offers methods such as filterChildElems, filterElems and filterElemsOrSelf.
This trait has no knowledge about elements at all, other than the fact that elements can have child elements.
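The way a single abstract method can be turned into a richer query API can be sketched as follows. The trait QueryableElem and the class Tag are hypothetical names introduced for this illustration only; the method names mirror those mentioned above, but the bodies are a plausible reading of the description rather than yaidom's actual ElemLike code.

```scala
// Hypothetical trait modelled on the description above; not yaidom's ElemLike itself.
object ElemLikeSketch {
  trait QueryableElem[E <: QueryableElem[E]] { self: E =>
    // The minimal API: the only abstract method.
    def findAllChildElems: IndexedSeq[E]

    // Child elements obeying the predicate.
    def filterChildElems(p: E => Boolean): IndexedSeq[E] =
      findAllChildElems.filter(p)

    // All descendant-or-self elements, in document order.
    def findAllElemsOrSelf: IndexedSeq[E] =
      self +: findAllChildElems.flatMap(_.findAllElemsOrSelf)

    // Descendant-or-self elements obeying the predicate.
    def filterElemsOrSelf(p: E => Boolean): IndexedSeq[E] =
      findAllElemsOrSelf.filter(p)

    // Descendant elements obeying the predicate: descendant-or-self of each child.
    def filterElems(p: E => Boolean): IndexedSeq[E] =
      findAllChildElems.flatMap(_.filterElemsOrSelf(p))
  }

  // A toy element implementation that only supplies the minimal API.
  final case class Tag(name: String, children: Vector[Tag] = Vector.empty)
      extends QueryableElem[Tag] {
    def findAllChildElems: IndexedSeq[Tag] = children
  }
}
```

Any other element type that implements findAllChildElems gets the same derived methods for free, which is the sense in which the query API is uniform.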
Trait eu.cdevreeze.yaidom.queryapi.HasEName needs minimal knowledge about elements themselves, viz. that elements have a
"resolved" (or expanded) name, and "resolved" attributes (mapping attribute expanded names to attribute values). That is,
it needs to be given implementations of abstract methods resolvedName and resolvedAttributes, and then offers methods to
query for individual attributes or the local name of the element.
It is important to note that yaidom does not consider namespace declarations to be attributes themselves. Otherwise, there would
have been circular dependencies between both concepts, because attributes with namespaces require in-scope namespaces and therefore
namespace declarations for resolving the names of these attributes.
Note that trait eu.cdevreeze.yaidom.queryapi.ElemLike only knows about elements, not about other kinds of nodes.
Of course the actual element implementations mixing in this query API know about other node types, but that knowledge is outside
the uniform query API. Note that the example queries above only use the minimal element knowledge that traits ElemLike and HasEName
together have about elements. Therefore the query code can be used unchanged for different element implementations.
Trait eu.cdevreeze.yaidom.queryapi.UpdatableElemLike (which extends trait IsNavigable) offers functional updates
at given paths. Whereas the traits mentioned above know only about elements, this trait knows that elements have some node
super-type.
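A functional update at a path can be sketched with a toy immutable tree. The sketch below makes a simplifying assumption: a path is just a list of child-element indexes from the root, whereas the paths described earlier pair an element name with an index. The names Node, IndexPath, findByPath and updated are hypothetical and do not refer to yaidom's API.

```scala
// Toy model of navigation and functional update at a path; not yaidom's own classes.
object PathUpdateSketch {
  final case class Node(name: String, text: String = "", children: Vector[Node] = Vector.empty)

  // Simplification: a path is a list of child-element indexes, starting at the root.
  type IndexPath = List[Int]

  // Navigation: follows the path, yielding None if some index does not exist.
  def findByPath(root: Node, path: IndexPath): Option[Node] = path match {
    case Nil         => Some(root)
    case idx :: rest => root.children.lift(idx).flatMap(child => findByPath(child, rest))
  }

  // Functional update: returns a new tree in which only the element at the path
  // (and therefore its ancestors) is replaced; all other subtrees are shared.
  def updated(root: Node, path: IndexPath)(f: Node => Node): Node = path match {
    case Nil => f(root)
    case idx :: rest =>
      require(root.children.isDefinedAt(idx), s"No child element at index $idx")
      root.copy(children = root.children.updated(idx, updated(root.children(idx), rest)(f)))
  }
}
```

The original tree is left untouched, so callers holding a reference to it keep seeing the old content.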
Instead of functional updates at given paths, elements can also be "transformed" functionally without specifying
any paths. This is offered by trait eu.cdevreeze.yaidom.queryapi.TransformableElemLike. The Scala XML and DOM wrappers above do
not mix in this trait.
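A path-free transformation can be illustrated by a function that rewrites every element in a tree bottom-up. This is a minimal sketch under the assumption that "transforming" means applying an element-to-element function to all descendant-or-self elements; Node and transformElemsOrSelf are defined here for the example and are not taken from yaidom.

```scala
// Toy model of a path-free functional transformation; not yaidom's own classes.
object TransformSketch {
  final case class Node(name: String, children: Vector[Node] = Vector.empty)

  // Applies f bottom-up: first to all descendants, then to the element itself.
  def transformElemsOrSelf(elem: Node)(f: Node => Node): Node =
    f(elem.copy(children = elem.children.map(child => transformElemsOrSelf(child)(f))))
}
```

For example, passing n => n.copy(name = n.name.toUpperCase) yields a new tree in which every element name is upper-cased, without any path bookkeeping.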
Three uniform query API levels
Above, several individual query API traits were mentioned. There are, however, 3 query API levels
which are interesting for those who extend yaidom with new element implementations, but also for most users
of the yaidom query API. These levels are represented by "combination traits" that combine several
of the query API traits mentioned (or not mentioned) above.
All element implementations directly or indirectly implement the ClarkNodes.Elem trait. The part of
the yaidom query API that knows about ElemApi querying and about ENames is the ClarkNodes query
API level. It does not know about QNames, in-scope namespaces, ancestor elements, base URIs, etc.
The next level is eu.cdevreeze.yaidom.queryapi.ScopedNodes.Elem. It extends the ClarkNodes.Elem
trait, but offers knowledge about QNames and in-scope namespaces as well. Many element implementations
offer at least this query API level. The remarks about non-element nodes above also apply here, and below.
The third level is eu.cdevreeze.yaidom.queryapi.BackingNodes.Elem. It extends the ScopedNodes.Elem
trait, but offers knowledge about ancestor elements and document/base URIs as well. This is the level
typically used for "backing elements" in "yaidom dialects", thus allowing for multiple "XML backends"
to be used behind "yaidom dialects". Yaidom dialects are specific "XML dialect" type-safe yaidom query APIs,
mixing in and leveraging trait eu.cdevreeze.yaidom.queryapi.SubtypeAwareElemApi (often in combination
with eu.cdevreeze.yaidom.queryapi.ScopedNodes.Elem).
Class eu.cdevreeze.yaidom.simple.Elem is the default element implementation of yaidom. It extends class eu.cdevreeze.yaidom.simple.Node.
The latter also has sub-classes for text nodes, comments, entity references and processing instructions. Class eu.cdevreeze.yaidom.simple.Document
contains a document Elem, but is not a Node sub-class itself. This node hierarchy offers the ScopedNodes query API,
so simple elements offer the ScopedNodes.Elem query API.
The eu.cdevreeze.yaidom.simple.Elem class has the following characteristics:
Besides the element name, attributes and child nodes, it keeps a Scope, but no Declarations
This makes it easy to compose these elements, as long as scopes are passed explicitly throughout the element tree
Equality is reference equality, because it is hard to come up with a sensible equality for this element class
Roundtripping cannot be entirely lossless, but this class does try to retain the attribute order (although irrelevant according to XML Infoset)
Packages parse and print offer DocumentParser and DocumentPrinter classes for parsing/serializing these default
Elem (and Document) instances
Creating such Elem trees by hand is a bit cumbersome, partly because scopes have to be passed to each Elem in the tree.
The latter is not needed if we use class eu.cdevreeze.yaidom.simple.ElemBuilder to create element trees by hand. When the tree
has been fully created as ElemBuilder, invoke method ElemBuilder.build(parentScope) to turn it into an Elem.
Like their super-classes Node and NodeBuilder, classes Elem and ElemBuilder have very much in common. Both are immutable,
easy to compose (ElemBuilder instances even more so), equality is reference equality, etc. The most important differences
are as follows:
Instead of a Scope, an ElemBuilder contains a Declarations
This makes an ElemBuilder easier to compose than an Elem, because no Scope needs to be passed around throughout the tree
Class ElemBuilder uses a minimal query API, mixing in almost only traits ElemLike and TransformableElemLike
After all, an ElemBuilder neither keeps nor knows about Scopes, so does not know about resolved element/attribute names
The Effective Java book element in the XML example above could have been written as ElemBuilder (without the inter-element whitespace) as follows:
Note that the distinction between ElemBuilder and Elem "solves" the mismatch that immutable ("functional") element trees are
constructed in a bottom-up manner, while namespace scoping works in a top-down manner. (See also Anti-XML issue 78, in
https://github.com/djspiewak/anti-xml/issues/78).
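The interplay between bottom-up construction and top-down namespace scoping can be sketched as follows. The classes Builder and Built are hypothetical simplifications (prefixed names are kept as plain strings, and namespace undeclarations are ignored); they only illustrate the idea behind a build(parentScope) step and are not yaidom's ElemBuilder and Elem.

```scala
// Toy builder/element pair; not yaidom's ElemBuilder and Elem.
object BuilderSketch {
  // prefix -> namespace URI; "" is the default namespace prefix.
  type Scope = Map[String, String]

  // A built element knows the full set of namespaces in scope.
  final case class Built(qname: String, scope: Scope, children: Vector[Built])

  // A builder knows only its own namespace declarations, so builders
  // can be composed bottom-up without passing scopes around.
  final case class Builder(
      qname: String,
      declarations: Map[String, String] = Map.empty,
      children: Vector[Builder] = Vector.empty) {

    // Top-down pass: each element's scope is the parent scope plus its own declarations.
    def build(parentScope: Scope = Map.empty): Built = {
      val scope = parentScope ++ declarations
      Built(qname, scope, children.map(_.build(scope)))
    }
  }
}
```

A child builder such as Builder("book:Book") carries no namespace information of its own, yet after build() on the root it ends up with the root's bindings in scope.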
There are many more element implementations in yaidom, most of them in sub-packages of this package. Yaidom is extensible
in that new element implementations can be invented, for example elements that are better "roundtrippable" (at the expense of
"composability"), or yaidom wrappers around other DOM-like APIs (such as XOM or JDOM2). The current element implementations
in yaidom are for example:
Immutable class eu.cdevreeze.yaidom.resolved.Elem, which takes namespace prefixes out of the equation, and therefore
makes useful (namespace-aware) equality comparisons feasible. It offers the ClarkNodes.Elem query API (as well as
update/transformation support).
Immutable class eu.cdevreeze.yaidom.indexed.Elem, which offers views on default Elems that know the ancestry of
each element. It offers the BackingNodes.Elem query API, so knows its ancestry, despite being immutable! This element implementation
is handy for querying XML schemas, for example, because in schemas the ancestry of queried elements typically matters.
One yaidom wrapper that is very useful is a Saxon tiny tree yaidom wrapper, namely SaxonElem (JVM-only).
Like "indexed elements", it offers all of the BackingNodes.Elem query API. This element implementation is very efficient,
especially in memory footprint (when using the default tree model, namely tiny trees). It is therefore the most attractive element
implementation to use in "enterprise" production code, but only on the JVM. In combination with Saxon-EE (instead of Saxon-HE) the underlying
Saxon NodeInfo objects can even carry interesting type information.
For ad-hoc element creation, consider using "resolved" elements. They are easy to create, because there is no need to worry about
namespace prefixes. Once created, they can be converted to "simple" elements, given an appropriate Scope (without default namespace).
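Why taking prefixes out of the equation helps with equality can be shown with a small model. The classes Prefixed and Resolved and the function toResolved are hypothetical and cover element names only; they are a sketch of the idea, not the eu.cdevreeze.yaidom.resolved implementation.

```scala
// Toy model of prefix-free ("resolved") elements; not yaidom's own classes.
object ResolvedEqualitySketch {
  // An element as written in XML: a possibly prefixed name plus the namespaces in scope.
  final case class Prefixed(
      qname: String,
      scope: Map[String, String],
      children: Vector[Prefixed] = Vector.empty)

  // A prefix-free element: its name is (optional namespace URI, local part).
  final case class Resolved(name: (Option[String], String), children: Vector[Resolved])

  def toResolved(e: Prefixed): Resolved = {
    val name: (Option[String], String) = e.qname.split(':') match {
      // Prefixed name: the prefix must be in scope (throws NoSuchElementException otherwise).
      case Array(prefix, local) => (Some(e.scope(prefix)), local)
      // Unprefixed element name: uses the default namespace, if any.
      case _                    => (e.scope.get(""), e.qname)
    }
    Resolved(name, e.children.map(toResolved))
  }
}
```

Two elements that differ only in the prefix used for the same namespace URI are unequal as Prefixed values, but equal after toResolved.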
Packages and dependencies
Yaidom has the following packages, and layering between packages (mentioning the lowest layers first):
Package eu.cdevreeze.yaidom.core, with the core concepts described above. It depends on no other yaidom packages.
Package eu.cdevreeze.yaidom.queryapi, with the query API traits described above. It only depends on the core package.
Package eu.cdevreeze.yaidom.resolved, with a minimal "James Clark" element implementation. It only depends on the core and
queryapi packages.
Package eu.cdevreeze.yaidom.simple, with the default element implementation described above. It only depends on the core and queryapi
packages.
Package eu.cdevreeze.yaidom.indexed, supporting "indexed" elements. It only depends on the core, queryapi and simple
packages.
Package convert. It contains conversions between default yaidom nodes on the one hand and DOM,
Scala XML, etc. on the other hand. The convert package depends on the yaidom core, queryapi, resolved and simple packages.
Package eu.cdevreeze.yaidom.saxon, with the Saxon wrapper element implementation described above. It only depends on the core, queryapi
and convert packages.
Packages eu.cdevreeze.yaidom.parse and eu.cdevreeze.yaidom.print, for parsing/printing Elems. They depend on
the packages mentioned above, except for indexed and saxon.
The other packages (except utils), such as dom and scalaxml. They depend on (some of) the packages mentioned above,
but not on each other.
Indeed, all yaidom package dependencies are uni-directional.
Notes on performance
Yaidom can be quite memory-hungry. One particular cause of that is the possible creation of very many duplicate EName and
QName instances. This can be the case while parsing XML into yaidom documents, or while querying yaidom element trees.
The user of the library can reduce memory consumption to a large extent, and yaidom facilitates that.
Note that the global ENameProvider or QNameProvider can typically be configured rather late during development, but the
memory cost savings can be substantial. Also note that the global ENameProvider or QNameProvider can be used implicitly in
application code, by writing:
using an implicit ENameProvider, whose members are in scope. Still, for querying the first alternative using withEName is
better, but there are likely many scenarios in yaidom client code where an implicit ENameProvider or QNameProvider makes sense.
The bottom line is that yaidom can be configured to be far less memory-hungry, and that yaidom client code can also take
some responsibility in reducing memory usage. Again, the Saxon wrapper implementation is an excellent and efficient choice (but only on the JVM).
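The memory argument rests on sharing one instance per distinct name instead of allocating a fresh one each time. The sketch below shows that idea with a hypothetical CachingNameProvider; it is not yaidom's ENameProvider or QNameProvider API, and the method names are assumptions made for the example.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical name cache illustrating instance sharing; not yaidom's provider API.
object NameCacheSketch {
  final case class ExpandedName(namespaceUriOption: Option[String], localPart: String)

  // Hands out one shared instance per distinct (namespace, local part) pair.
  final class CachingNameProvider {
    private val cache = TrieMap.empty[(Option[String], String), ExpandedName]

    def getEName(namespaceUriOption: Option[String], localPart: String): ExpandedName =
      cache.getOrElseUpdate(
        (namespaceUriOption, localPart),
        ExpandedName(namespaceUriOption, localPart))

    // Number of distinct names handed out so far.
    def size: Int = cache.size
  }
}
```

Parsing a large document with thousands of Book elements would then retain a single name instance for all of them, at the cost of keeping the cache alive.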
Yaidom, and in particular the eu.cdevreeze.yaidom.core, eu.cdevreeze.yaidom.queryapi, eu.cdevreeze.yaidom.resolved and eu.cdevreeze.yaidom.simple sub-packages, contains the following layers:
basic concepts (in the core package)
the uniform query API traits (in the queryapi package)
some element implementations (in the resolved and simple packages)
It makes sense to read this documentation, because it helps in getting up-to-speed with yaidom.
Basic concepts
In real world XML, elements (and sometimes attributes) tend to have names within a certain namespace. There are 2 kinds of names at play here:
qualified names: prefixed names, such as book:Title, and unprefixed names, such as Edition
expanded names: names having a namespace, such as {http://bookstore/book}Title (in James Clark notation), and names not having a namespace, such as Edition
They are represented by immutable classes eu.cdevreeze.yaidom.core.QName and eu.cdevreeze.yaidom.core.EName, respectively.
Qualified names occur in XML, whereas expanded names do not. Yet qualified names have no meaning on their own. They need to be resolved to expanded names, via the in-scope namespaces. Note that the term "qualified name" is often used for what yaidom (and the Namespaces specification) calls "expanded name", and that most XML APIs do not distinguish between the 2 kinds of names. Yaidom has to clearly make this distinction, in order to model namespaces correctly.
To resolve qualified names to expanded names, yaidom distinguishes between:
namespace declarations
in-scope namespaces
They are represented by immutable classes eu.cdevreeze.yaidom.core.Declarations and eu.cdevreeze.yaidom.core.Scope, respectively.
Namespace declarations occur in XML, whereas in-scope namespaces do not. The latter are the accumulated effect of the namespace declarations of the element itself, if any, and those in ancestor elements.
Note: in the code examples below, we assume the following import:
import eu.cdevreeze.yaidom.core._
To see the resolution of qualified names in action, consider the following sample XML:
Consider the last element with qualified name QName("book:Book"). To resolve this qualified name as expanded name, we need to know the namespaces in scope at that element. To compute the in-scope namespaces, we need to accumulate the namespace declarations of the last book:Book element and of its ancestor element(s), starting with the root element.
The start Scope is "parent scope" Scope.Empty. Then, in the root element we find namespace declarations:
This leads to the following namespaces in scope at the root element:
which is equal to:
We find no other namespace declarations in the last book:Book element or its ancestor(s), so the computed scope is also the scope of the last book:Book element.
Then QName("book:Book") is resolved as follows:
which is equal to:
This namespace support in yaidom has mathematical rigor. The immutable classes QName, EName, Declarations and Scope have precise definitions, reflected in their implementations, and they obey some interesting properties. For example, if we correctly define Scope operation relativize (along with resolve), we get:
This may not sound like much, but by getting the basics right, yaidom succeeds in offering first-class support for XML namespaces, without the magic and namespace-related bugs often found in other XML libraries.
There are 2 other basic concepts in this package, representing paths to elements:
path builders
paths
They are represented by immutable classes eu.cdevreeze.yaidom.core.PathBuilder and eu.cdevreeze.yaidom.core.Path, respectively.
Path builders are like canonical XPath expressions, yet they do not contain the root element itself, and indexing starts with 0 instead of 1.
For example, the last name of the first author of the last book element has path:
This path could be written as path builder as follows:
Using the Scope mentioned earlier, the latter path builder resolves to the path given before that, by invoking method PathBuilder.build(scope). In order for this to work, the Scope must be invertible. That is, there must be a one-to-one correspondence between prefixes ("" for the default namespace) and namespace URIs, because otherwise the index numbers may differ. Also note that the prefixes book and auth in the path builder are arbitrary, and need not match with the prefixes used in the XML tree itself.
Uniform query API traits
Yaidom provides a relatively small query API, to query an individual element for collections of child elements, descendant elements or descendant-or-self elements. The resulting collections are immutable Scala collections, that can further be manipulated using the Scala Collections API.
This query API is uniform, in that different element implementations share (most of) the same query API. It is also element-centric (unlike standard Scala XML).
For example, consider the XML example given earlier, as a Scala XML literal named bookstore. We can wrap this Scala XML Elem into a yaidom wrapper of type ScalaXmlElem, named bookstoreElem. Then we can query for all books, that is, all descendant-or-self elements with resolved (or expanded) name EName("{http://bookstore/book}Book"), as follows:
The result would be an immutable IndexedSeq of ScalaXmlElem instances, holding 2 book elements.
We could instead have written:
bookstoreElem.filterElemsOrSelf(EName("{http://bookstore/book}Book"))
with the same result, due to an implicit conversion from expanded names to element predicates.
Instead of searching for appropriate descendant-or-self elements, we could have searched for descendant elements only, without altering the result in this case:
or:
bookstoreElem.filterElems(EName("{http://bookstore/book}Book"))
We could even have searched for appropriate child elements only, without altering the result in this case:
or:
bookstoreElem.filterChildElems(EName("{http://bookstore/book}Book"))
or, knowing that all child elements are books:
We could find all authors of the Scala book as follows:
or:
We could even use operator notation, as follows:
or:
where \\ stands for filterElemsOrSelf.
There is no explicit support for filtering on the "self" element itself. In the example above, we might want to check if the root element has the expected EName, for instance. That is easy to express using a simple idiom, however. The last example then becomes:
Now suppose the same XML is stored in a (org.w3c.dom) DOM tree, wrapped in a DomElem named bookstoreElem. Then the same queries would use exactly the same code as above! The result would be a collection of DomElem instances instead of ScalaXmlElem instances, however. There are many more element implementations in yaidom, and they share (most of) the same query API. Therefore this query API is called a uniform query API.
The last example, using operator notation, looks a bit more "XPath-like". It is more verbose than queries in Scala XML, however, partly because in yaidom these operators cannot be chained. Yet this is with good reason. Yaidom does not blur the distinction between elements and element collections, and therefore does not offer any XPath experience. The small price paid in verbosity is made up for by precision. The yaidom query API traits have very precise definitions of their operations, as can be seen in the corresponding documentation.
The uniform query API traits turn minimal APIs into richer APIs, where each richer API is defined very precisely in terms of the minimal API. The most important (partly concrete) query API trait is eu.cdevreeze.yaidom.queryapi.ElemLike. It needs to be given a method implementation to query for child elements (not child nodes in general, but just child elements!), and it offers methods to query for some or all child elements, descendant elements, and descendant-or-self elements. That is, the minimal API consists of abstract method findAllChildElems, and it offers methods such as filterChildElems, filterElems and filterElemsOrSelf. This trait has no knowledge about elements at all, other than the fact that elements can have child elements.
Trait eu.cdevreeze.yaidom.queryapi.HasEName needs minimal knowledge about elements themselves, viz. that elements have a "resolved" (or expanded) name, and "resolved" attributes (mapping attribute expanded names to attribute values). That is, it needs to be given implementations of abstract methods resolvedName and resolvedAttributes, and then offers methods to query for individual attributes or the local name of the element.
It is important to note that yaidom does not consider namespace declarations to be attributes themselves. Otherwise, there would have been circular dependencies between both concepts, because attributes with namespaces require in-scope namespaces and therefore namespace declarations for resolving the names of these attributes.
Many traits, such as eu.cdevreeze.yaidom.queryapi.HasEName, are just "capabilities", and need to be combined with trait eu.cdevreeze.yaidom.queryapi.ElemLike in order to offer a useful element querying API.
Note that trait eu.cdevreeze.yaidom.queryapi.ElemLike only knows about elements, not about other kinds of nodes. Of course the actual element implementations mixing in this query API know about other node types, but that knowledge is outside the uniform query API. Note that the example queries above only use the minimal element knowledge that traits ElemLike and HasEName together have about elements. Therefore the query code can be used unchanged for different element implementations.
Trait eu.cdevreeze.yaidom.queryapi.IsNavigable is used to navigate to an element given a Path.
Trait eu.cdevreeze.yaidom.queryapi.UpdatableElemLike (which extends trait IsNavigable) offers functional updates at given paths. Whereas the traits mentioned above know only about elements, this trait knows that elements have some node super-type.
Instead of functional updates at given paths, elements can also be "transformed" functionally without specifying any paths. This is offered by trait eu.cdevreeze.yaidom.queryapi.TransformableElemLike. The Scala XML and DOM wrappers above do not mix in this trait.
Three uniform query API levels
Above, several individual query API traits were mentioned. There are, however, 3 query API levels which are interesting for those who extend yaidom with new element implementations, but also for most users of the yaidom query API. These levels are represented by "combination traits" that combine several of the query API traits mentioned (or not mentioned) above.
The most basic level is eu.cdevreeze.yaidom.queryapi.ClarkNodes.Elem. It combines traits such as eu.cdevreeze.yaidom.queryapi.ElemApi and eu.cdevreeze.yaidom.queryapi.HasENameApi. Object eu.cdevreeze.yaidom.queryapi.ClarkNodes also contains types for non-element nodes. All element implementations that extend trait ClarkNodes.Elem should have a node hierarchy with all kinds of nodes extending the appropriate ClarkNodes member type.
All element implementations directly or indirectly implement the ClarkNodes.Elem trait. The part of the yaidom query API that knows about ElemApi querying and about ENames is the ClarkNodes query API level. It does not know about QNames, in-scope namespaces, ancestor elements, base URIs, etc.
The next level is eu.cdevreeze.yaidom.queryapi.ScopedNodes.Elem. It extends the ClarkNodes.Elem trait, but offers knowledge about QNames and in-scope namespaces as well. Many element implementations offer at least this query API level. The remarks about non-element nodes above also apply here, and below.
The third level is eu.cdevreeze.yaidom.queryapi.BackingNodes.Elem. It extends the ScopedNodes.Elem trait, but offers knowledge about ancestor elements and document/base URIs as well. This is the level typically used for "backing elements" in "yaidom dialects", thus allowing for multiple "XML backends" to be used behind "yaidom dialects". Yaidom dialects are specific "XML dialect" type-safe yaidom query APIs, mixing in and leveraging trait eu.cdevreeze.yaidom.queryapi.SubtypeAwareElemApi (often in combination with eu.cdevreeze.yaidom.queryapi.ScopedNodes.Elem).
To get to know the yaidom query API and its 3 levels, it pays off to study the API documentation of traits eu.cdevreeze.yaidom.queryapi.ClarkNodes.Elem, eu.cdevreeze.yaidom.queryapi.ScopedNodes.Elem and eu.cdevreeze.yaidom.queryapi.BackingNodes.Elem.
Some element implementations
In package simple there are 2 immutable element implementations, eu.cdevreeze.yaidom.simple.ElemBuilder and eu.cdevreeze.yaidom.simple.Elem. Arguably, ElemBuilder is not an element implementation. Indeed, it does not even offer the ClarkNodes.Elem query API.
Class eu.cdevreeze.yaidom.simple.Elem is the default element implementation of yaidom. It extends class eu.cdevreeze.yaidom.simple.Node. The latter also has sub-classes for text nodes, comments, entity references and processing instructions. Class eu.cdevreeze.yaidom.simple.Document contains a document Elem, but is not a Node sub-class itself. This node hierarchy offers the ScopedNodes query API, so simple elements offer the ScopedNodes.Elem query API.
The eu.cdevreeze.yaidom.simple.Elem class has the following characteristics:
It keeps a Scope, but no Declarations
Packages parse and print offer DocumentParser and DocumentPrinter classes for parsing/serializing these default Elem (and Document) instances
Creating such Elem trees by hand is a bit cumbersome, partly because scopes have to be passed to each Elem in the tree. The latter is not needed if we use class eu.cdevreeze.yaidom.simple.ElemBuilder to create element trees by hand. When the tree has been fully created as ElemBuilder, invoke method ElemBuilder.build(parentScope) to turn it into an Elem.
Like their super-classes Node and NodeBuilder, classes Elem and ElemBuilder have very much in common. Both are immutable, easy to compose (ElemBuilder instances even more so), equality is reference equality, etc. The most important differences are as follows:
Instead of a Scope, an ElemBuilder contains a Declarations
This makes an ElemBuilder easier to compose than an Elem, because no Scope needs to be passed around throughout the tree
Class ElemBuilder uses a minimal query API, mixing in almost only traits ElemLike and TransformableElemLike
An ElemBuilder neither keeps nor knows about Scopes, so does not know about resolved element/attribute names
The Effective Java book element in the XML example above could also have been written as an ElemBuilder (without the inter-element whitespace). Such an ElemBuilder (say, eb) lacks namespace declarations for prefixes book and auth. So, checking whether eb can be built against an empty parent scope returns false, while the same check against a parent scope that declares both prefixes returns true. Indeed, building eb against such a scope returns the element tree as Elem.
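A minimal sketch of these steps, assuming the NodeBuilder factory methods elem and textElem, and a canBuild check next to the build method mentioned above. The element is an abbreviated stand-in for the Effective Java book element, and the two namespace URIs are assumptions made for illustration.

```scala
import eu.cdevreeze.yaidom.core.{QName, Scope}
import eu.cdevreeze.yaidom.simple.{Elem, ElemBuilder}
import eu.cdevreeze.yaidom.simple.NodeBuilder.{elem, textElem}

object ElemBuilderSketch {

  // Built bottom-up, using prefixed QNames only; no Scope is passed anywhere.
  val eb: ElemBuilder =
    elem(
      qname = QName("book:Book"),
      children = Vector(
        textElem(QName("book:Title"), "Effective Java (2nd Edition)"),
        elem(
          qname = QName("book:Authors"),
          children = Vector(
            elem(
              qname = QName("auth:Author"),
              children = Vector(
                textElem(QName("auth:First_Name"), "Joshua"),
                textElem(QName("auth:Last_Name"), "Bloch")))))))

  // Assumed namespace URIs for prefixes book and auth.
  val scope: Scope =
    Scope.from("book" -> "http://bookstore/book", "auth" -> "http://bookstore/author")

  // Prefixes book and auth are undeclared in the empty scope.
  val buildableWithoutScope: Boolean = eb.canBuild(Scope.Empty)

  // Both prefixes are declared in the given parent scope.
  val buildableWithScope: Boolean = eb.canBuild(scope)

  // Namespace scoping is applied top-down, at build time.
  val bookElem: Elem = eb.build(scope)
}
```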
Note that the distinction between ElemBuilder and Elem "solves" the mismatch that immutable ("functional") element trees are constructed in a bottom-up manner, while namespace scoping works in a top-down manner. (See also Anti-XML issue 78, at https://github.com/djspiewak/anti-xml/issues/78.)
There are many more element implementations in yaidom, most of them in sub-packages of this package. Yaidom is extensible in that new element implementations can be invented, for example elements that are better "roundtrippable" (at the expense of "composability"), or yaidom wrappers around other DOM-like APIs (such as XOM or JDOM2). The current element implementations in yaidom are for example:
eu.cdevreeze.yaidom.simple.Elem, the default element implementation. See above.
eu.cdevreeze.yaidom.simple.ElemBuilder, for creating an Elem by hand. See above.
eu.cdevreeze.yaidom.resolved.Elem, which offers the ClarkNodes.Elem query API (as well as update/transformation support).
eu.cdevreeze.yaidom.indexed.Elem, which offers the BackingNodes.Elem query API, so knows its ancestry, despite being immutable! This element implementation is handy for querying XML schemas, for example, because in schemas the ancestry of queried elements typically matters.
One yaidom wrapper that is very useful is a Saxon tiny tree yaidom wrapper, namely SaxonElem (JVM-only). Like "indexed elements", it offers all of the BackingNodes.Elem query API. This element implementation is very efficient, especially in memory footprint (when using the default tree model, namely tiny trees). It is therefore the most attractive element implementation to use in "enterprise" production code, but only on the JVM. In combination with Saxon-EE (instead of Saxon-HE) the underlying Saxon NodeInfo objects can even carry interesting type information.
For ad-hoc element creation, consider using "resolved" elements. They are easy to create, because there is no need to worry about namespace prefixes. Once created, they can be converted to "simple" elements, given an appropriate Scope (without default namespace).
Packages and dependencies
Yaidom has the following packages, and layering between packages (mentioning the lowest layers first):
Package eu.cdevreeze.yaidom.core. It depends on no other yaidom packages.
Package eu.cdevreeze.yaidom.queryapi. It only depends on the core package.
Package eu.cdevreeze.yaidom.resolved. It only depends on the core and queryapi packages.
Package eu.cdevreeze.yaidom.simple. It only depends on the core and queryapi packages.
Package eu.cdevreeze.yaidom.indexed. It only depends on the core, queryapi and simple packages.
Package convert. It contains conversions between default yaidom nodes on the one hand and DOM, Scala XML, etc. on the other hand. The convert package depends on the yaidom core, queryapi, resolved and simple packages.
Package eu.cdevreeze.yaidom.saxon, with the Saxon wrapper element implementation described above. It only depends on the core, queryapi and convert packages.
Packages eu.cdevreeze.yaidom.parse and eu.cdevreeze.yaidom.print, for parsing/printing Elems. They depend on the packages mentioned above, except for indexed and saxon.
The other packages (except utils), such as dom and scalaxml. They depend on (some of) the packages mentioned above, but not on each other.
Indeed, all yaidom package dependencies are uni-directional.
Notes on performance
Yaidom can be quite memory-hungry. One particular cause of that is the possible creation of very many duplicate EName and QName instances. This can be the case while parsing XML into yaidom documents, or while querying yaidom element trees.
The user of the library can reduce memory consumption to a large extent, and yaidom facilitates that.
As for querying, prefer element predicates created with withEName to predicates that construct new EName instances, to avoid unnecessary (large scale) EName object creation.
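The two styles could look as follows. This is a sketch: it assumes that withEName is imported from the HasENameApi companion object, and the object and function names, the bookstoreElem parameter and the namespace URI are hypothetical.

```scala
import eu.cdevreeze.yaidom.core.EName
import eu.cdevreeze.yaidom.queryapi.HasENameApi.withEName
import eu.cdevreeze.yaidom.simple.Elem

object ENameQuerySketch {

  // Assumed namespace URI, for illustration only.
  val BookNs = "http://bookstore/book"

  // Preferred: the predicate is created once, with withEName.
  def findBooks(bookstoreElem: Elem): Seq[Elem] =
    bookstoreElem.filterElemsOrSelf(withEName(BookNs, "Book"))

  // Works, but constructs a new EName each time the predicate is applied,
  // so once for every element in the tree.
  def findBooksAllocating(bookstoreElem: Elem): Seq[Elem] =
    bookstoreElem.filterElemsOrSelf(e => e.resolvedName == EName(BookNs, "Book"))
}
```

Both functions are meant to select the same elements. They differ only in how many short-lived EName objects are created along the way.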
To reduce the memory footprint of parsed XML trees, see eu.cdevreeze.yaidom.core.ENameProvider and eu.cdevreeze.yaidom.core.QNameProvider.
For example, during the startup phase of an application, we could set the global ENameProvider as follows:
ENameProvider.globalENameProvider.become(new ENameProvider.ENameProviderUsingImmutableCache(knownENames))
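A slightly fuller sketch of this startup step follows. The object and method names and the set of known names are assumptions made for illustration. In practice the set would hold the names that the application expects to occur frequently. The getEName call is assumed to take a namespace URI and a local part.

```scala
import eu.cdevreeze.yaidom.core.{EName, ENameProvider}

object ENameProviderSetupSketch {

  // Assumed namespace URI and element names, for illustration only.
  val BookNs = "http://bookstore/book"

  val knownENames: Set[EName] =
    Set(EName(BookNs, "Bookstore"), EName(BookNs, "Book"), EName(BookNs, "Title"))

  // To be called once, during application startup.
  def configureGlobalENameProvider(): Unit =
    ENameProvider.globalENameProvider.become(
      new ENameProvider.ENameProviderUsingImmutableCache(knownENames))

  // Afterwards, names can be requested from the global provider, which can
  // hand out cached instances for known names instead of creating duplicates.
  def bookEName: EName =
    ENameProvider.globalENameProvider.getEName(BookNs, "Book")
}
```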
Note that the global ENameProvider or QNameProvider can typically be configured rather late during development, yet the memory cost savings can be substantial. Also note that the global ENameProvider or QNameProvider can be used implicitly in application code, by creating names through an implicit ENameProvider whose members are in scope. Still, for querying, the alternative using withEName is better, but there are likely many scenarios in yaidom client code where an implicit ENameProvider or QNameProvider makes sense.
The bottom line is that yaidom can be configured to be far less memory-hungry, and that yaidom client code can also take some responsibility in reducing memory usage. Again, the Saxon wrapper implementation is an excellent and efficient choice (but only on the JVM).