Instance Constructors
- new DocumentParserUsingSax(parserFactory: SAXParserFactory, parserCreator: (SAXParserFactory) ⇒ SAXParser, handlerCreator: () ⇒ ElemProducingSaxHandler)
Value Members
- final def !=(arg0: AnyRef): Boolean
- final def !=(arg0: Any): Boolean
- final def ##(): Int
- final def ==(arg0: AnyRef): Boolean
- final def ==(arg0: Any): Boolean
- final def asInstanceOf[T0]: T0
- def clone(): AnyRef
- final def eq(arg0: AnyRef): Boolean
- def equals(arg0: Any): Boolean
- def finalize(): Unit
- final def getClass(): java.lang.Class[_]
- val handlerCreator: () ⇒ ElemProducingSaxHandler
- def hashCode(): Int
- final def isInstanceOf[T0]: Boolean
- final def ne(arg0: AnyRef): Boolean
- final def notify(): Unit
- final def notifyAll(): Unit
- def parse(inputStream: InputStream): Document
- final def parse(file: File): Document
- final def parse(uri: URI): Document
- val parserCreator: (SAXParserFactory) ⇒ SAXParser
- val parserFactory: SAXParserFactory
- final def synchronized[T0](arg0: ⇒ T0): T0
- def toString(): String
- final def wait(): Unit
- final def wait(arg0: Long, arg1: Int): Unit
- final def wait(arg0: Long): Unit
SAX-based Document parser.

Typical non-trivial creation is as follows, assuming a trait MyEntityResolver, which extends EntityResolver, and a trait MyErrorHandler, which extends ErrorHandler:

If we want the SAXParserFactory to be a validating one, using an XML Schema, we could obtain the SAXParserFactory as follows:

A custom EntityResolver could be used to retrieve DTDs locally, or even to suppress DTD resolution. The latter can be coded as follows (see http://stuartsierra.com/2008/05/08/stop-your-java-sax-parser-from-downloading-dtds), risking some loss of information:

For completeness, a custom ErrorHandler trait that simply prints parse exceptions to standard output:

It is even possible to parse HTML (including very poor HTML) into well-formed Documents by using a SAXParserFactory from the TagSoup library. For example:

If more flexibility is needed in configuring the DocumentParser than offered by this class, consider writing a wrapper DocumentParser which wraps a DocumentParserUsingSax, but adapts the parse method. This would make it possible to set additional properties on the XML Reader, for example.

As can be seen above, parsing is based on the JAXP SAXParserFactory instead of the SAX 2.0 XMLReaderFactory.

A DocumentParserUsingSax instance can be re-used multiple times, from the same thread. If the SAXParserFactory is thread-safe, it can even be re-used from multiple threads. Typically a SAXParserFactory cannot be trusted to be thread-safe, however. In a web application, one (safe) way to deal with that is to use one SAXParserFactory instance per request.
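
The configuration steps described above (a validating SAXParserFactory, a DTD-suppressing EntityResolver, and a printing ErrorHandler) might be sketched as follows. This is a minimal illustration using only the JDK's JAXP and SAX classes, not this library's own sample code. The object name SaxSetupSketch, the CountingHandler class, and the helper method names are hypothetical, and CountingHandler merely stands in for whatever ElemProducingSaxHandler the handlerCreator would return.

```scala
import java.io.StringReader
import javax.xml.XMLConstants
import javax.xml.parsers.SAXParserFactory
import javax.xml.transform.stream.StreamSource
import javax.xml.validation.SchemaFactory
import org.xml.sax.{Attributes, EntityResolver, ErrorHandler, InputSource, SAXParseException}
import org.xml.sax.helpers.DefaultHandler

// Suppresses DTD resolution by answering every entity request with empty content.
// As noted above, this risks some loss of information (e.g. entities declared in the DTD).
trait MyEntityResolver extends EntityResolver {
  override def resolveEntity(publicId: String, systemId: String): InputSource =
    new InputSource(new StringReader(""))
}

// Prints parse exceptions to standard output instead of acting on them.
trait MyErrorHandler extends ErrorHandler {
  override def warning(exc: SAXParseException): Unit = println(exc)
  override def error(exc: SAXParseException): Unit = println(exc)
  override def fatalError(exc: SAXParseException): Unit = println(exc)
}

// Hypothetical helper object; none of these names come from the library.
object SaxSetupSketch {

  // A namespace-aware, non-validating factory.
  def newNamespaceAwareFactory(): SAXParserFactory = {
    val spf = SAXParserFactory.newInstance()
    spf.setFeature("http://xml.org/sax/features/namespaces", true)
    spf.setFeature("http://xml.org/sax/features/namespace-prefixes", true)
    spf
  }

  // A factory that validates against the given XML Schema text.
  def newValidatingFactory(xsd: String): SAXParserFactory = {
    val schemaFactory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
    val schema = schemaFactory.newSchema(new StreamSource(new StringReader(xsd)))
    val spf = newNamespaceAwareFactory()
    spf.setSchema(schema)
    spf
  }

  // Stand-in (assumption) for an ElemProducingSaxHandler: it only counts elements.
  class CountingHandler extends DefaultHandler {
    var elementCount = 0
    override def startElement(
        uri: String, localName: String, qName: String, attributes: Attributes): Unit =
      elementCount += 1
  }

  // Parses the XML text with both traits mixed into the handler,
  // and returns the number of elements seen.
  def countElements(spf: SAXParserFactory, xml: String): Int = {
    val handler = new CountingHandler with MyEntityResolver with MyErrorHandler
    spf.newSAXParser().parse(new InputSource(new StringReader(xml)), handler)
    handler.elementCount
  }
}
```

With this class, the same two traits would presumably be mixed into the handler returned by handlerCreator, and the configured factory passed as parserFactory. Creating a fresh factory per call, as the helper methods do, also matches the one-factory-per-request advice above.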