Package org.apache.tika.sax
Class SafeContentHandler
- java.lang.Object
- org.xml.sax.helpers.DefaultHandler
- org.apache.tika.sax.ContentHandlerDecorator
- org.apache.tika.sax.SafeContentHandler
- All Implemented Interfaces:
ContentHandler, DTDHandler, EntityResolver, ErrorHandler
- Direct Known Subclasses:
XHTMLContentHandler, XMPContentHandler
public class SafeContentHandler extends ContentHandlerDecorator
Content handler decorator that makes sure that the character events (characters(char[], int, int) or ignorableWhitespace(char[], int, int)) passed to the decorated content handler contain only valid XML characters. All invalid characters are replaced with the Unicode replacement character U+FFFD (though a subclass may change this by overriding the writeReplacement(Output) method).
The XML standard defines the following Unicode character ranges as valid XML characters:
#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Note that currently this class only detects those invalid characters whose UTF-16 representation fits a single char. Also, this class does not ensure that the UTF-16 encoding of incoming characters is correct.
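The character ranges above can be expressed as a simple predicate. The following is a minimal, stand-alone sketch and not the Tika implementation: the class name XmlCharSketch and the methods isValidXmlChar and sanitize are hypothetical names chosen for illustration. Unlike the class documented here, the sketch works on whole code points, so it also replaces unpaired surrogates.

```java
// Illustrative sketch only; XmlCharSketch is a hypothetical helper, not part of Tika.
final class XmlCharSketch {

    // Unicode replacement character, the default substitute named in the description above.
    static final char REPLACEMENT = '\uFFFD';

    // Returns true if the code point falls in one of the valid XML character ranges:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isValidXmlChar(int codePoint) {
        return codePoint == 0x9
                || codePoint == 0xA
                || codePoint == 0xD
                || (codePoint >= 0x20 && codePoint <= 0xD7FF)
                || (codePoint >= 0xE000 && codePoint <= 0xFFFD)
                || (codePoint >= 0x10000 && codePoint <= 0x10FFFF);
    }

    // Copies the input, substituting U+FFFD for every code point that is not a valid XML character.
    static String sanitize(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            int codePoint = input.codePointAt(i);
            if (isValidXmlChar(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(REPLACEMENT);
            }
            // Advance by one or two chars depending on whether a surrogate pair was read.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }
}
```

For example, XmlCharSketch.sanitize("a\u0000b") yields "a\uFFFDb", while tabs, line feeds, and carriage returns pass through unchanged.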
-
-
Constructor Summary
Constructors
SafeContentHandler(ContentHandler handler)
-
Method Summary
Modifier and Type, Method
void characters(char[] ch, int start, int length)
void endDocument()
void endElement(String uri, String localName, String name)
void ignorableWhitespace(char[] ch, int start, int length)
void startElement(String uri, String localName, String name, Attributes atts)
-
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endPrefixMapping, processingInstruction, setDocumentLocator, skippedEntity, startDocument, startPrefixMapping, toString
-
Methods inherited from class org.xml.sax.helpers.DefaultHandler
error, fatalError, notationDecl, resolveEntity, unparsedEntityDecl, warning
-
Constructor Detail
-
SafeContentHandler
public SafeContentHandler(ContentHandler handler)
-
-
Method Detail
-
startElement
public void startElement(String uri, String localName, String name, Attributes atts) throws SAXException
- Specified by:
startElement in interface ContentHandler
- Overrides:
startElement in class ContentHandlerDecorator
- Throws:
SAXException
-
endElement
public void endElement(String uri, String localName, String name) throws SAXException
- Specified by:
endElement in interface ContentHandler
- Overrides:
endElement in class ContentHandlerDecorator
- Throws:
SAXException
-
endDocument
public void endDocument() throws SAXException
- Specified by:
endDocument in interface ContentHandler
- Overrides:
endDocument in class ContentHandlerDecorator
- Throws:
SAXException
-
characters
public void characters(char[] ch, int start, int length) throws SAXException
- Specified by:
characters in interface ContentHandler
- Overrides:
characters in class ContentHandlerDecorator
- Throws:
SAXException
-
ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException
- Specified by:
ignorableWhitespace in interface ContentHandler
- Overrides:
ignorableWhitespace in class ContentHandlerDecorator
- Throws:
SAXException