Package org.apache.tika.sax
Class XHTMLContentHandler
java.lang.Object
  org.xml.sax.helpers.DefaultHandler
    org.apache.tika.sax.ContentHandlerDecorator
      org.apache.tika.sax.SafeContentHandler
        org.apache.tika.sax.XHTMLContentHandler
- All Implemented Interfaces:
org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
public class XHTMLContentHandler extends SafeContentHandler
Content handler decorator that simplifies the task of producing XHTML events for Tika content parsers.
-
Constructor Summary
XHTMLContentHandler(org.xml.sax.ContentHandler handler, Metadata metadata)
-
Method Summary
void characters(char[] ch, int start, int length)
void characters(java.lang.String characters)
void element(java.lang.String name, java.lang.String value)
    Emits an XHTML element with the given text content.
void endDocument()
    Ends the XHTML document by writing the closing </body> and </html> tags and clearing the namespace mappings.
void endElement(java.lang.String name)
void endElement(java.lang.String uri, java.lang.String local, java.lang.String name)
    Ends the given element.
void newline()
void startDocument()
    Starts an XHTML document by setting up the namespace mappings when called for the first time.
void startElement(java.lang.String name)
void startElement(java.lang.String name, java.lang.String attribute, java.lang.String value)
void startElement(java.lang.String uri, java.lang.String local, java.lang.String name, org.xml.sax.Attributes attributes)
    Starts the given element.
void startElement(java.lang.String name, org.xml.sax.helpers.AttributesImpl attributes)
-
Methods inherited from class org.apache.tika.sax.SafeContentHandler
ignorableWhitespace
-
Methods inherited from class org.apache.tika.sax.ContentHandlerDecorator
endPrefixMapping, processingInstruction, setDocumentLocator, skippedEntity, startPrefixMapping, toString
-
Field Detail
-
XHTML
public static final java.lang.String XHTML
The XHTML namespace URI.
- See Also:
- Constant Field Values
-
ENDLINE
public static final java.util.Set<java.lang.String> ENDLINE
The elements that get appended with the NL character.
-
-
Constructor Detail
-
XHTMLContentHandler
public XHTMLContentHandler(org.xml.sax.ContentHandler handler, Metadata metadata)
-
-
Method Detail
-
startDocument
public void startDocument() throws org.xml.sax.SAXException
Starts an XHTML document by setting up the namespace mappings when called for the first time. The standard XHTML prefix is generated lazily when the first element is started.
- Specified by:
startDocument in interface org.xml.sax.ContentHandler
- Overrides:
startDocument in class ContentHandlerDecorator
- Throws:
org.xml.sax.SAXException
-
endDocument
public void endDocument() throws org.xml.sax.SAXException
Ends the XHTML document by writing the following footer and clearing the namespace mappings:
</body>
</html>
- Specified by:
endDocument in interface org.xml.sax.ContentHandler
- Overrides:
endDocument in class SafeContentHandler
- Throws:
org.xml.sax.SAXException
-
startElement
public void startElement(java.lang.String uri, java.lang.String local, java.lang.String name, org.xml.sax.Attributes attributes) throws org.xml.sax.SAXException
Starts the given element. Table cells and list items are automatically indented by emitting a tab character as ignorable whitespace.
- Specified by:
startElement in interface org.xml.sax.ContentHandler
- Overrides:
startElement in class SafeContentHandler
- Throws:
org.xml.sax.SAXException
-
endElement
public void endElement(java.lang.String uri, java.lang.String local, java.lang.String name) throws org.xml.sax.SAXException
Ends the given element. Block elements are automatically followed by a newline character.
- Specified by:
endElement in interface org.xml.sax.ContentHandler
- Overrides:
endElement in class SafeContentHandler
- Throws:
org.xml.sax.SAXException
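The whitespace handling described for startElement and endElement (a tab before table cells and list items, a newline after block elements) can be pictured with a small stand-alone handler. This is only a sketch built on the JDK's SAX classes, not Tika's implementation: the class name WhitespaceSketch, the demo() helper, and the contents of the INDENT and ENDLINE sets shown here are assumptions made for illustration. The actual members of ENDLINE are not listed on this page.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in that records events as text so the whitespace is visible.
class WhitespaceSketch extends DefaultHandler {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Illustrative sets only; the real class may use different members.
    static final Set<String> INDENT = new HashSet<>(Arrays.asList("li", "td", "th"));
    static final Set<String> ENDLINE = new HashSet<>(Arrays.asList("p", "li", "tr", "table"));

    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String name, Attributes atts) {
        // Indent table cells and list items with a tab sent as ignorable whitespace.
        if (XHTML.equals(uri) && INDENT.contains(local)) {
            ignorableWhitespace(new char[] {'\t'}, 0, 1);
        }
        out.append('<').append(local).append('>');
    }

    @Override
    public void endElement(String uri, String local, String name) {
        out.append("</").append(local).append('>');
        // Follow block elements with a newline sent as ignorable whitespace.
        if (XHTML.equals(uri) && ENDLINE.contains(local)) {
            ignorableWhitespace(new char[] {'\n'}, 0, 1);
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) {
        out.append(ch, start, length);
    }

    // Emits a single list item and returns everything that was recorded.
    static String demo() {
        WhitespaceSketch handler = new WhitespaceSketch();
        handler.startElement(XHTML, "li", "li", new AttributesImpl());
        handler.characters("item".toCharArray(), 0, 4);
        handler.endElement(XHTML, "li", "li");
        return handler.out.toString();
    }
}
```

Because the tab and newline travel as ignorable whitespace rather than as character data, a downstream handler that only cares about text content is free to drop them.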
-
characters
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
- Specified by:
characters in interface org.xml.sax.ContentHandler
- Overrides:
characters in class SafeContentHandler
- Throws:
org.xml.sax.SAXException
- See Also:
- TIKA-210
-
startElement
public void startElement(java.lang.String name) throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
-
startElement
public void startElement(java.lang.String name, java.lang.String attribute, java.lang.String value) throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
-
startElement
public void startElement(java.lang.String name, org.xml.sax.helpers.AttributesImpl attributes) throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
-
endElement
public void endElement(java.lang.String name) throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
-
characters
public void characters(java.lang.String characters) throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
-
newline
public void newline() throws org.xml.sax.SAXException
- Throws:
org.xml.sax.SAXException
-
element
public void element(java.lang.String name, java.lang.String value) throws org.xml.sax.SAXException
Emits an XHTML element with the given text content. If the given text value is null or empty, then the element is not written.
- Parameters:
name - XHTML element name
value - element value, possibly null
- Throws:
org.xml.sax.SAXException - if the content element could not be written
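The null-or-empty rule for element can be sketched as a start/characters/end sequence that is skipped entirely when there is no text. The code below is a minimal illustration against the plain JDK SAX interfaces, not the library's source: ElementSketch, its static element helper, and render are hypothetical names, and the recording handler exists only to make the emitted events visible.

```java
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper mirroring the documented behavior of element(name, value).
class ElementSketch {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Writes <name>value</name> as SAX events; writes nothing for null or empty text.
    static void element(ContentHandler out, String name, String value) throws SAXException {
        if (value == null || value.isEmpty()) {
            return;
        }
        out.startElement(XHTML, name, name, new AttributesImpl());
        out.characters(value.toCharArray(), 0, value.length());
        out.endElement(XHTML, name, name);
    }

    // Runs element() against a recording handler and returns the events as text.
    static String render(String name, String value) {
        final StringBuilder log = new StringBuilder();
        ContentHandler recorder = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                log.append('<').append(local).append('>');
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                log.append("</").append(local).append('>');
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                log.append(ch, start, length);
            }
        };
        try {
            element(recorder, name, value);
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return log.toString();
    }
}
```

In this sketch, render("title", "Hello") yields one complete element, while a null or empty value yields no events at all, so callers can pass optional metadata values without checking them first.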
-
-