Package org.apache.tika.sax
Class ToTextContentHandler
- java.lang.Object
  - org.xml.sax.helpers.DefaultHandler
    - org.apache.tika.sax.ToTextContentHandler
- All Implemented Interfaces: org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
- Direct Known Subclasses: ToXMLContentHandler
public class ToTextContentHandler extends org.xml.sax.helpers.DefaultHandler
SAX event handler that writes all character content out to a character stream. No escaping or other transformations are made on the character content.
As of Tika 1.20, this handler ignores content within <script> and <style> tags.
- Since: Apache Tika 0.10
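The behavior described above (raw character content forwarded to a character stream, with <script> and <style> content skipped) can be illustrated with a small stand-in handler built only on the JDK SAX classes. This is a minimal sketch, not Tika's source code: the class name TextSinkSketch, the helper extract, and the choice to match element names by qName are assumptions made for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for illustration only; NOT the Tika implementation.
final class TextSinkSketch extends DefaultHandler {
    private final Writer writer;
    private int skipDepth = 0; // greater than 0 while inside <script> or <style>

    TextSinkSketch(Writer writer) {
        this.writer = writer;
    }

    // Assumption: elements are matched by qualified name, case-insensitively.
    private static boolean isSkipped(String qName) {
        String name = (qName == null) ? "" : qName.toLowerCase(Locale.ROOT);
        return name.equals("script") || name.equals("style");
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (isSkipped(qName)) {
            skipDepth++;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (isSkipped(qName) && skipDepth > 0) {
            skipDepth--;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        if (skipDepth > 0) {
            return; // drop script and style content
        }
        try {
            writer.write(ch, start, length); // written as-is, no escaping
        } catch (IOException e) {
            throw new SAXException("Error writing: " + e.getMessage(), e);
        }
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        characters(ch, start, length);
    }

    @Override
    public void endDocument() throws SAXException {
        try {
            writer.flush();
        } catch (IOException e) {
            throw new SAXException("Error flushing: " + e.getMessage(), e);
        }
    }

    @Override
    public String toString() {
        return writer.toString();
    }

    // Parses an XML string with the JDK parser and returns the collected text.
    static String extract(String xml) {
        TextSinkSketch handler = new TextSinkSketch(new StringWriter());
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return handler.toString();
    }
}
```

With Tika on the classpath, the same pattern would pass a ToTextContentHandler (for example one created with the no-argument constructor) to the parser instead of the stand-in, and read the result with toString().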
Constructor Summary
- ToTextContentHandler(): Creates a content handler that writes character events to an internal string buffer.
- ToTextContentHandler(java.io.OutputStream stream): Creates a content handler that writes character events to the given output stream using the platform default encoding.
- ToTextContentHandler(java.io.OutputStream stream, java.lang.String encoding): Creates a content handler that writes character events to the given output stream using the given encoding.
- ToTextContentHandler(java.io.Writer writer): Creates a content handler that writes character events to the given writer.

Method Summary
- void characters(char[] ch, int start, int length): Writes the given characters to the given character stream.
- void endDocument(): Flushes the character stream so that no characters are forgotten in internal buffers.
- void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
- void ignorableWhitespace(char[] ch, int start, int length): Writes the given ignorable characters to the given character stream.
- void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
- java.lang.String toString(): Returns the contents of the internal string buffer where all the received characters have been collected.

Constructor Detail

ToTextContentHandler
public ToTextContentHandler(java.io.Writer writer)
Creates a content handler that writes character events to the given writer.
- Parameters:
  - writer - writer

ToTextContentHandler
public ToTextContentHandler(java.io.OutputStream stream)
Creates a content handler that writes character events to the given output stream using the platform default encoding.
- Parameters:
  - stream - output stream

ToTextContentHandler
public ToTextContentHandler(java.io.OutputStream stream, java.lang.String encoding) throws java.io.UnsupportedEncodingException
Creates a content handler that writes character events to the given output stream using the given encoding.
- Parameters:
  - stream - output stream
  - encoding - output encoding
- Throws: java.io.UnsupportedEncodingException - if the encoding is unsupported
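
The effect of the encoding argument can be seen with the JDK's own java.io.OutputStreamWriter, which accepts an encoding name and throws UnsupportedEncodingException for an unknown one. The sketch below assumes the handler's output behaves like such a writer wrapped around the stream; that is an assumption about the mechanism, and the class EncodingSketch and its helpers are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.UnsupportedEncodingException;
import java.io.Writer;

// Hypothetical helper for illustration; uses only JDK classes.
final class EncodingSketch {
    // Encodes text through a writer created with the named encoding.
    static byte[] encode(String text, String encoding) throws UnsupportedEncodingException {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        // Throws UnsupportedEncodingException if the name is not supported.
        Writer writer = new OutputStreamWriter(stream, encoding);
        try {
            writer.write(text);
            writer.flush(); // push buffered bytes into the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return stream.toByteArray();
    }

    // Reports whether a writer can be created for the named encoding.
    static boolean isSupported(String encoding) {
        try {
            encode("", encoding);
            return true;
        } catch (UnsupportedEncodingException e) {
            return false;
        }
    }

    // Convenience wrapper that returns the encoded length, or -1 if unsupported.
    static int encodedLength(String text, String encoding) {
        try {
            return encode(text, encoding).length;
        } catch (UnsupportedEncodingException e) {
            return -1;
        }
    }
}
```

The same character can occupy a different number of bytes depending on the encoding, which is why naming the encoding explicitly is usually safer than relying on the platform default.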

ToTextContentHandler
public ToTextContentHandler()
Creates a content handler that writes character events to an internal string buffer. Use the toString() method to access the collected character content.

Method Detail

characters
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
Writes the given characters to the given character stream.
- Specified by: characters in interface org.xml.sax.ContentHandler
- Overrides: characters in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException

ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws org.xml.sax.SAXException
Writes the given ignorable characters to the given character stream. The default implementation simply forwards the call to the characters(char[], int, int) method.
- Specified by: ignorableWhitespace in interface org.xml.sax.ContentHandler
- Overrides: ignorableWhitespace in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException

endDocument
public void endDocument() throws org.xml.sax.SAXException
Flushes the character stream so that no characters are left behind in internal buffers.
- Specified by: endDocument in interface org.xml.sax.ContentHandler
- Overrides: endDocument in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException - if the stream cannot be flushed
- See Also: TIKA-179
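
Why a flush at the end of the document matters can be shown with a buffered JDK writer: text written to it may not reach the underlying sink until flush() is called. This is a minimal sketch using java.io.BufferedWriter as an example of a buffering writer; FlushSketch and beforeAndAfterFlush are hypothetical names, and the example does not involve Tika itself.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical helper for illustration; uses only JDK classes.
final class FlushSketch {
    // Returns what the sink holds before and after flushing the buffered writer.
    static String[] beforeAndAfterFlush(String text) {
        StringWriter sink = new StringWriter();
        BufferedWriter buffered = new BufferedWriter(sink);
        try {
            buffered.write(text);            // short text stays in the buffer
            String before = sink.toString(); // sink has not received it yet
            buffered.flush();                // what a flush in endDocument() would do
            String after = sink.toString();  // sink now holds the text
            return new String[] {before, after};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If a handler wraps a buffering writer or an output stream, skipping the endDocument() event (or never flushing) can leave the tail of the text unwritten.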

startElement
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException
- Specified by: startElement in interface org.xml.sax.ContentHandler
- Overrides: startElement in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException

endElement
public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) throws org.xml.sax.SAXException
- Specified by: endElement in interface org.xml.sax.ContentHandler
- Overrides: endElement in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException

toString
public java.lang.String toString()
Returns the contents of the internal string buffer where all the received characters have been collected. Only works when this object was constructed using the empty default constructor or by passing a StringWriter to the ToTextContentHandler(java.io.Writer) constructor.
- Overrides: toString in class java.lang.Object
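
The restriction on toString() is easier to see by comparing what two JDK writers return from their own toString() methods. The sketch below assumes the handler's result depends on the writer it was given, which is an inference from the description above rather than something stated in it; ToStringSketch and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; uses only JDK classes.
final class ToStringSketch {
    // A StringWriter keeps the text and returns it from toString().
    static String viaStringWriter(String text) {
        StringWriter writer = new StringWriter();
        writer.write(text);
        return writer.toString();
    }

    // An OutputStreamWriter sends bytes to the stream; its toString() is the
    // default Object.toString() and does not contain the written text.
    static String viaStreamWriter(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStreamWriter writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8);
        try {
            writer.write(text);
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return writer.toString();
    }
}
```

When output goes to a stream or to a writer other than StringWriter, the text should be read back from that destination rather than from toString().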