Class OOXMLWordAndPowerPointTextHandler
- java.lang.Object
- org.xml.sax.helpers.DefaultHandler
- org.apache.tika.parser.microsoft.ooxml.OOXMLWordAndPowerPointTextHandler
- All Implemented Interfaces: org.xml.sax.ContentHandler, org.xml.sax.DTDHandler, org.xml.sax.EntityResolver, org.xml.sax.ErrorHandler
public class OOXMLWordAndPowerPointTextHandler extends org.xml.sax.helpers.DefaultHandler
This class is intended to handle anything that might contain IBodyElements: the main document, headers, footers, notes, slides, etc. It does not generally check for namespaces, and it can be applied to both PPTX and DOCX for text extraction.
TODO: move this into POI?
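The namespace-agnostic approach described above can be illustrated with a minimal sketch that uses only the JDK SAX API. It is not this class's implementation: `LocalNameTextSketch`, `TextCollector`, and `extract` are hypothetical names, and matching on the local names `t` (text run) and `p` (paragraph) is an assumption made for illustration. Because only local names are compared, the same handler reads text from both WordprocessingML (`w:t`) and DrawingML (`a:t`) markup.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class LocalNameTextSketch {

    /** Hypothetical handler: collects text inside any element whose local name is "t". */
    static class TextCollector extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private boolean inText = false;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The namespace URI is deliberately ignored; only the local name is checked.
            if ("t".equals(localName)) {
                inText = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("t".equals(localName)) {
                inText = false;
            } else if ("p".equals(localName)) {
                // End of a paragraph: emit a line break.
                text.append('\n');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) {
                text.append(ch, start, length);
            }
        }

        String getText() {
            return text.toString();
        }
    }

    /** Parses the given XML string and returns the collected text. */
    static String extract(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Namespace awareness is required so that localName is populated.
        factory.setNamespaceAware(true);
        TextCollector handler = new TextCollector();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler.getText();
    }

    public static void main(String[] args) throws Exception {
        String docx = "<w:document xmlns:w=\"http://schemas.openxmlformats.org/wordprocessingml/2006/main\">"
                + "<w:body><w:p><w:r><w:t>Hello</w:t></w:r></w:p></w:body></w:document>";
        String pptx = "<p:sld xmlns:p=\"http://schemas.openxmlformats.org/presentationml/2006/main\""
                + " xmlns:a=\"http://schemas.openxmlformats.org/drawingml/2006/main\">"
                + "<a:p><a:r><a:t>Slide</a:t></a:r></a:p></p:sld>";
        System.out.print(extract(docx));
        System.out.print(extract(pptx));
    }
}
```

The trade-off of matching on local names alone is that an unrelated element named `t` or `p` in another namespace would also be picked up, which is acceptable for a sketch but worth keeping in mind.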

Nested Class Summary
- static class OOXMLWordAndPowerPointTextHandler.EditType
- static interface OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler

Field Summary
- static java.lang.String W_NS

Constructor Summary
- OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, java.util.Map<java.lang.String,java.lang.String> hyperlinks)
- OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, java.util.Map<java.lang.String,java.lang.String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)

Method Summary
- void characters(char[] ch, int start, int length)
- void endDocument()
- void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName)
- void endPrefixMapping(java.lang.String prefix)
- void ignorableWhitespace(char[] ch, int start, int length)
- void startDocument()
- void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts)
- void startPrefixMapping(java.lang.String prefix, java.lang.String uri)

Field Detail

W_NS
public static final java.lang.String W_NS
- See Also: Constant Field Values

Constructor Detail

OOXMLWordAndPowerPointTextHandler
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, java.util.Map<java.lang.String,java.lang.String> hyperlinks)

OOXMLWordAndPowerPointTextHandler
public OOXMLWordAndPowerPointTextHandler(OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler bodyContentsHandler, java.util.Map<java.lang.String,java.lang.String> hyperlinks, boolean includeTextBox, boolean concatenatePhoneticRuns)

Method Detail

startDocument
public void startDocument() throws org.xml.sax.SAXException
- Specified by: startDocument in interface org.xml.sax.ContentHandler
- Overrides: startDocument in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException

endDocument
public void endDocument() throws org.xml.sax.SAXException
- Specified by: endDocument in interface org.xml.sax.ContentHandler
- Overrides: endDocument in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException

startPrefixMapping
public void startPrefixMapping(java.lang.String prefix, java.lang.String uri) throws org.xml.sax.SAXException
- Specified by: startPrefixMapping in interface org.xml.sax.ContentHandler
- Overrides: startPrefixMapping in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException

endPrefixMapping
public void endPrefixMapping(java.lang.String prefix) throws org.xml.sax.SAXException
- Specified by: endPrefixMapping in interface org.xml.sax.ContentHandler
- Overrides: endPrefixMapping in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException

startElement
public void startElement(java.lang.String uri, java.lang.String localName, java.lang.String qName, org.xml.sax.Attributes atts) throws org.xml.sax.SAXException
- Specified by: startElement in interface org.xml.sax.ContentHandler
- Overrides: startElement in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException

endElement
public void endElement(java.lang.String uri, java.lang.String localName, java.lang.String qName) throws org.xml.sax.SAXException
- Specified by: endElement in interface org.xml.sax.ContentHandler
- Overrides: endElement in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException

characters
public void characters(char[] ch, int start, int length) throws org.xml.sax.SAXException
- Specified by: characters in interface org.xml.sax.ContentHandler
- Overrides: characters in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException

ignorableWhitespace
public void ignorableWhitespace(char[] ch, int start, int length) throws org.xml.sax.SAXException
- Specified by: ignorableWhitespace in interface org.xml.sax.ContentHandler
- Overrides: ignorableWhitespace in class org.xml.sax.helpers.DefaultHandler
- Throws: org.xml.sax.SAXException