Package org.apache.poi.xwpf.extractor
Class XWPFWordExtractor
- java.lang.Object
-
- org.apache.poi.xwpf.extractor.XWPFWordExtractor
-
- All Implemented Interfaces: java.io.Closeable, java.lang.AutoCloseable, POITextExtractor, POIXMLTextExtractor
public class XWPFWordExtractor extends java.lang.Object implements POIXMLTextExtractor
Helper class to extract text from an OOXML Word file
-
-
Field Summary
Fields (Modifier and Type, Field, Description):
- static java.util.List<XWPFRelation> SUPPORTED_TYPES
-
Constructor Summary
Constructors (Constructor, Description):
- XWPFWordExtractor(OPCPackage container)
- XWPFWordExtractor(XWPFDocument document)
-
Method Summary
Methods (Modifier and Type, Method, Description):
- void appendBodyElementText(java.lang.StringBuilder text, IBodyElement e)
- void appendParagraphText(java.lang.StringBuilder text, XWPFParagraph paragraph)
- XWPFDocument getDocument(): Returns opened document
- XWPFDocument getFilesystem()
- java.lang.String getText(): Retrieves all the text from the document.
- boolean isCloseFilesystem()
- void setCloseFilesystem(boolean doCloseFilesystem)
- void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns): Should we concatenate phonetic runs in extraction.
- void setFetchHyperlinks(boolean fetch): Should we also fetch the hyperlinks, when fetching the text content? Default is to only output the hyperlink label, and not the contents
-
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.poi.ooxml.extractor.POIXMLTextExtractor
checkMaxTextSize, close, getCoreProperties, getCustomProperties, getExtendedProperties, getMetadataTextExtractor, getPackage
-
Field Detail
-
SUPPORTED_TYPES
public static final java.util.List<XWPFRelation> SUPPORTED_TYPES
-
-
Constructor Detail
-
XWPFWordExtractor
public XWPFWordExtractor(OPCPackage container) throws java.io.IOException
- Throws: java.io.IOException
-
XWPFWordExtractor
public XWPFWordExtractor(XWPFDocument document)
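As an illustration of the two constructors above, the following is a minimal sketch rather than a definitive usage pattern. The class name ConstructorSketch and its helper methods are hypothetical; the sample document is built in memory so that no file on disk is assumed. Only the extractor is closed in each case, on the assumption that closing it also releases the underlying package (see setCloseFilesystem).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.openxml4j.exceptions.InvalidFormatException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Hypothetical demo class; not part of Apache POI.
public class ConstructorSketch {

    // Builds a one-paragraph .docx in memory so the sketch needs no input file.
    static byte[] sampleDocx() throws IOException {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.createParagraph().createRun().setText("Hello from POI");
            doc.write(out);
            return out.toByteArray();
        }
    }

    // Uses XWPFWordExtractor(OPCPackage container).
    static String viaPackage(byte[] docx) throws IOException, InvalidFormatException {
        OPCPackage pkg = OPCPackage.open(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(pkg)) {
            return extractor.getText();
        }
    }

    // Uses XWPFWordExtractor(XWPFDocument document).
    static String viaDocument(byte[] docx) throws IOException {
        XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            return extractor.getText();
        }
    }
}
```

Both helpers should return text containing the sample paragraph; the exact separators around it are implementation specific.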
-
-
Method Detail
-
setFetchHyperlinks
public void setFetchHyperlinks(boolean fetch)
Sets whether hyperlink targets are also fetched when fetching the text content. By default only the hyperlink label is output, not the link target.
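The effect of this flag can be sketched by extracting the same document twice. This is an assumption-laden example: the class HyperlinkSketch, its helpers, and the sample URL are hypothetical, and it assumes a POI version in which XWPFParagraph.createHyperlinkRun(String) is available. The document is written out and re-read so that the hyperlink relationship is loaded the same way as for a file opened from disk.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical demo class; not part of Apache POI.
public class HyperlinkSketch {
    static final String URL = "https://poi.apache.org/";

    // Builds a .docx in memory with one paragraph that contains a hyperlink.
    static byte[] docxWithLink() throws IOException {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            XWPFParagraph p = doc.createParagraph();
            p.createRun().setText("See ");
            p.createHyperlinkRun(URL).setText("Apache POI");
            doc.write(out);
            return out.toByteArray();
        }
    }

    // Extracts the text with hyperlink fetching switched on or off.
    static String extract(byte[] docx, boolean fetchHyperlinks) throws IOException {
        XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            extractor.setFetchHyperlinks(fetchHyperlinks);
            return extractor.getText();
        }
    }
}
```

With fetching enabled the extracted text is expected to include the link target next to its label; with the default setting only the label appears.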
-
setConcatenatePhoneticRuns
public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
Sets whether phonetic runs are concatenated during extraction. Default is true.
- Parameters: concatenatePhoneticRuns - If phonetic runs should be concatenated
-
getText
public java.lang.String getText()
Description copied from interface: POITextExtractor
Retrieves all the text from the document. How cells, paragraphs etc. are separated in the text is implementation specific; see the javadocs for a specific project for details.
- Specified by: getText in interface POITextExtractor
- Returns: All the text from the document
-
appendBodyElementText
public void appendBodyElementText(java.lang.StringBuilder text, IBodyElement e)
-
appendParagraphText
public void appendParagraphText(java.lang.StringBuilder text, XWPFParagraph paragraph)
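The signature suggests that the text of a single paragraph is appended to a caller-supplied StringBuilder. A minimal sketch under that assumption follows; the class ParagraphTextSketch and its method are hypothetical names, and the document is created in memory for illustration only.

```java
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

// Hypothetical demo class; not part of Apache POI.
public class ParagraphTextSketch {

    // Returns only the text of the second paragraph of a two-paragraph document.
    static String secondParagraphText() throws IOException {
        // Closing the document at the end also releases what the extractor uses.
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("first paragraph");
            XWPFParagraph second = doc.createParagraph();
            second.createRun().setText("second paragraph");

            XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
            StringBuilder text = new StringBuilder();
            // Appends the text of this one paragraph to the builder.
            extractor.appendParagraphText(text, second);
            return text.toString();
        }
    }
}
```

This can be useful when only selected paragraphs are needed and a full getText() call would return too much.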
-
getDocument
public XWPFDocument getDocument()
Description copied from interface: POIXMLTextExtractor
Returns the opened document.
- Specified by: getDocument in interface POITextExtractor
- Specified by: getDocument in interface POIXMLTextExtractor
- Returns: the opened document
-
setCloseFilesystem
public void setCloseFilesystem(boolean doCloseFilesystem)
- Specified by: setCloseFilesystem in interface POITextExtractor
- Parameters: doCloseFilesystem - true (default), if underlying resources/filesystem should be closed on POITextExtractor.close()
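One situation where this setting matters is when the caller owns the XWPFDocument and wants to keep using it after the extractor is closed. The sketch below assumes that scenario; CloseFilesystemSketch and its method are hypothetical names.

```java
import java.io.IOException;

import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

// Hypothetical demo class; not part of Apache POI.
public class CloseFilesystemSketch {

    // Extracts text but leaves closing the document to the caller's own try block.
    static boolean keepDocumentOpen() throws IOException {
        try (XWPFDocument doc = new XWPFDocument()) {
            doc.createParagraph().createRun().setText("still needed after extraction");
            try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
                // Ask the extractor not to close the underlying resources on close().
                extractor.setCloseFilesystem(false);
                extractor.getText();
                return extractor.isCloseFilesystem();
            }
            // The document is closed by the outer try-with-resources.
        }
    }
}
```

Here keepDocumentOpen() returns false, reflecting the value passed to setCloseFilesystem.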
-
isCloseFilesystem
public boolean isCloseFilesystem()
- Specified by: isCloseFilesystem in interface POITextExtractor
- Returns: true, if resources/filesystem should be closed on POITextExtractor.close()
-
getFilesystem
public XWPFDocument getFilesystem()
- Specified by: getFilesystem in interface POITextExtractor
- Returns: The underlying resources/filesystem
-
-