Class XWPFWordExtractor

    • Field Detail

      • SUPPORTED_TYPES

        public static final java.util.List<XWPFRelation> SUPPORTED_TYPES
    • Constructor Detail

      • XWPFWordExtractor

        public XWPFWordExtractor(OPCPackage container)
                          throws java.io.IOException
        Throws:
        java.io.IOException
      • XWPFWordExtractor

        public XWPFWordExtractor(XWPFDocument document)
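
        The two constructors can be exercised as in the minimal sketch below. It assumes Apache POI (poi-ooxml) is on the classpath; the class name ExtractorConstructorsSketch and the helper buildDocx are hypothetical and exist only to make the example self-contained. The methods declare throws Exception because the checked exceptions involved may differ between POI versions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ExtractorConstructorsSketch {

    // Hypothetical helper: builds a one-paragraph .docx entirely in memory.
    static byte[] buildDocx(String paragraphText) throws Exception {
        try (XWPFDocument doc = new XWPFDocument();
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            doc.createParagraph().createRun().setText(paragraphText);
            doc.write(out);
            return out.toByteArray();
        }
    }

    // XWPFWordExtractor(OPCPackage): wrap an already opened OPC package.
    static String extractViaPackage(byte[] docx) throws Exception {
        OPCPackage pkg = OPCPackage.open(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(pkg)) {
            return extractor.getText();
        }
    }

    // XWPFWordExtractor(XWPFDocument): wrap a document that is already loaded.
    static String extractViaDocument(byte[] docx) throws Exception {
        XWPFDocument doc = new XWPFDocument(new ByteArrayInputStream(docx));
        try (XWPFWordExtractor extractor = new XWPFWordExtractor(doc)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] docx = buildDocx("Hello from POI");
        System.out.println(extractViaPackage(docx));
        System.out.println(extractViaDocument(docx));
    }
}
```

        The XWPFDocument variant is convenient when the document is already open for other processing; the OPCPackage variant avoids constructing the document yourself.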
    • Method Detail

      • setFetchHyperlinks

        public void setFetchHyperlinks(boolean fetch)
        Sets whether hyperlink targets are fetched along with the text content. By default only the hyperlink label is output, not the link target.
      • setConcatenatePhoneticRuns

        public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
        Sets whether phonetic runs are concatenated into the extracted text. Default is true.
        Parameters:
        concatenatePhoneticRuns - If phonetic runs should be concatenated
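
        Both options are plain setters that should be called before getText(). The sketch below assumes Apache POI (poi-ooxml) on the classpath; ExtractorOptionsSketch, sampleDocument and extract are hypothetical names introduced for illustration.

```java
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public class ExtractorOptionsSketch {

    // Hypothetical helper: an in-memory document with a single plain paragraph.
    static XWPFDocument sampleDocument() {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("Plain paragraph");
        return doc;
    }

    // Hypothetical helper: extracts text with both options set explicitly.
    // The extractor is deliberately not closed here, so the caller keeps
    // control over the lifetime of the document it passed in.
    static String extract(XWPFDocument doc,
                          boolean fetchHyperlinks,
                          boolean concatenatePhoneticRuns) {
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        extractor.setFetchHyperlinks(fetchHyperlinks);
        extractor.setConcatenatePhoneticRuns(concatenatePhoneticRuns);
        return extractor.getText();
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument doc = sampleDocument()) {
            // Ask for hyperlink targets, and leave phonetic runs out.
            System.out.println(extract(doc, true, false));
        }
    }
}
```

        For a document without hyperlinks or phonetic runs, as in this sample, the options are not expected to change the output; they only matter when such runs are present.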
      • getText

        public java.lang.String getText()
        Description copied from interface: POITextExtractor
        Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the javadocs for a specific project for details.
        Specified by:
        getText in interface POITextExtractor
        Returns:
        All the text from the document
      • appendBodyElementText

        public void appendBodyElementText(java.lang.StringBuilder text,
                                          IBodyElement e)
      • appendParagraphText

        public void appendParagraphText(java.lang.StringBuilder text,
                                        XWPFParagraph paragraph)
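
        Because appendBodyElementText and appendParagraphText are public, they can be used to extract only part of a document into a caller-supplied StringBuilder. The sketch below is one possible use, assuming Apache POI (poi-ooxml) on the classpath; SelectiveExtractionSketch and its helper methods are hypothetical names. The newline appended after each element is a choice made in this example, not something the source page documents.

```java
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.poi.xwpf.usermodel.IBodyElement;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;

public class SelectiveExtractionSketch {

    // Hypothetical helper: one top-level paragraph followed by a 1x1 table.
    static XWPFDocument sampleDocument() {
        XWPFDocument doc = new XWPFDocument();
        doc.createParagraph().createRun().setText("Intro paragraph");
        doc.createTable(1, 1).getRow(0).getCell(0).setText("Cell text");
        return doc;
    }

    // Collects the text of top-level paragraphs only, skipping tables.
    static String paragraphsOnly(XWPFDocument doc) {
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        StringBuilder text = new StringBuilder();
        for (IBodyElement element : doc.getBodyElements()) {
            if (element instanceof XWPFParagraph) {
                extractor.appendParagraphText(text, (XWPFParagraph) element);
                text.append('\n'); // separator chosen for this example
            }
        }
        return text.toString();
    }

    // Collects the text of every top-level body element, tables included.
    static String allBodyElements(XWPFDocument doc) {
        XWPFWordExtractor extractor = new XWPFWordExtractor(doc);
        StringBuilder text = new StringBuilder();
        for (IBodyElement element : doc.getBodyElements()) {
            extractor.appendBodyElementText(text, element);
            text.append('\n'); // separator chosen for this example
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        try (XWPFDocument doc = sampleDocument()) {
            System.out.println(paragraphsOnly(doc));
            System.out.println(allBodyElements(doc));
        }
    }
}
```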