Class PublisherTextExtractor

    • Constructor Detail

      • PublisherTextExtractor

        public PublisherTextExtractor(HPBFDocument doc)
      • PublisherTextExtractor

        public PublisherTextExtractor(DirectoryNode dir)
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • PublisherTextExtractor

        public PublisherTextExtractor(POIFSFileSystem fs)
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • PublisherTextExtractor

        public PublisherTextExtractor(java.io.InputStream is)
                               throws java.io.IOException
        Throws:
        java.io.IOException
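The four constructors differ only in how much of the file has already been opened. The sketch below shows two of them, the `InputStream` and `POIFSFileSystem` variants. It assumes Apache POI 5.x with the `poi-scratchpad` artifact (where the HPBF classes live) on the classpath; the class name `PublisherConstructors`, the helper names and the file name `sample.pub` are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.poi.hpbf.extractor.PublisherTextExtractor;
import org.apache.poi.poifs.filesystem.POIFSFileSystem;

public class PublisherConstructors {

    // Opens the .pub file as an OLE2 container first (read-only), then
    // hands the filesystem to the extractor. Useful when the same
    // POIFSFileSystem is also needed for other purposes.
    static String viaFileSystem(Path pub) throws IOException {
        try (POIFSFileSystem fs = new POIFSFileSystem(pub.toFile(), true);
             PublisherTextExtractor extractor = new PublisherTextExtractor(fs)) {
            return extractor.getText();
        }
    }

    // Passes a raw stream; POI reads the whole stream into memory
    // before parsing it.
    static String viaStream(Path pub) throws IOException {
        try (InputStream in = Files.newInputStream(pub);
             PublisherTextExtractor extractor = new PublisherTextExtractor(in)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // "sample.pub" is a placeholder path, not a file shipped with POI.
        Path pub = Paths.get(args.length > 0 ? args[0] : "sample.pub");
        System.out.println(viaStream(pub));
    }
}
```

The file-based `POIFSFileSystem` constructor avoids copying the whole file into memory, so it is usually the better choice when the document is already on disk; the stream variant suits data arriving from a network or archive.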
    • Method Detail

      • setHyperlinksByDefault

        public void setHyperlinksByDefault(boolean hyperlinksByDefault)
        Should a call to getText() also return the document's hyperlinks inline with the text? Default is no.
      • getText

        public java.lang.String getText()
        Description copied from interface: POITextExtractor
        Retrieves all the text from the document. How cells, paragraphs, etc. are separated in the text is implementation-specific; see the javadocs for the specific project for details.
        Specified by:
        getText in interface POITextExtractor
        Returns:
        All the text from the document
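Because `getText()` consults the hyperlink setting each time it runs, the flag has to be set before the call. A minimal sketch, again assuming Apache POI 5.x plus `poi-scratchpad`; the class `PublisherTextDump`, the helper `extract` and the default path `sample.pub` are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.poi.hpbf.extractor.PublisherTextExtractor;

public class PublisherTextDump {

    // Extracts all text from a Publisher stream, optionally asking the
    // extractor to include hyperlinks as well (off by default).
    static String extract(InputStream in, boolean withHyperlinks) throws IOException {
        try (PublisherTextExtractor extractor = new PublisherTextExtractor(in)) {
            // Must be set before getText(), which reads the flag when called.
            extractor.setHyperlinksByDefault(withHyperlinks);
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder path; pass a real .pub file as the first argument.
        Path pub = Paths.get(args.length > 0 ? args[0] : "sample.pub");
        try (InputStream in = Files.newInputStream(pub)) {
            System.out.println(extract(in, true));
        }
    }
}
```

Closing the extractor through try-with-resources also releases the underlying POIFS filesystem, so no separate cleanup of the parsed document is needed.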