Class XSSFEventBasedExcelExtractor

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, ExcelExtractor
    Direct Known Subclasses:
    XSSFBEventBasedExcelExtractor

    public class XSSFEventBasedExcelExtractor
    extends org.apache.poi.ooxml.extractor.POIXMLTextExtractor
    implements ExcelExtractor
    Implementation of a text extractor for OOXML Excel files that uses SAX event-based parsing.
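
The event-based approach can be illustrated with the JDK's own SAX parser. The following is a minimal sketch, not POI's implementation: the class name SaxSheetSketch, the method cellValues, and the simplified worksheet XML are all assumptions made for illustration. It streams through the XML once and records each cell value as its closing tag arrives, without building a document tree.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: collects "ref=value" pairs from a simplified worksheet XML.
class SaxSheetSketch {

    static List<String> cellValues(String sheetXml) {
        List<String> out = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String ref;                                   // reference of the current cell
            private boolean inValue;                              // true while inside <v>
            private final StringBuilder text = new StringBuilder();

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("c".equals(qName)) {
                    ref = atts.getValue("r");
                } else if ("v".equals(qName)) {
                    inValue = true;
                    text.setLength(0);
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times per element, so append rather than assign.
                if (inValue) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("v".equals(qName)) {
                    inValue = false;
                    out.add(ref + "=" + text);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sheet XML", e);
        }
        return out;
    }

    public static void main(String[] args) {
        String xml = "<worksheet><sheetData><row r=\"1\">"
            + "<c r=\"A1\"><v>10</v></c><c r=\"B1\"><v>20</v></c>"
            + "</row></sheetData></worksheet>";
        System.out.println(cellValues(xml)); // prints [A1=10, B1=20]
    }
}
```

Because the handler keeps only the current cell in memory, memory use stays roughly constant regardless of sheet size, which is the usual reason to prefer event-based parsing for large files.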
    • Method Detail

      • main

        public static void main(java.lang.String[] args)
                         throws java.lang.Exception
        Throws:
        java.lang.Exception
      • setIncludeSheetNames

        public void setIncludeSheetNames(boolean includeSheetNames)
        Should sheet names be included? Default is true.
        Specified by:
        setIncludeSheetNames in interface ExcelExtractor
        Parameters:
        includeSheetNames - true if the sheet names should be included
      • getIncludeSheetNames

        public boolean getIncludeSheetNames()
        Returns:
        whether to include sheet names
        Since:
        3.16-beta3
      • setFormulasNotResults

        public void setFormulasNotResults(boolean formulasNotResults)
        Should the formula itself be returned instead of the result it produces? Default is false.
        Specified by:
        setFormulasNotResults in interface ExcelExtractor
        Parameters:
        formulasNotResults - true if the formula itself is returned
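
To show what the flag chooses between, here is a minimal sketch that uses only the JDK SAX parser. It is not POI's code: the class FormulaOrResultSketch, the method extract, and the tab separator are assumptions for illustration. In sheet XML a formula cell carries both an <f> element (the formula) and a <v> element (the cached result), and the flag decides which of the two is emitted.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: emits either the formula or the cached result of each cell.
class FormulaOrResultSketch {

    static String extract(String sheetXml, boolean formulasNotResults) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private final StringBuilder formula = new StringBuilder();
            private final StringBuilder value = new StringBuilder();
            private StringBuilder current;   // buffer receiving character data, or null

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("c".equals(qName)) {
                    formula.setLength(0);
                    value.setLength(0);
                } else if ("f".equals(qName)) {
                    current = formula;
                } else if ("v".equals(qName)) {
                    current = value;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("f".equals(qName) || "v".equals(qName)) {
                    current = null;
                } else if ("c".equals(qName)) {
                    // Cells without a formula always fall back to their value.
                    boolean useFormula = formulasNotResults && formula.length() > 0;
                    if (out.length() > 0) {
                        out.append('\t');    // separator chosen for this sketch only
                    }
                    out.append(useFormula ? formula : value);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(sheetXml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sheet XML", e);
        }
        return out.toString();
    }
}
```

With the input `<row><c r="A1"><v>2</v></c><c r="B1"><f>A1*3</f><v>6</v></c></row>`, the sketch returns `2`, a tab, then `6` when the flag is false, and `2`, a tab, then `A1*3` when it is true.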
      • getFormulasNotResults

        public boolean getFormulasNotResults()
        Returns:
        whether formulas are returned instead of their results
        Since:
        3.16-beta3
      • setIncludeHeadersFooters

        public void setIncludeHeadersFooters(boolean includeHeadersFooters)
        Should headers and footers be included? Default is true.
        Specified by:
        setIncludeHeadersFooters in interface ExcelExtractor
        Parameters:
        includeHeadersFooters - true if headers and footers should be included
      • getIncludeHeadersFooters

        public boolean getIncludeHeadersFooters()
        Returns:
        whether or not to include headers and footers
        Since:
        3.16-beta3
      • setIncludeTextBoxes

        public void setIncludeTextBoxes(boolean includeTextBoxes)
        Should text from textboxes be included? Default is true.
      • getIncludeTextBoxes

        public boolean getIncludeTextBoxes()
        Returns:
        whether or not to extract textboxes
        Since:
        3.16-beta3
      • setIncludeCellComments

        public void setIncludeCellComments(boolean includeCellComments)
        Should cell comments be included? Default is false.
        Specified by:
        setIncludeCellComments in interface ExcelExtractor
        Parameters:
        includeCellComments - true if cell comments should be included
      • getIncludeCellComments

        public boolean getIncludeCellComments()
        Returns:
        whether cell comments should be included
        Since:
        3.16-beta3
      • setConcatenatePhoneticRuns

        public void setConcatenatePhoneticRuns(boolean concatenatePhoneticRuns)
        Should text from <rPh> (phonetic run) elements in the SharedStringsTable be concatenated? Default is true.
        Parameters:
        concatenatePhoneticRuns - true if runs should be concatenated, false otherwise
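
As an illustration of what concatenating phonetic runs means, the sketch below reads a single shared-string item (<si>) with the JDK SAX parser. It is an assumption-laden example, not POI's implementation: the class PhoneticRunSketch, the method itemText, and the single space placed before the phonetic text are choices made for this sketch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: returns the text of one <si> item, with or without its <rPh> runs.
class PhoneticRunSketch {

    static String itemText(String siXml, boolean concatenatePhoneticRuns) {
        StringBuilder out = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inPhonetic;   // true while inside <rPh>
            private boolean inText;       // true while inside <t>

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("rPh".equals(qName)) {
                    inPhonetic = true;
                    // Separator before phonetic text: an assumption of this sketch.
                    if (concatenatePhoneticRuns && out.length() > 0) {
                        out.append(' ');
                    }
                } else if ("t".equals(qName)) {
                    inText = true;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Base text is always kept; phonetic text only when the flag is set.
                if (inText && (!inPhonetic || concatenatePhoneticRuns)) {
                    out.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if ("t".equals(qName)) {
                    inText = false;
                } else if ("rPh".equals(qName)) {
                    inPhonetic = false;
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(siXml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse shared string item", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Base text U+6771 U+4EAC with a katakana reading as the phonetic run.
        String si = "<si><t>\u6771\u4EAC</t>"
            + "<rPh sb=\"0\" eb=\"2\"><t>\u30C8\u30A6\u30AD\u30E7\u30A6</t></rPh></si>";
        System.out.println(itemText(si, true));
        System.out.println(itemText(si, false));
    }
}
```

Passing false drops the reading and keeps only the base text, which is often preferable when the extracted text feeds a search index that should not match on readings.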
      • setLocale

        public void setLocale(java.util.Locale locale)
      • getLocale

        public java.util.Locale getLocale()
        Returns:
        locale
        Since:
        3.16-beta3
      • getPackage

        public OPCPackage getPackage()
        Returns the opened OPCPackage container.
        Overrides:
        getPackage in class org.apache.poi.ooxml.extractor.POIXMLTextExtractor
      • getCoreProperties

        public org.apache.poi.ooxml.POIXMLProperties.CoreProperties getCoreProperties()
        Returns the core document properties
        Overrides:
        getCoreProperties in class org.apache.poi.ooxml.extractor.POIXMLTextExtractor
      • getExtendedProperties

        public org.apache.poi.ooxml.POIXMLProperties.ExtendedProperties getExtendedProperties()
        Returns the extended document properties
        Overrides:
        getExtendedProperties in class org.apache.poi.ooxml.extractor.POIXMLTextExtractor
      • getCustomProperties

        public org.apache.poi.ooxml.POIXMLProperties.CustomProperties getCustomProperties()
        Returns the custom document properties
        Overrides:
        getCustomProperties in class org.apache.poi.ooxml.extractor.POIXMLTextExtractor
      • processSheet

        public void processSheet(XSSFSheetXMLHandler.SheetContentsHandler sheetContentsExtractor,
                                 Styles styles,
                                 Comments comments,
                                 SharedStrings strings,
                                 java.io.InputStream sheetInputStream)
                          throws java.io.IOException,
                                 org.xml.sax.SAXException
        Processes the given sheet
        Throws:
        java.io.IOException
        org.xml.sax.SAXException
      • getText

        public java.lang.String getText()
        Processes the file and returns the text
        Specified by:
        getText in interface ExcelExtractor
        Specified by:
        getText in class POITextExtractor
        Returns:
        All the text from the document
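
A typical call sequence might look like the sketch below. It assumes Apache POI (poi-ooxml) is on the classpath and relies on the constructor that takes a file path, which this section does not list. The class ExtractorUsageSketch and the method extractText are hypothetical names, and the particular option values are arbitrary.

```java
import org.apache.poi.xssf.extractor.XSSFEventBasedExcelExtractor;

// Hypothetical usage sketch: configure the extractor, read the text, then close it.
class ExtractorUsageSketch {

    static String extractText(String xlsxPath) throws Exception {
        // The extractor is Closeable, so try-with-resources calls close() for us.
        try (XSSFEventBasedExcelExtractor extractor =
                 new XSSFEventBasedExcelExtractor(xlsxPath)) {
            extractor.setIncludeSheetNames(false);   // default is true
            extractor.setFormulasNotResults(true);   // default is false
            extractor.setIncludeCellComments(true);  // default is false
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws Exception {
        // args[0] is the path to an .xlsx file supplied by the caller.
        System.out.println(extractText(args[0]));
    }
}
```

The options are set before getText() is called because that call is what processes the file. The returned string remains valid after the extractor has been closed.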
      • close

        public void close()
                   throws java.io.IOException
        Description copied from class: POITextExtractor
        Frees the resources held by the extractor once it is no longer needed. This may include closing open file handles and freeing memory. The extractor cannot be used after close has been called.
        Specified by:
        close in interface java.lang.AutoCloseable
        Specified by:
        close in interface java.io.Closeable
        Overrides:
        close in class org.apache.poi.ooxml.extractor.POIXMLTextExtractor
        Throws:
        java.io.IOException