Class POIXMLExtractorFactory

  • All Implemented Interfaces:
    ExtractorProvider

    public final class POIXMLExtractorFactory
    extends java.lang.Object
    implements ExtractorProvider
    Figures out the correct POITextExtractor for the supplied document and returns it.

    Note 1 - this will fail for many file formats if the POI Scratchpad jar is not present on the runtime classpath.

    Note 2 - rather than using this, for most cases you would be better off switching to Apache Tika instead!
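Note 1 depends on what is on the runtime classpath, so it can help to check for the Scratchpad jar up front instead of discovering its absence from a failed extraction. A minimal sketch, assuming `org.apache.poi.hwpf.HWPFDocument` (the HWPF class for legacy Word `.doc` files, shipped in the Scratchpad jar) is a reasonable marker class; `ClasspathProbe` and its methods are hypothetical helpers, not part of POI.

```java
/**
 * Hypothetical helper (not part of Apache POI): checks whether a class can be
 * looked up without initializing it, e.g. to detect the POI Scratchpad jar.
 */
final class ClasspathProbe {
    // Assumed marker class: HWPFDocument ships in the poi-scratchpad artifact.
    static final String SCRATCHPAD_MARKER = "org.apache.poi.hwpf.HWPFDocument";

    private ClasspathProbe() {}

    /** Returns true if the named class is visible to this class's loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only resolve the class, do not run static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Missing class, or a class whose own dependencies are missing
            return false;
        }
    }

    /** True if the assumed Scratchpad marker class can be found. */
    static boolean scratchpadAvailable() {
        return isOnClasspath(SCRATCHPAD_MARKER);
    }
}
```

A caller could, for example, log a warning when `ClasspathProbe.scratchpadAvailable()` returns false before handing legacy binary files to the factory.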

    • Constructor Detail

      • POIXMLExtractorFactory

        public POIXMLExtractorFactory()
    • Method Detail

      • getThreadPrefersEventExtractors

        public static boolean getThreadPrefersEventExtractors()
        Should this thread prefer event-based over usermodel-based extractors? (Usermodel extractors tend to be more accurate but use more memory.) Default is false.
      • getAllThreadsPreferEventExtractors

        public static java.lang.Boolean getAllThreadsPreferEventExtractors()
        Should all threads prefer event-based over usermodel-based extractors? (Usermodel extractors tend to be more accurate but use more memory.) Default is to use the thread-level setting, which defaults to false.
      • setThreadPrefersEventExtractors

        public static void setThreadPrefersEventExtractors(boolean preferEventExtractors)
        Should this thread prefer event-based over usermodel-based extractors? This setting is only used if the all-threads setting is null.
      • setAllThreadsPreferEventExtractors

        public static void setAllThreadsPreferEventExtractors(java.lang.Boolean preferEventExtractors)
        Should all threads prefer event-based over usermodel-based extractors? If set (non-null), this takes precedence over the thread-level setting.
      • getPreferEventExtractor

        public static boolean getPreferEventExtractor()
        Should this thread use event-based extractors, if available? Checks the all-threads setting first, then the thread-specific one.
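The settings above combine in a fixed order: a non-null all-threads value wins, otherwise the calling thread's own flag (default false) applies. The sketch below is a hypothetical, self-contained model of that documented lookup using a `ThreadLocal` and a volatile `Boolean`; it is not POI's source, and `EventExtractorPreference` and its method names are invented for illustration.

```java
/**
 * Hypothetical model (not POI's source) of the documented lookup order:
 * a non-null all-threads Boolean wins; otherwise the per-thread flag is used.
 */
final class EventExtractorPreference {
    // Per-thread flag; every thread starts out with false.
    private static final ThreadLocal<Boolean> THREAD_PREFERS =
            ThreadLocal.withInitial(() -> Boolean.FALSE);

    // Global override; null means "defer to the thread-level setting".
    private static volatile Boolean allThreadsPrefer = null;

    private EventExtractorPreference() {}

    static void setThreadPrefers(boolean prefer) { THREAD_PREFERS.set(prefer); }

    static boolean getThreadPrefers() { return THREAD_PREFERS.get(); }

    static void setAllThreadsPrefer(Boolean prefer) { allThreadsPrefer = prefer; }

    static Boolean getAllThreadsPrefer() { return allThreadsPrefer; }

    /** Checks the all-threads setting first, then the thread-specific one. */
    static boolean preferEvent() {
        Boolean all = allThreadsPrefer; // read the volatile field once
        return (all != null) ? all : THREAD_PREFERS.get();
    }
}
```

With the real class, the corresponding calls are the static methods documented above; for example, `POIXMLExtractorFactory.setAllThreadsPreferEventExtractors(null)` hands control back to the per-thread setting.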
      • create

        public POITextExtractor create(java.io.File f,
                                       java.lang.String password)
                                throws java.io.IOException
        Description copied from interface: ExtractorProvider
        Creates an extractor for the given file.
        Specified by:
        create in interface ExtractorProvider
        Parameters:
        f - the file
        password - the password or null if not encrypted
        Returns:
        the extractor
        Throws:
        java.io.IOException - if file can't be read or parsed
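Choosing an extractor for a file starts with identifying its container format. As a simplified illustration (POI's own detection covers more cases), the leading bytes separate the two container families involved: OOXML packages are ZIP archives starting with `50 4B 03 04` ("PK"), while legacy binary Office files and password-protected OOXML files are OLE2 compound documents starting with `D0 CF 11 E0 A1 B1 1A E1`. `HeaderSniffer` is a hypothetical helper; note that a ZIP signature alone does not prove a file is OOXML, since ODF and JAR files are ZIP archives too.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.util.Arrays;

/** Hypothetical, simplified container sniffer; not POI's detection logic. */
final class HeaderSniffer {
    enum Container { ZIP, OLE2, UNKNOWN }

    // ZIP local file header; OOXML packages (.docx, .xlsx, .pptx) are ZIP archives.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // OLE2 / POIFS compound document header (.doc, .xls, .ppt, encrypted OOXML).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    private HeaderSniffer() {}

    /** Classifies a header buffer by its leading signature bytes. */
    static Container classify(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Container.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Container.ZIP;
        return Container.UNKNOWN;
    }

    /** Reads at most 8 bytes from the start of the file and classifies them. */
    static Container classify(File f) throws IOException {
        try (InputStream in = Files.newInputStream(f.toPath())) {
            return classify(in.readNBytes(OLE2_MAGIC.length));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(data, 0, prefix.length, prefix, 0, prefix.length);
    }
}
```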
      • create

        public POITextExtractor create(java.io.InputStream inp,
                                       java.lang.String password)
                                throws java.io.IOException
        Description copied from interface: ExtractorProvider
        Creates an extractor from the given InputStream.
        Specified by:
        create in interface ExtractorProvider
        Parameters:
        inp - the stream
        password - the password or null if not encrypted
        Returns:
        the extractor
        Throws:
        java.io.IOException - if stream can't be read or parsed
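An `InputStream` cannot be rewound in general, so format detection has to look at the first bytes without consuming them before the stream is handed to a parser. One common technique, sketched here under the assumption that buffering the header in memory is acceptable, is to wrap the stream in a `BufferedInputStream` and use `mark`/`reset`; `StreamPeek` is a hypothetical helper, not a POI API.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical helper: reads leading bytes for type detection without
 * consuming them, so the same stream can be passed to a parser afterwards.
 */
final class StreamPeek {
    private StreamPeek() {}

    /** Wraps the stream only if it does not already support mark/reset. */
    static InputStream markable(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    /** Returns up to n bytes, then resets the stream to where it was. */
    static byte[] peek(InputStream markable, int n) throws IOException {
        markable.mark(n); // the mark stays valid for up to n bytes read
        try {
            return markable.readNBytes(n);
        } finally {
            markable.reset();
        }
    }
}
```

The returned header can then be compared against known signatures, and the same wrapped stream passed on unchanged.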
      • create

        public POIXMLTextExtractor create(OPCPackage pkg)
                                   throws java.io.IOException
        Tries to determine the actual type of file and produces a matching text extractor for it.
        Parameters:
        pkg - An OPCPackage.
        Returns:
        A POIXMLTextExtractor for the given file.
        Throws:
        java.io.IOException - If an error occurs while reading the file
        java.lang.IllegalArgumentException - If no matching file type could be found.
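For an already-opened `OPCPackage`, the file type can be inferred from the content type of the package's main document part. The sketch below is a hypothetical, POI-independent illustration of that dispatch: the four strings are the standard main-part content types for Word, Excel, PowerPoint and Visio documents, while the `OoxmlDispatch` class and `Kind` enum are invented names. Like the documented method, it throws `IllegalArgumentException` when nothing matches.

```java
import java.util.Map;

/** Hypothetical sketch of choosing a document kind from the main-part content type. */
final class OoxmlDispatch {
    enum Kind { WORD, SPREADSHEET, PRESENTATION, VISIO }

    // Standard main-part content types for the plain (non-macro, non-template) formats.
    private static final Map<String, Kind> BY_CONTENT_TYPE = Map.of(
        "application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml", Kind.WORD,
        "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml", Kind.SPREADSHEET,
        "application/vnd.openxmlformats-officedocument.presentationml.presentation.main+xml", Kind.PRESENTATION,
        "application/vnd.ms-visio.drawing.main+xml", Kind.VISIO);

    private OoxmlDispatch() {}

    static Kind kindFor(String mainPartContentType) {
        // Map.of rejects null lookups, so treat null as "no match" explicitly.
        Kind kind = (mainPartContentType == null) ? null : BY_CONTENT_TYPE.get(mainPartContentType);
        if (kind == null) {
            // Mirrors the documented contract: no matching type -> IllegalArgumentException
            throw new IllegalArgumentException("No supported document type for: " + mainPartContentType);
        }
        return kind;
    }
}
```

Macro-enabled and template variants (for example `.docm` or `.dotx`) use different main-part content types and are deliberately left out here; a fuller dispatcher would need to list them too.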
      • create

        public POITextExtractor create(DirectoryNode poifsDir,
                                       java.lang.String password)
                                throws java.io.IOException
        Description copied from interface: ExtractorProvider
        Creates an extractor from a POIFS directory node.
        Specified by:
        create in interface ExtractorProvider
        Parameters:
        poifsDir - the node
        password - the password or null if not encrypted
        Returns:
        the extractor
        Throws:
        java.io.IOException - if node can't be parsed
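The `password` argument matters mainly when the POIFS directory holds an encrypted OOXML document: such files are stored in an OLE2 container with `EncryptionInfo` and `EncryptedPackage` streams, and the package must be decrypted before a text extractor can be built. The following hypothetical sketch guesses what a POIFS root contains from its top-level entry names alone; `PoifsRootGuess` is an invented name, and the legacy stream names (`WordDocument`, `Workbook`/`Book`, `PowerPoint Document`) are the conventional main streams of the binary Office formats.

```java
import java.util.Set;

/** Hypothetical sketch: guess the contents of an OLE2/POIFS root from entry names. */
final class PoifsRootGuess {
    enum Guess { ENCRYPTED_OOXML, WORD_97, EXCEL_97, POWERPOINT_97, UNKNOWN }

    private PoifsRootGuess() {}

    static Guess guess(Set<String> entryNames) {
        // Password-protected OOXML: the encrypted package plus its encryption header.
        if (entryNames.contains("EncryptionInfo") && entryNames.contains("EncryptedPackage")) {
            return Guess.ENCRYPTED_OOXML;
        }
        // Main streams of the legacy binary formats.
        if (entryNames.contains("WordDocument")) return Guess.WORD_97;
        if (entryNames.contains("Workbook") || entryNames.contains("Book")) return Guess.EXCEL_97;
        if (entryNames.contains("PowerPoint Document")) return Guess.POWERPOINT_97;
        return Guess.UNKNOWN;
    }
}
```

With a real `DirectoryNode`, the names could be collected by iterating over its entries and calling `getName()` on each.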