Package org.apache.poi.hssf.extractor
Class OldExcelExtractor
java.lang.Object
    org.apache.poi.hssf.extractor.OldExcelExtractor
All Implemented Interfaces: java.io.Closeable, java.lang.AutoCloseable, POITextExtractor
public class OldExcelExtractor extends java.lang.Object implements POITextExtractor
A text extractor for old Excel files, which are too old for HSSFWorkbook to handle. This includes Excel 95 and very old (pre-OLE2) Excel files, such as Excel 4 files.
Returns much (but not all) of the textual content of the file. The output is suitable for indexing by something like Apache Lucene, or for use by Apache Tika, but is not really intended for display to the user.
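As a rough illustration of the intended use, the sketch below opens a file with the `OldExcelExtractor(java.io.File)` constructor and returns the result of `getText()`. The class name `OldExcelTextDump` and the helper `extractText` are hypothetical names chosen for this example, and it assumes Apache POI is on the classpath.

```java
import java.io.File;
import java.io.IOException;

import org.apache.poi.hssf.extractor.OldExcelExtractor;

// Hypothetical example class; not part of Apache POI.
public class OldExcelTextDump {

    /** Returns whatever text the extractor can recover from an old Excel file. */
    static String extractText(File file) throws IOException {
        // OldExcelExtractor implements Closeable, so try-with-resources
        // calls close() on it once the text has been read.
        try (OldExcelExtractor extractor = new OldExcelExtractor(file)) {
            return extractor.getText();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            // No path supplied: print a usage hint instead of guessing a file.
            System.out.println("usage: OldExcelTextDump <path-to-old-xls>");
            return;
        }
        System.out.println(extractText(new File(args[0])));
    }
}
```

Because the extracted text is incomplete by design, a caller would typically pass it to an indexer rather than show it to a user.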
-
Constructor Summary
OldExcelExtractor(java.io.File f)
OldExcelExtractor(java.io.InputStream input)
OldExcelExtractor(DirectoryNode directory)
OldExcelExtractor(POIFSFileSystem fs)
-
Method Summary
int getBiffVersion(): The BIFF version, largely corresponding to the Excel version.
java.lang.Object getDocument()
java.io.Closeable getFilesystem()
int getFileType(): The kind of the file, one of BOFRecord.TYPE_WORKSHEET, BOFRecord.TYPE_CHART, BOFRecord.TYPE_EXCEL_4_MACRO or BOFRecord.TYPE_WORKSPACE_FILE.
POITextExtractor getMetadataTextExtractor(): Returns another text extractor, which is able to output the textual content of the document metadata / properties, such as author and title.
java.lang.String getText(): Retrieves the text contents of the file, as best we can for these old file formats.
boolean isCloseFilesystem()
static void main(java.lang.String[] args)
void setCloseFilesystem(boolean doCloseFilesystem)
-
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface org.apache.poi.extractor.POITextExtractor
close
-
Constructor Detail
-
OldExcelExtractor
public OldExcelExtractor(java.io.InputStream input) throws java.io.IOException
- Throws:
java.io.IOException
-
OldExcelExtractor
public OldExcelExtractor(java.io.File f) throws java.io.IOException
- Throws:
java.io.IOException
-
OldExcelExtractor
public OldExcelExtractor(POIFSFileSystem fs) throws java.io.IOException
- Throws:
java.io.IOException
-
OldExcelExtractor
public OldExcelExtractor(DirectoryNode directory) throws java.io.IOException
- Throws:
java.io.IOException
-
-
Method Detail
-
main
public static void main(java.lang.String[] args) throws java.io.IOException
- Throws:
java.io.IOException
-
getBiffVersion
public int getBiffVersion()
The BIFF version, largely corresponding to the Excel version.
- Returns:
- the BIFF version
-
getFileType
public int getFileType()
The kind of the file, one of BOFRecord.TYPE_WORKSHEET, BOFRecord.TYPE_CHART, BOFRecord.TYPE_EXCEL_4_MACRO or BOFRecord.TYPE_WORKSPACE_FILE.
- Returns:
- the file type
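The four constants above can be turned into readable labels with a small helper. The following is only a sketch: the class `OldExcelFileTypes`, the method `describe`, and the label strings are hypothetical, and it assumes the `BOFRecord` class from `org.apache.poi.hssf.record` is on the classpath.

```java
import org.apache.poi.hssf.record.BOFRecord;

// Hypothetical example class; not part of Apache POI.
public class OldExcelFileTypes {

    /** Maps a getFileType() result to a short label; the labels are illustrative. */
    static String describe(int fileType) {
        if (fileType == BOFRecord.TYPE_WORKSHEET) {
            return "worksheet";
        }
        if (fileType == BOFRecord.TYPE_CHART) {
            return "chart";
        }
        if (fileType == BOFRecord.TYPE_EXCEL_4_MACRO) {
            return "Excel 4 macro sheet";
        }
        if (fileType == BOFRecord.TYPE_WORKSPACE_FILE) {
            return "workspace file";
        }
        // Any other value is reported as-is rather than guessed at.
        return "unknown (" + fileType + ")";
    }

    public static void main(String[] args) {
        System.out.println(describe(BOFRecord.TYPE_CHART));
    }
}
```

With an open extractor, the call would look like `OldExcelFileTypes.describe(extractor.getFileType())`.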
-
getText
public java.lang.String getText()
Retrieves the text contents of the file, as best we can for these old file formats.
- Specified by: getText in interface POITextExtractor
- Returns:
- the text contents of the file
-
getMetadataTextExtractor
public POITextExtractor getMetadataTextExtractor()
Description copied from interface: POITextExtractor
Returns another text extractor, which is able to output the textual content of the document metadata / properties, such as author and title.
- Specified by: getMetadataTextExtractor in interface POITextExtractor
- Returns:
- the metadata and text extractor
-
setCloseFilesystem
public void setCloseFilesystem(boolean doCloseFilesystem)
- Specified by: setCloseFilesystem in interface POITextExtractor
- Parameters:
- doCloseFilesystem - true (default), if underlying resources/filesystem should be closed on POITextExtractor.close()
-
isCloseFilesystem
public boolean isCloseFilesystem()
- Specified by: isCloseFilesystem in interface POITextExtractor
- Returns:
- true, if resources/filesystem should be closed on POITextExtractor.close()
-
getFilesystem
public java.io.Closeable getFilesystem()
- Specified by: getFilesystem in interface POITextExtractor
- Returns:
- The underlying resources/filesystem
-
getDocument
public java.lang.Object getDocument()
- Specified by: getDocument in interface POITextExtractor
- Returns:
- the processed document
-
-