Package org.apache.tika.detect.microsoft
Class POIFSContainerDetector
java.lang.Object
org.apache.tika.detect.microsoft.POIFSContainerDetector
- All Implemented Interfaces:
Serializable, org.apache.tika.detect.Detector
A detector that works on a POIFS OLE2 document
to figure out exactly what the file is.
This should work for all OLE2 documents, whether
they are ones supported by POI or not.
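Every OLE2 (Compound File Binary) container starts with the same 8-byte signature, so the header alone cannot distinguish, say, a Word document from an Excel workbook, and a detector has to look inside the container. The following is a minimal JDK-only sketch of that signature check. It is not Tika's implementation; the class and method names (`Ole2HeaderSketch`, `startsWithOle2Signature`, `hasOle2Header`) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical helper, not part of Tika or POI.
class Ole2HeaderSketch {
    // Signature found at offset 0 of every OLE2 / Compound File Binary file.
    private static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** Returns true if the given bytes begin with the OLE2 signature. */
    static boolean startsWithOle2Signature(byte[] head) {
        return head.length >= OLE2_SIGNATURE.length
                && Arrays.equals(Arrays.copyOf(head, OLE2_SIGNATURE.length), OLE2_SIGNATURE);
    }

    /** Peeks at the first 8 bytes of a mark-supporting stream, then rewinds it. */
    static boolean hasOle2Header(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(OLE2_SIGNATURE.length);
        try {
            byte[] head = new byte[OLE2_SIGNATURE.length];
            int n = in.readNBytes(head, 0, head.length);
            return n == head.length && startsWithOle2Signature(head);
        } finally {
            in.reset(); // leave the stream positioned at the start for the next reader
        }
    }
}
```

A stream that passes this check is only known to be some OLE2 container; identifying the specific format still requires reading the directory entries.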
-
Field Summary
Fields
Modifier and Type | Field | Description
static final org.apache.tika.mime.MediaType | COMP_OBJ | Some other kind of embedded document, in a CompObj container within another OLE2 document
static final org.apache.tika.mime.MediaType | DGN_8 |
static final org.apache.tika.mime.MediaType | DOC | Microsoft Word
static final org.apache.tika.mime.MediaType | DRM_ENCRYPTED | TIKA-3666: MSOffice or other file encrypted with DRM in an OLE container
static final org.apache.tika.mime.MediaType | ESRI_LAYER |
static final org.apache.tika.mime.MediaType | GENERAL_EMBEDDED | General embedded document type within an OLE2 container
static final org.apache.tika.mime.MediaType | MPP | Microsoft Project
static final org.apache.tika.mime.MediaType | MS_EQUATION | Equation embedded in Office docs
static final org.apache.tika.mime.MediaType | MS_GRAPH_CHART | Graph/Charts embedded in PowerPoint and Excel
static final org.apache.tika.mime.MediaType | MSG | Microsoft Outlook
static final String | OCX_NAME |
static final org.apache.tika.mime.MediaType | OLE | The OLE base file format
static final org.apache.tika.mime.MediaType | OLE10_NATIVE | An OLE10 Native embedded document within another OLE2 document
static final org.apache.tika.mime.MediaType | OOXML_PROTECTED | The protected OOXML base file format
static final org.apache.tika.mime.MediaType | PPT | Microsoft PowerPoint
static final org.apache.tika.mime.MediaType | PUB | Microsoft Publisher
static final org.apache.tika.mime.MediaType | SDA | StarOffice Draw
static final org.apache.tika.mime.MediaType | SDC | StarOffice Calc
static final org.apache.tika.mime.MediaType | SDD | StarOffice Impress
static final org.apache.tika.mime.MediaType | SDW | StarOffice Writer
static final org.apache.tika.mime.MediaType | SLDWORKS | SolidWorks CAD file
static final org.apache.tika.mime.MediaType | VSD | Microsoft Visio
static final org.apache.tika.mime.MediaType | WPS | Microsoft Works
static final org.apache.tika.mime.MediaType | XLR | Microsoft Works Spreadsheet 7.0
static final org.apache.tika.mime.MediaType | XLS | Microsoft Excel
-
Constructor Summary
Constructors
POIFSContainerDetector()
-
Method Summary
Modifier and Type | Method | Description
org.apache.tika.mime.MediaType | detect(InputStream input, org.apache.tika.metadata.Metadata metadata) |
static org.apache.tika.mime.MediaType | detect(Set<String> anyCaseNames, org.apache.poi.poifs.filesystem.DirectoryEntry root) | Internal detection of the specific kind of OLE2 document, based on the names of the top-level streams within the file.
void | setMarkLimit(int markLimit) | If a TikaInputStream is passed in to detect(InputStream, Metadata), and there is not an underlying file, this detector will spool up to markLimit to disk.
-
Field Details
-
OLE
public static final org.apache.tika.mime.MediaType OLE
The OLE base file format
-
OOXML_PROTECTED
public static final org.apache.tika.mime.MediaType OOXML_PROTECTED
The protected OOXML base file format
-
DRM_ENCRYPTED
public static final org.apache.tika.mime.MediaType DRM_ENCRYPTED
TIKA-3666: MSOffice or other file encrypted with DRM in an OLE container
-
GENERAL_EMBEDDED
public static final org.apache.tika.mime.MediaType GENERAL_EMBEDDED
General embedded document type within an OLE2 container
-
OLE10_NATIVE
public static final org.apache.tika.mime.MediaType OLE10_NATIVE
An OLE10 Native embedded document within another OLE2 document
-
COMP_OBJ
public static final org.apache.tika.mime.MediaType COMP_OBJ
Some other kind of embedded document, in a CompObj container within another OLE2 document
-
MS_GRAPH_CHART
public static final org.apache.tika.mime.MediaType MS_GRAPH_CHART
Graph/Charts embedded in PowerPoint and Excel
-
MS_EQUATION
public static final org.apache.tika.mime.MediaType MS_EQUATION
Equation embedded in Office docs
-
OCX_NAME
public static final String OCX_NAME
-
XLS
public static final org.apache.tika.mime.MediaType XLS
Microsoft Excel
-
DOC
public static final org.apache.tika.mime.MediaType DOC
Microsoft Word
-
PPT
public static final org.apache.tika.mime.MediaType PPT
Microsoft PowerPoint
-
PUB
public static final org.apache.tika.mime.MediaType PUB
Microsoft Publisher
-
VSD
public static final org.apache.tika.mime.MediaType VSD
Microsoft Visio
-
WPS
public static final org.apache.tika.mime.MediaType WPS
Microsoft Works
-
XLR
public static final org.apache.tika.mime.MediaType XLR
Microsoft Works Spreadsheet 7.0
-
MSG
public static final org.apache.tika.mime.MediaType MSG
Microsoft Outlook
-
MPP
public static final org.apache.tika.mime.MediaType MPP
Microsoft Project
-
SDC
public static final org.apache.tika.mime.MediaType SDC
StarOffice Calc
-
SDA
public static final org.apache.tika.mime.MediaType SDA
StarOffice Draw
-
SDD
public static final org.apache.tika.mime.MediaType SDD
StarOffice Impress
-
SDW
public static final org.apache.tika.mime.MediaType SDW
StarOffice Writer
-
SLDWORKS
public static final org.apache.tika.mime.MediaType SLDWORKS
SolidWorks CAD file
-
ESRI_LAYER
public static final org.apache.tika.mime.MediaType ESRI_LAYER
-
DGN_8
public static final org.apache.tika.mime.MediaType DGN_8
-
-
Constructor Details
-
POIFSContainerDetector
public POIFSContainerDetector()
-
-
Method Details
-
detect
public static org.apache.tika.mime.MediaType detect(Set<String> anyCaseNames, org.apache.poi.poifs.filesystem.DirectoryEntry root)
Internal detection of the specific kind of OLE2 document, based on the names of the top-level streams within the file. In some cases the detection may need access to the root DirectoryEntry of that file for best results. The entry can be given as a second, optional argument. Following section 2.6.1 of MS-CFB, the detection is performed on case-insensitive entry names.
- Parameters:
anyCaseNames - the names of the top-level streams within the file, in any letter case
root - the root DirectoryEntry of the file (optional)
- Returns:
the detected media type
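As an illustration of name-based detection on case-insensitive entry names, the sketch below upper-cases the supplied names and looks for a few well-known top-level streams. It is a simplified stand-in, not Tika's rule set: the class `StreamNameSketch`, the method `guessType`, the small lookup table, and the choice of upper-casing as the normalization are all assumptions made for the example, and media types are plain strings rather than `MediaType` objects.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Tika or POI.
class StreamNameSketch {
    // Illustrative subset of well-known top-level stream names, stored upper-cased.
    private static final Map<String, String> KNOWN = new LinkedHashMap<>();
    static {
        KNOWN.put("WORDDOCUMENT", "application/msword");
        KNOWN.put("WORKBOOK", "application/vnd.ms-excel");
        KNOWN.put("BOOK", "application/vnd.ms-excel");
        KNOWN.put("POWERPOINT DOCUMENT", "application/vnd.ms-powerpoint");
    }

    // Assumed fallback for "some OLE2 container" when no known stream is found.
    static final String FALLBACK = "application/x-tika-msoffice";

    /** Matches entry names regardless of letter case; returns FALLBACK if nothing is recognized. */
    static String guessType(Set<String> anyCaseNames) {
        Set<String> upper = new HashSet<>();
        for (String name : anyCaseNames) {
            upper.add(name.toUpperCase(Locale.ROOT)); // normalize once, compare many times
        }
        for (Map.Entry<String, String> entry : KNOWN.entrySet()) {
            if (upper.contains(entry.getKey())) {
                return entry.getValue();
            }
        }
        return FALLBACK;
    }
}
```

Normalizing the input set once keeps each lookup a constant-time set membership test. The real method can also consult the root `DirectoryEntry`, which this sketch ignores.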
-
setMarkLimit
public void setMarkLimit(int markLimit)
If a TikaInputStream is passed in to detect(InputStream, Metadata), and there is not an underlying file, this detector will spool up to markLimit to disk. If the stream was read in its entirety (i.e. the spooled file is not truncated), this detector will open the file with POI and perform detection. If the spooled file is truncated, the detector will return OLE (or MediaType.OCTET_STREAM if there's no OLE header).
As of Tika 1.21, this detector respects the legacy behavior of not performing detection on a non-TikaInputStream.
- Parameters:
markLimit - the limit on how much of the stream is spooled to disk
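The spool-then-decide behavior described for `setMarkLimit` can be sketched with the JDK alone: copy at most a fixed amount of the stream to a temporary file and record whether anything was left over. This is not Tika's code. The names `MarkLimitSpooler`, `Spooled`, and `spool` are hypothetical, and treating `markLimit` as a byte count is an assumption of the sketch.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Tika or POI.
class MarkLimitSpooler {
    /** Where the bytes went, how many were written, and whether the stream had more. */
    static final class Spooled {
        final Path file;
        final long bytesWritten;
        final boolean truncated;

        Spooled(Path file, long bytesWritten, boolean truncated) {
            this.file = file;
            this.bytesWritten = bytesWritten;
            this.truncated = truncated;
        }
    }

    /** Copies at most markLimit bytes (assumed unit) of the stream to a temp file. */
    static Spooled spool(InputStream in, int markLimit) throws IOException {
        Path tmp = Files.createTempFile("spool-", ".bin");
        long written = 0;
        byte[] buf = new byte[8192];
        try (OutputStream out = Files.newOutputStream(tmp)) {
            while (written < markLimit) {
                int want = (int) Math.min(buf.length, markLimit - written);
                int n = in.read(buf, 0, want);
                if (n < 0) {
                    // Stream ended within the limit: the spooled file is complete.
                    return new Spooled(tmp, written, false);
                }
                out.write(buf, 0, n);
                written += n;
            }
        }
        // Limit reached: one extra read tells us whether data remains unspooled.
        boolean truncated = in.read() >= 0;
        return new Spooled(tmp, written, truncated);
    }
}
```

In terms of the description above, a result with `truncated == false` corresponds to the case where the spooled file can be opened and inspected in full, while `truncated == true` corresponds to falling back to the generic OLE type (or octet-stream when no OLE header is present). The caller is responsible for deleting the temporary file.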
-
detect
public org.apache.tika.mime.MediaType detect(InputStream input, org.apache.tika.metadata.Metadata metadata) throws IOException
- Specified by:
detect in interface org.apache.tika.detect.Detector
- Throws:
IOException
-