All Classes (Apache Tika Microsoft parser module 3.2.3 API)

Class	Description
AbstractChunking	This class specifies the base class for file chunking
AbstractListManager
AbstractListManager.LevelTuple
AbstractListManager.ParagraphLevelCounter
AbstractOfficeParser	Intermediate layer to set `OfficeParserConfig` uniformly.
AbstractOOXMLExtractor	Base class for all Tika OOXML extractors.
AbstractXML2003Parser
ActiveMimeParser	ActiveMime is a macro container format used in some mso files.
AdapterHelper
AlternativePackaging
ArrayNumber	The class is used to represent the number of the array.
BasicObject	Base object for FSSHTTPB.
BinaryItem
Bit	The class is used to read and set bit values in a byte array.
BitConverter
BitReader	This class is used to extract values that span byte boundaries at arbitrary bit positions.
BitWriter
ByteUtil
Cell	Cell of content.
CellDecorator	Cell decorator.
CellID
CellIDArray
CellManifestCurrentRevision
CellManifestDataElementData	Cell manifest data element
ChmAccessor<T>	Defines an accessor interface
ChmAssert	Contains chm extractor assertions
ChmBlockInfo	A container for CHM block information: (i) the initial block, used to reset the main tree; (ii) the start block, which marks where to start; (iii) the end block, which marks where to stop; (iv) the start offset, which marks where to start reading; (v) the end offset, which marks where to stop reading.
ChmCommons
ChmCommons.EntryType	Represents entry types: uncompressed, compressed
ChmCommons.IntelState	Represents Intel file states during decompression
ChmCommons.LzxState	Represents LZX states: started decoding, not started decoding
ChmConstants
ChmDirectoryListingSet	Holds CHM directory listing entries.
ChmExtractor	Extracts text from a CHM file.
ChmItsfHeader	The ITSF file header. Layout: 0000: char[4] 'ITSF'; 0004: DWORD 3 (version number); 0008: DWORD total header length, including the header section table and following data; 000C: DWORD 1 (unknown); 0010: DWORD timestamp; 0014: DWORD Windows language ID; 0018: GUID {7C01FD10-7BAA-11D0-9E0C-00A0C922E6EC}; 0028: GUID {7C01FD11-7BAA-11D0-9E0C-00A0C922E6EC}. Note: a GUID is $10 bytes, arranged as 1 DWORD, 2 WORDs, and 8 BYTEs. Each entry of the header section table has the layout 0000: QWORD offset of the section from the beginning of the file; 0008: QWORD length of the section. The header section table is followed by 8 bytes of additional header data.
ChmItspHeader	Directory header. The directory starts with a header with the following layout: 0000: char[4] 'ITSP'; 0004: DWORD version number (1); 0008: DWORD length of the directory header; 000C: DWORD $0a (unknown); 0010: DWORD $1000, the directory chunk size; 0014: DWORD "density" of the quickref section, usually 2; 0018: DWORD depth of the index tree: 1 if there is no index, 2 if there is one level of PMGI chunks; 001C: DWORD chunk number of the root index chunk, -1 if there is none (though at least one file has 0 despite having no index chunk, probably a bug); 0020: DWORD chunk number of the first PMGL (listing) chunk; 0024: DWORD chunk number of the last PMGL (listing) chunk; 0028: DWORD -1 (unknown); 002C: DWORD total number of directory chunks; 0030: DWORD Windows language ID; 0034: GUID {5D02926A-212E-11D0-9DF9-00A0C922E6EC}; 0044: DWORD $54 (the header length again); 0048: DWORD -1 (unknown); 004C: DWORD -1 (unknown); 0050: DWORD -1 (unknown).
ChmLzxBlock	Decompresses a chm block.
ChmLzxcControlData	::DataSpace/Storage//ControlData: this file contains $20 bytes of information on the compression.
ChmLzxcResetTable	LZXC reset table, used during decompression.
ChmLzxState
ChmParser
ChmParsingException
ChmPmgiHeader	Note: this chunk does not always exist. An index chunk has the following format: 0000: char[4] 'PMGI'; 0004: DWORD length of the quickref/free area at the end of the directory chunk; 0008: directory index entries (up to the quickref/free area). The quickref area in a PMGI chunk is the same as in a PMGL chunk. A directory index entry has the following format: BYTE: length of name; BYTEs: name (UTF-8 encoded); ENCINT: directory listing chunk which starts with that name. Encoded integers (ENCINT): an ENCINT is a variable-length integer.
ChmPmglHeader	There are two types of directory chunks: index chunks and listing chunks.
ChmSection
ChmWrapper
ChunkingFactory	This class is used to create instances of AbstractChunking.
ChunkingMethod
CommentPersonHandler
Compact64bitInt	A 9-byte encoding of values in the range 0x0002000000000000 through 0xFFFFFFFFFFFFFFFF
CompactID	This class is used to represent the CompactID structure.
DataElement
DataElementData	Base class of data element
DataElementHash	Specifies a data element hash stream object
DataElementPackage
DataElementParseErrorException
DataElementType	The enumeration of the data element type
DataElementUtils
DataHashObject
DataNodeObjectData	Data Node Object data
DataSizeObject	Data Size Object
DirectoryListingEntry	The format of a directory listing entry is as follows: BYTE: length of name; BYTEs: name (UTF-8 encoded); ENCINT: content section; ENCINT: offset; ENCINT: length. The offset is from the beginning of the content section the file is in, after the section has been decompressed (if appropriate).
EightBytesOfData	This class is used to represent a property that contains 8 bytes of data in the PropertySet.rgData stream field.
EmailVisitor
EmbeddedPartMetadata	This class records metadata about embedded parts that exist in the XML of the main document.
EMFParser	Extracts files embedded in EMF and offers a very rough capability to extract text if there is text stored in the EMF.
Error
ExcelExtractor	Excel parser implementation which uses POI's Event API to handle the contents of a Workbook.
ExGuid
ExGUIDArray
ExtendedGUID
ExtendedMetadataExtractor	This class extracts MAPI properties as defined in props_table.txt, which was generated from MS-OXPROPS.
FormattingUtils
FormattingUtils.Tag
FourBytesOfData	This class is used to represent a property that contains 4 bytes of data in the PropertySet.rgData stream field.
GlobalIdTableEntry3FNDX
GlobalIdTableEntryFNDX
GUID
GuidUtil
HeaderCell
HSLFExtractor
IFSSHTTPBSerializable	FSSHTTPB Serialize interface.
IntermediateNodeObject
IntermediateNodeObject.RootNodeObjectBuilder	The class is used to build a root node object.
IProperty	The interface for a property in a OneNote file.
JackcessParser	Parser that handles Microsoft Access files via Jackcess
JCID	This class is used to represent a JCID
JCIDObject	This class is used to represent the JCID object.
LeafNodeObject
LeafNodeObject.IntermediateNodeObjectBuilder	The class is used to build an intermediate node object.
LibPstParser	This is an optional PST parser that relies on the user installing the GPL-3 libpst/readpst command-line tool and configuring Tika to call this tool via tika-config.xml.
LibPstParserConfig
LinkedCell	Linked cell.
ListDescriptor	Contains the information for a single list in the list or list override tables.
ListManager	Computes the number text which goes at the beginning of each list paragraph
LittleEndianBitConverter	Implements a converter which converts to/from little-endian byte arrays
MAPITag
MetadataExtractor	OOXML metadata extractor.
MSEmbeddedStreamTranslator
MSOneStorePackage
MSOneStoreParser
MSOwnerFileParser	Parser for temporary MS Office files.
NoData	This class is used to represent a property that contains no data.
NodeObject
NumberCell	Number cell.
ObjectGroupData	The ObjectGroupData class.
ObjectGroupDataElementData
ObjectGroupDataElementData.Builder	The internal class for building a list of DataElement objects from a node object.
ObjectGroupDeclarations	Object Group Declarations
ObjectGroupMetadata	Specifies an object group metadata
ObjectGroupMetadataDeclarations	Object Metadata Declaration
ObjectGroupObjectBLOBDataDeclaration	object data BLOB declaration
ObjectGroupObjectData
ObjectGroupObjectDataBLOBReference	object data BLOB reference
ObjectGroupObjectDeclare
ObjectSpaceObjectPropSet	This class is used to represent an ObjectSpaceObjectPropSet.
ObjectSpaceObjectPropSet
ObjectSpaceObjectStreamHeader
ObjectSpaceObjectStreamOfContextIDs	This class is used to represent an ObjectSpaceObjectStreamOfContextIDs.
ObjectSpaceObjectStreamOfOIDs	This class is used to represent an ObjectSpaceObjectStreamOfOIDs.
ObjectSpaceObjectStreamOfOSIDs	This class is used to represent an ObjectSpaceObjectStreamOfOSIDs.
OfficeParser	Defines a Microsoft document content extractor.
OfficeParser.POIFSDocumentType
OfficeParserConfig
OldExcelParser	A POI-powered Tika Parser for very old versions of Excel, from pre-OLE2 days, such as Excel 4.
OneByteOfData	This class is used to represent a property that contains 1 byte of data in the PropertySet.rgData stream field.
OneNoteParser	OneNote Tika parser capable of parsing Microsoft OneNote files.
OneNotePropertyEnum
OneNoteTreeWalkerOptions	Options for walking the OneNote tree.
OOXMLExtractor	Interface implemented by all Tika OOXML extractors.
OOXMLExtractorFactory	Figures out the correct `OOXMLExtractor` for the supplied document and returns it.
OOXMLParser	Office Open XML (OOXML) parser.
OOXMLTikaBodyPartHandler
OOXMLWordAndPowerPointTextHandler	This class is intended to handle anything that might contain IBodyElements: main document, headers, footers, notes, slides, etc.
OOXMLWordAndPowerPointTextHandler.EditType
OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler
OPCPackageDetector
OPCPackageWrapper	This is a wrapper around OPCPackage that calls revert() instead of close().
OutlookExtractor	Outlook Message Parser.
OutlookExtractor.BODY_TYPES_PROCESSED
OutlookExtractor.RECIPIENT_TYPE
OutlookPSTParser	Parser for MS Outlook PST email storage files
ParagraphProperties
POIFSContainerDetector	A detector that works on a POIFS OLE2 document to figure out exactly what the file is.
POIXMLTextExtractorDecorator
PropertyID	This class is used to represent a PropertyID.
PropertySet	This class is used to represent a PropertySet.
PropertySetObject	This class is used to represent the property set.
PropertyType
PrtArrayOfPropertyValues	The class is used to represent the prtArrayOfPropertyValues.
PrtFourBytesOfLengthFollowedByData	This class is used to represent the prtFourBytesOfLengthFollowedByData.
PSTMailItemParser
RDCAnalysisChunking	This class is used to process RDC analysis chunking
RequestTypes	The enumeration of request type.
RevisionManifest
RevisionManifestDataElementData
RevisionManifestObjectGroupReferences	Specifies a revision manifest object group references, each followed by object group extended GUIDs
RevisionManifestRootDeclare	Specifies a revision manifest root declare, each followed by root and object extended GUIDs
RevisionStoreObject	The class is used to represent the revision store object.
RevisionStoreObjectGroup
RTFParser	RTF parser
RunProperties	WARNING: This class is mutable.
SequenceNumberGenerator
SerialNumber
SignatureObject	Signature Object
SimpleChunking
SpreadsheetMLParser	Parses SpreadsheetML 2003 format Excel files.
StorageIndexCellMapping	Specifies the storage index cell mappings (with cell identifier, cell mapping extended GUID, and cell mapping serial number)
StorageIndexDataElementData
StorageIndexManifestMapping
StorageIndexRevisionMapping	Specifies the storage index revision mappings (with revision and revision mapping extended GUIDs, and revision mapping serial number)
StorageManifestDataElementData
StorageManifestRootDeclare	Specifies one or more storage manifest root declare.
StorageManifestSchemaGUID	Specifies a storage manifest schema GUID
StreamObject
StreamObjectHeaderEnd
StreamObjectHeaderEnd16bit	A 16-bit header for a compound object that indicates the end of a stream object
StreamObjectHeaderEnd8bit	An 8-bit header for a compound object that indicates the end of a stream object
StreamObjectHeaderStart	This class specifies the base class for 16-bit or 32-bit stream object header start
StreamObjectHeaderStart16bit	A 16-bit header for a compound object that indicates the start of a stream object
StreamObjectHeaderStart32bit	A 32-bit header for a compound object that indicates the start of a stream object
StreamObjectParseErrorException
StreamObjectTypeHeaderEnd
StreamObjectTypeHeaderStart	The enumeration of the stream object type header start
SummaryExtractor	Extractor for Common OLE2 (HPSF) metadata
SXSLFPowerPointExtractorDecorator	SAX/streaming PPTX extractor
SXWPFWordExtractorDecorator	This is an experimental, alternative extractor for docx files.
TextCell	Text cell.
TikaExcelDataFormatter	Overrides Excel's General format to include more significant digits than the MS Spec allows.
TikaExcelGeneralFormat	A Format that allows up to 15 significant digits for integers.
TikaNameIdChunks	Collection of convenience chunks for the NameID part of an outlook file
TikaNameIdChunks.PredefinedPropertySet
TikaNameIdChunks.PropertySetType
TNEFParser	A POI-powered Tika Parser for TNEF (Transport Neutral Encoding Format) messages, aka winmail.dat
TwoBytesOfData	This class is used to represent a property that contains 2 bytes of data in the PropertySet.rgData stream field.
UByte	The `unsigned byte` type
UInteger	The `unsigned int` type
ULong	The `unsigned long` type
UMath
Unsigned	A utility class for static access to unsigned number functionality.
UNumber	A base type for unsigned numbers.
UShort	The `unsigned short` type
UuidUtils
WMFParser	This parser offers a very rough capability to extract text if there is text stored in the WMF files.
Word2006MLParser
WordExtractor
WordExtractor.TagAndStyle
WordMLParser	Parses WordML 2003 format Word files.
XPSExtractorDecorator
XPSTextExtractor	Currently, mostly a pass-through class to hold pkg and properties and keep the general framework similar to our other POI-integrated extractors.
XSLFEventBasedPowerPointExtractor
XSLFPowerPointExtractorDecorator
XSSFBExcelExtractorDecorator
XSSFExcelExtractorDecorator
XSSFExcelExtractorDecorator.HeaderFooterFromString
XSSFExcelExtractorDecorator.SheetTextAsHTML	Turns formatted sheet events into HTML
XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer	Captures information on interesting tags, whilst delegating the main work to the formatting handler
XWPFEventBasedWordExtractor	Experimental class that is based on POI's XSSFEventBasedExcelExtractor
XWPFListManager
XWPFNumberingShim	Stub class of POI's XWPFNumbering because onDocumentRead() is protected
XWPFStylesShim	For Tika, all we need (so far) is a mapping between styleId and a style's name.
XWPFWordExtractorDecorator
ZipFilesChunking	This class is used to process zip file chunking
ZipHeader
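The ENCINT type mentioned under ChmPmgiHeader and DirectoryListingEntry is only described as "a variable-length integer". The following is a minimal sketch, assuming the usual CHM convention: each byte carries 7 data bits, the most significant group comes first, and the high bit (0x80) is set on every byte except the last. The class and method names (EncintDecoder, decode, describeListingEntry) are hypothetical illustrations, not part of Tika's API; Tika's own CHM classes may implement this differently.

```java
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical helper (not part of Tika's API) that decodes CHM ENCINTs.
 * Assumed encoding: 7 data bits per byte, most significant group first;
 * the high bit (0x80) is set on every byte except the last one.
 */
public final class EncintDecoder {

    /** Decoded value plus the number of bytes consumed. */
    public static final class Result {
        public final long value;
        public final int length;

        Result(long value, int length) {
            this.value = value;
            this.length = length;
        }
    }

    private EncintDecoder() {
    }

    public static Result decode(byte[] data, int offset) {
        long value = 0;
        int pos = offset;
        while (true) {
            if (pos >= data.length) {
                throw new IllegalArgumentException("Truncated ENCINT at offset " + offset);
            }
            if (pos - offset >= 9) {
                // 9 bytes already hold 63 data bits; a 10th would overflow a signed long.
                throw new IllegalArgumentException("ENCINT too long at offset " + offset);
            }
            int b = data[pos++] & 0xFF;
            value = (value << 7) | (b & 0x7F); // append the low 7 bits
            if ((b & 0x80) == 0) {             // high bit clear: this was the last byte
                return new Result(value, pos - offset);
            }
        }
    }

    /**
     * Parses one directory listing entry using the layout given for
     * DirectoryListingEntry: BYTE name length, UTF-8 name, then three ENCINTs
     * (content section, offset, length).
     */
    public static String describeListingEntry(byte[] data, int offset) {
        int nameLength = data[offset] & 0xFF;
        String name = new String(data, offset + 1, nameLength, StandardCharsets.UTF_8);
        int pos = offset + 1 + nameLength;
        Result section = decode(data, pos);
        pos += section.length;
        Result entryOffset = decode(data, pos);
        pos += entryOffset.length;
        Result entryLength = decode(data, pos);
        return name + " section=" + section.value
                + " offset=" + entryOffset.value
                + " length=" + entryLength.value;
    }

    public static void main(String[] args) {
        // 0x81 0x00 -> (1 << 7) | 0 = 128, two bytes consumed
        Result r = decode(new byte[] {(byte) 0x81, 0x00}, 0);
        System.out.println(r.value + " (" + r.length + " bytes)"); // prints: 128 (2 bytes)

        byte[] entry = {3, 'a', '/', 'b', 0x00, (byte) 0x81, 0x00, 0x7F};
        System.out.println(describeListingEntry(entry, 0)); // prints: a/b section=0 offset=128 length=127
    }
}
```

Because the loop shifts the accumulator left by 7 before adding each new byte, the first byte contributes the most significant bits. A decoder that assumed little-endian group order (as protobuf varints use) would return a different value for the same bytes.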