All Classes
| Class |
Description |
| AbstractChunking |
This class specifies the base class for file chunking
|
| AbstractListManager |
|
| AbstractListManager.LevelTuple |
|
| AbstractListManager.ParagraphLevelCounter |
|
| AbstractOfficeParser |
|
| AbstractOOXMLExtractor |
Base class for all Tika OOXML extractors.
|
| AbstractXML2003Parser |
|
| ActiveMimeParser |
ActiveMime is a macro container format used in some mso files.
|
| AdapterHelper |
|
| AlternativePackaging |
|
| ArrayNumber |
The class is used to represent the number of the array.
|
| BasicObject |
Base object for FSSHTTPB.
|
| BinaryItem |
|
| Bit |
This class is used to read or set bit values in a byte array.
|
| BitConverter |
|
| BitReader |
A class used to extract values across byte boundaries at arbitrary bit positions.
|
| BitWriter |
|
| ByteUtil |
|
| Cell |
Cell of content.
|
| CellDecorator |
Cell decorator.
|
| CellID |
|
| CellIDArray |
|
| CellManifestCurrentRevision |
|
| CellManifestDataElementData |
Cell manifest data element
|
| ChmAccessor<T> |
Defines an accessor interface
|
| ChmAssert |
Contains chm extractor assertions
|
| ChmBlockInfo |
A container for chm block information: i. the initial block is used to reset
the main tree; ii. the start block indicates where to start; iii. the end
block indicates where to stop; iv. the start offset indicates where to start
reading; v. the end offset indicates where to stop reading.
|
| ChmCommons |
|
| ChmCommons.EntryType |
Represents entry types: uncompressed, compressed
|
| ChmCommons.IntelState |
Represents intel file states during decompression
|
| ChmCommons.LzxState |
Represents lzx states: started decoding, not started decoding
|
| ChmConstants |
|
| ChmDirectoryListingSet |
Holds chm listing entries
|
| ChmExtractor |
Extracts text from chm file.
|
| ChmItsfHeader |
The header:
0000: char[4] 'ITSF'
0004: DWORD 3 (Version number)
0008: DWORD Total header length, including header section table and following data.
000C: DWORD 1 (unknown)
0010: DWORD a timestamp
0014: DWORD Windows Language ID
0018: GUID {7C01FD10-7BAA-11D0-9E0C-00A0-C922-E6EC}
0028: GUID {7C01FD11-7BAA-11D0-9E0C-00A0-C922-E6EC}
Note: a GUID is $10 bytes, arranged as 1 DWORD, 2 WORDs, and 8 BYTEs.
0000: QWORD Offset of section from beginning of file
0008: QWORD Length of section
Following the header section table is 8 bytes of additional header data.
|
| ChmItspHeader |
Directory header. The directory starts with a header; its format is as follows:
0000: char[4] 'ITSP'
0004: DWORD Version number 1
0008: DWORD Length of the directory header
000C: DWORD $0a (unknown)
0010: DWORD $1000 Directory chunk size
0014: DWORD "Density" of quickref section, usually 2
0018: DWORD Depth of the index tree: 1 if there is no index, 2 if there is one level of PMGI chunks
001C: DWORD Chunk number of root index chunk, -1 if there is none (though at least one file has 0 despite there being no index chunk, probably a bug)
0020: DWORD Chunk number of first PMGL (listing) chunk
0024: DWORD Chunk number of last PMGL (listing) chunk
0028: DWORD -1 (unknown)
002C: DWORD Number of directory chunks (total)
0030: DWORD Windows language ID
0034: GUID {5D02926A-212E-11D0-9DF9-00A0C922E6EC}
0044: DWORD $54 (This is the length again)
0048: DWORD -1 (unknown)
004C: DWORD -1 (unknown)
0050: DWORD -1 (unknown)
|
| ChmLzxBlock |
Decompresses a chm block.
|
| ChmLzxcControlData |
::DataSpace/Storage//ControlData This file contains $20 bytes of
information on the compression.
|
| ChmLzxcResetTable |
LZXC reset table, used to ensure decompression.
|
| ChmLzxState |
|
| ChmParser |
|
| ChmParsingException |
|
| ChmPmgiHeader |
Note: does not always exist. An index chunk has the following format:
0000: char[4] 'PMGI'
0004: DWORD Length of quickref/free area at end of directory chunk
0008: Directory index entries (to quickref/free area)
The quickref area in a PMGI is the same as in a PMGL. The format of a
directory index entry is as follows:
BYTE: length of name
BYTEs: name (UTF-8 encoded)
ENCINT: directory listing chunk which starts with name
Encoded Integers, aka ENCINT: an ENCINT is a variable-length integer.
|
| ChmPmglHeader |
There are two types of directory chunks: index chunks and listing chunks.
|
| ChmSection |
|
| ChmWrapper |
|
| ChunkingFactory |
This class is used to create instances of AbstractChunking.
|
| ChunkingMethod |
|
| CommentPersonHandler |
|
| Compact64bitInt |
A 9-byte encoding of values in the range 0x0002000000000000 through 0xFFFFFFFFFFFFFFFF
|
| CompactID |
This class is used to represent the CompactID structure.
|
| DataElement |
|
| DataElementData |
Base class of data element
|
| DataElementHash |
Specifies a data element hash stream object
|
| DataElementPackage |
|
| DataElementParseErrorException |
|
| DataElementType |
The enumeration of the data element type
|
| DataElementUtils |
|
| DataHashObject |
|
| DataNodeObjectData |
Data Node Object data
|
| DataSizeObject |
Data Size Object
|
| DirectoryListingEntry |
The format of a directory listing entry is as follows:
BYTE: length of name
BYTEs: name (UTF-8 encoded)
ENCINT: content section
ENCINT: offset
ENCINT: length
The offset is from the beginning of the content section the file is in,
after the section has been decompressed (if appropriate).
|
| EightBytesOfData |
This class is used to represent a property that contains 8 bytes of data in the PropertySet.rgData stream field.
|
| EmailVisitor |
|
| EmbeddedPartMetadata |
This class records metadata about embedded parts that exist in the XML
of the main document.
|
| EMFParser |
Extracts files embedded in EMF and offers a
very rough capability to extract text if there
is text stored in the EMF.
|
| Error |
|
| ExcelExtractor |
Excel parser implementation which uses POI's Event API
to handle the contents of a Workbook.
|
| ExGuid |
|
| ExGUIDArray |
|
| ExtendedGUID |
|
| ExtendedMetadataExtractor |
This class extracts mapi properties as defined in the props_table.txt, which was generated from MS-OXPROPS.
|
| FormattingUtils |
|
| FormattingUtils.Tag |
|
| FourBytesOfData |
This class is used to represent a property that contains 4 bytes of data in the PropertySet.rgData stream field.
|
| GlobalIdTableEntry3FNDX |
|
| GlobalIdTableEntryFNDX |
|
| GUID |
|
| GuidUtil |
|
| HeaderCell |
|
| HSLFExtractor |
|
| IFSSHTTPBSerializable |
FSSHTTPB Serialize interface.
|
| IntermediateNodeObject |
|
| IntermediateNodeObject.RootNodeObjectBuilder |
The class is used to build a root node object.
|
| IProperty |
The interface of the property in OneNote file.
|
| JackcessParser |
Parser that handles Microsoft Access files via
Jackcess
|
| JCID |
This class is used to represent a JCID
|
| JCIDObject |
This class is used to represent the JCID object.
|
| LeafNodeObject |
|
| LeafNodeObject.IntermediateNodeObjectBuilder |
The class is used to build an intermediate node object.
|
| LibPstParser |
This is an optional PST parser that relies on the user installing
the GPL-3 libpst/readpst commandline tool and configuring
Tika to call this library via tika-config.xml
|
| LibPstParserConfig |
|
| LinkedCell |
Linked cell.
|
| ListDescriptor |
Contains the information for a single list in the list or list override tables.
|
| ListManager |
Computes the number text which goes at the beginning of each list paragraph
|
| LittleEndianBitConverter |
Implement a converter which converts to/from little-endian byte arrays
|
| MAPITag |
|
| MetadataExtractor |
OOXML metadata extractor.
|
| MSEmbeddedStreamTranslator |
|
| MSOneStorePackage |
|
| MSOneStoreParser |
|
| MSOwnerFileParser |
Parser for temporary MSOffice files.
|
| NoData |
This class is used to represent a property that contains no data.
|
| NodeObject |
|
| NumberCell |
Number cell.
|
| ObjectGroupData |
The ObjectGroupData class.
|
| ObjectGroupDataElementData |
|
| ObjectGroupDataElementData.Builder |
Internal class for building a list of DataElement objects from a node object.
|
| ObjectGroupDeclarations |
Object Group Declarations
|
| ObjectGroupMetadata |
Specifies an object group metadata
|
| ObjectGroupMetadataDeclarations |
Object Metadata Declaration
|
| ObjectGroupObjectBLOBDataDeclaration |
object data BLOB declaration
|
| ObjectGroupObjectData |
|
| ObjectGroupObjectDataBLOBReference |
object data BLOB reference
|
| ObjectGroupObjectDeclare |
|
| ObjectSpaceObjectPropSet |
This class is used to represent an ObjectSpaceObjectPropSet.
|
| ObjectSpaceObjectPropSet |
|
| ObjectSpaceObjectStreamHeader |
|
| ObjectSpaceObjectStreamOfContextIDs |
This class is used to represent an ObjectSpaceObjectStreamOfContextIDs.
|
| ObjectSpaceObjectStreamOfOIDs |
This class is used to represent an ObjectSpaceObjectStreamOfOIDs.
|
| ObjectSpaceObjectStreamOfOSIDs |
This class is used to represent an ObjectSpaceObjectStreamOfOSIDs.
|
| OfficeParser |
Defines a Microsoft document content extractor.
|
| OfficeParser.POIFSDocumentType |
|
| OfficeParserConfig |
|
| OldExcelParser |
A POI-powered Tika Parser for very old versions of Excel, from
pre-OLE2 days, such as Excel 4.
|
| OneByteOfData |
This class is used to represent a property that contains 1 byte of data in the PropertySet.rgData stream field.
|
| OneNoteParser |
OneNote Tika parser capable of parsing Microsoft OneNote files.
|
| OneNotePropertyEnum |
|
| OneNoteTreeWalkerOptions |
Options when walking the OneNote tree.
|
| OOXMLExtractor |
Interface implemented by all Tika OOXML extractors.
|
| OOXMLExtractorFactory |
Figures out the correct OOXMLExtractor for the supplied document and
returns it.
|
| OOXMLParser |
Office Open XML (OOXML) parser.
|
| OOXMLTikaBodyPartHandler |
|
| OOXMLWordAndPowerPointTextHandler |
This class is intended to handle anything that might contain IBodyElements:
main document, headers, footers, notes, slides, etc.
|
| OOXMLWordAndPowerPointTextHandler.EditType |
|
| OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler |
|
| OPCPackageDetector |
|
| OPCPackageWrapper |
This is a wrapper around OPCPackage that calls revert() instead of close().
|
| OutlookExtractor |
Outlook Message Parser.
|
| OutlookExtractor.BODY_TYPES_PROCESSED |
|
| OutlookExtractor.RECIPIENT_TYPE |
|
| OutlookPSTParser |
Parser for MS Outlook PST email storage files
|
| ParagraphProperties |
|
| POIFSContainerDetector |
A detector that works on a POIFS OLE2 document
to figure out exactly what the file is.
|
| POIXMLTextExtractorDecorator |
|
| PropertyID |
This class is used to represent a PropertyID.
|
| PropertySet |
This class is used to represent a PropertySet.
|
| PropertySetObject |
This class is used to represent the property set.
|
| PropertyType |
|
| PrtArrayOfPropertyValues |
The class is used to represent the prtArrayOfPropertyValues.
|
| PrtFourBytesOfLengthFollowedByData |
This class is used to represent the prtFourBytesOfLengthFollowedByData.
|
| PSTMailItemParser |
|
| RDCAnalysisChunking |
This class is used to process RDC analysis chunking
|
| RequestTypes |
The enumeration of request type.
|
| RevisionManifest |
|
| RevisionManifestDataElementData |
|
| RevisionManifestObjectGroupReferences |
Specifies a revision manifest object group references, each followed by object group extended GUIDs
|
| RevisionManifestRootDeclare |
Specifies a revision manifest root declare, each followed by root and object extended GUIDs
|
| RevisionStoreObject |
The class is used to represent the revision store object.
|
| RevisionStoreObjectGroup |
|
| RTFParser |
RTF parser
|
| RunProperties |
WARNING: This class is mutable.
|
| SequenceNumberGenerator |
|
| SerialNumber |
|
| SignatureObject |
Signature Object
|
| SimpleChunking |
|
| SpreadsheetMLParser |
Parses SpreadsheetML 2003 format Excel files.
|
| StorageIndexCellMapping |
Specifies the storage index cell mappings (with cell identifier, cell mapping extended GUID,
and cell mapping serial number)
|
| StorageIndexDataElementData |
|
| StorageIndexManifestMapping |
|
| StorageIndexRevisionMapping |
Specifies the storage index revision mappings (with revision and revision mapping
extended GUIDs, and revision mapping serial number)
|
| StorageManifestDataElementData |
|
| StorageManifestRootDeclare |
Specifies one or more storage manifest root declare.
|
| StorageManifestSchemaGUID |
Specifies a storage manifest schema GUID
|
| StreamObject |
|
| StreamObjectHeaderEnd |
|
| StreamObjectHeaderEnd16bit |
A 16-bit header for a compound object would indicate the end of a stream object
|
| StreamObjectHeaderEnd8bit |
An 8-bit header for a compound object would indicate the end of a stream object
|
| StreamObjectHeaderStart |
This class specifies the base class for 16-bit or 32-bit stream object header start
|
| StreamObjectHeaderStart16bit |
A 16-bit header for a compound object would indicate the start of a stream object
|
| StreamObjectHeaderStart32bit |
A 32-bit header for a compound object would indicate the start of a stream object
|
| StreamObjectParseErrorException |
|
| StreamObjectTypeHeaderEnd |
|
| StreamObjectTypeHeaderStart |
The enumeration of the stream object type header start
|
| SummaryExtractor |
Extractor for Common OLE2 (HPSF) metadata
|
| SXSLFPowerPointExtractorDecorator |
SAX/Streaming pptx extractor
|
| SXWPFWordExtractorDecorator |
This is an experimental, alternative extractor for docx files.
|
| TextCell |
Text cell.
|
| TikaExcelDataFormatter |
Overrides Excel's General format to include more
significant digits than the MS Spec allows.
|
| TikaExcelGeneralFormat |
A Format that allows up to 15 significant digits for integers.
|
| TikaNameIdChunks |
Collection of convenience chunks for the NameID part of an outlook file
|
| TikaNameIdChunks.PredefinedPropertySet |
|
| TikaNameIdChunks.PropertySetType |
|
| TNEFParser |
A POI-powered Tika Parser for TNEF (Transport Neutral
Encoding Format) messages, aka winmail.dat
|
| TwoBytesOfData |
This class is used to represent a property that contains 2 bytes of data in the PropertySet.rgData stream field.
|
| UByte |
The unsigned byte type
|
| UInteger |
The unsigned int type
|
| ULong |
The unsigned long type
|
| UMath |
|
| Unsigned |
A utility class for static access to unsigned number functionality.
|
| UNumber |
A base type for unsigned numbers.
|
| UShort |
The unsigned short type
|
| UuidUtils |
|
| WMFParser |
This parser offers a very rough capability to extract text if there
is text stored in the WMF files.
|
| Word2006MLParser |
|
| WordExtractor |
|
| WordExtractor.TagAndStyle |
|
| WordMLParser |
Parses WordML 2003 format Word files.
|
| XPSExtractorDecorator |
|
| XPSTextExtractor |
Currently, mostly a pass-through class to hold pkg and properties
and keep the general framework similar to our other POI-integrated
extractors.
|
| XSLFEventBasedPowerPointExtractor |
|
| XSLFPowerPointExtractorDecorator |
|
| XSSFBExcelExtractorDecorator |
|
| XSSFExcelExtractorDecorator |
|
| XSSFExcelExtractorDecorator.HeaderFooterFromString |
|
| XSSFExcelExtractorDecorator.SheetTextAsHTML |
Turns formatted sheet events into HTML
|
| XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer |
Captures information on interesting tags, whilst
delegating the main work to the formatting handler
|
| XWPFEventBasedWordExtractor |
Experimental class that is based on POI's XSSFEventBasedExcelExtractor
|
| XWPFListManager |
|
| XWPFNumberingShim |
Stub class of POI's XWPFNumbering because onDocumentRead() is protected
|
| XWPFStylesShim |
For Tika, all we need (so far) is a mapping between styleId and a style's name.
|
| XWPFWordExtractorDecorator |
|
| ZipFilesChunking |
This class is used to process zip file chunking
|
| ZipHeader |
|
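
The ChmPmgiHeader and DirectoryListingEntry descriptions above refer to ENCINT fields without showing how one is decoded. The following is a minimal sketch, not Tika's implementation. It assumes the commonly documented CHM encoding, in which the high bit of each byte means "continued in the next byte" and the remaining 7 bits are stored most significant group first. The class name `EncIntSketch`, the `Entry` holder, and both method names are hypothetical and exist only for illustration; the entry layout (one length byte, UTF-8 name, three ENCINTs) follows the DirectoryListingEntry description.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; none of these names belong to Tika's API.
class EncIntSketch {

    /** Hypothetical holder for the fields listed in the DirectoryListingEntry description. */
    static final class Entry {
        final String name;
        final long section;
        final long offset;
        final long length;

        Entry(String name, long section, long offset, long length) {
            this.name = name;
            this.section = section;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Reads one ENCINT. Assumption: the high bit of each byte flags a
     * continuation, and the low 7 bits are stored most significant first.
     */
    static long readEncInt(ByteBuffer buf) {
        long value = 0;
        while (true) {
            int b = buf.get() & 0xFF;           // treat the byte as unsigned
            value = (value << 7) | (b & 0x7F);  // append the 7 payload bits
            if ((b & 0x80) == 0) {
                return value;                   // high bit clear: last byte
            }
        }
    }

    /** Reads one entry laid out as: BYTE name length, name bytes, then three ENCINTs. */
    static Entry readEntry(ByteBuffer buf) {
        int nameLength = buf.get() & 0xFF;
        byte[] nameBytes = new byte[nameLength];
        buf.get(nameBytes);
        String name = new String(nameBytes, StandardCharsets.UTF_8);
        long section = readEncInt(buf);
        long offset = readEncInt(buf);
        long length = readEncInt(buf);
        return new Entry(name, section, offset, length);
    }

    public static void main(String[] args) {
        // Name "/ab", content section 1, offset encoded as two bytes (0xEA 0x15), length 16.
        byte[] raw = {3, '/', 'a', 'b', 0x01, (byte) 0xEA, 0x15, 0x10};
        Entry e = readEntry(ByteBuffer.wrap(raw));
        System.out.println(e.name + " section=" + e.section
                + " offset=" + e.offset + " length=" + e.length);
    }
}
```

In the sample bytes, the two-byte offset 0xEA 0x15 decodes to ((0xEA & 0x7F) << 7) | 0x15 = 0x3515. The sketch does no bounds checking, so truncated input would surface as a BufferUnderflowException from ByteBuffer.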