All Classes
| Class |
Description |
| AbstractChunking |
This class specifies the base class for file chunking
|
| AbstractListManager |
|
| AbstractListManager.LevelTuple |
|
| AbstractListManager.ParagraphLevelCounter |
|
| AbstractOfficeParser |
|
| AbstractOOXMLExtractor |
Base class for all Tika OOXML extractors.
|
| AbstractXML2003Parser |
|
| ActiveMimeParser |
ActiveMime is a macro container format used in some mso files.
|
| AdapterHelper |
|
| AlternativePackaging |
|
| ArrayNumber |
The class is used to represent the number of the array.
|
| BasicObject |
Base object for FSSHTTPB.
|
| BinaryItem |
|
| Bit |
This class is used to read or set a bit value in a byte array.
|
| BitConverter |
|
| BitReader |
This class is used to extract values across byte boundaries with arbitrary bit positions.
|
| BitWriter |
|
| ByteUtil |
|
| Cell |
Cell of content.
|
| CellDecorator |
Cell decorator.
|
| CellID |
|
| CellIDArray |
|
| CellManifestCurrentRevision |
|
| CellManifestDataElementData |
Cell manifest data element
|
| ChmAccessor<T> |
Defines an accessor interface
|
| ChmAssert |
Contains chm extractor assertions
|
| ChmBlockInfo |
A container that contains chm block information such as: i.
|
| ChmCommons |
|
| ChmCommons.EntryType |
Represents entry types: uncompressed, compressed
|
| ChmCommons.IntelState |
Represents intel file states during decompression
|
| ChmCommons.LzxState |
Represents lzx states: started decoding, not started decoding
|
| ChmConstants |
|
| ChmDirectoryListingSet |
Holds chm listing entries
|
| ChmExtractor |
Extracts text from chm file.
|
| ChmItsfHeader |
The header:
0000: char[4] 'ITSF'
0004: DWORD 3 (Version number)
0008: DWORD Total header length, including header section table and following data.
|
| ChmItspHeader |
Directory header. The directory starts with a header; its format is as follows:
0000: char[4] 'ITSP'
0004: DWORD Version number 1
0008: DWORD Length of the directory header
000C: DWORD $0a (unknown)
0010: DWORD $1000 Directory chunk size
0014: DWORD "Density" of quickref section, usually 2
0018: DWORD Depth of the index tree: 1 if there is no index, 2 if there is one level of PMGI chunks
001C: DWORD Chunk number of root index chunk, -1 if there is none (though at least one file has 0 despite there being no index chunk, probably a bug)
0020: DWORD Chunk number of first PMGL (listing) chunk
0024: DWORD Chunk number of last PMGL (listing) chunk
0028: DWORD -1 (unknown)
002C: DWORD Number of directory chunks (total)
0030: DWORD Windows language ID
0034: GUID {5D02926A-212E-11D0-9DF9-00A0C922E6EC}
0044: DWORD $54 (This is the length again)
0048: DWORD -1 (unknown)
004C: DWORD -1 (unknown)
0050: DWORD -1 (unknown)
|
| ChmLzxBlock |
Decompresses a chm block.
|
| ChmLzxcControlData |
::DataSpace/Storage//ControlData This file contains $20 bytes of
information on the compression.
|
| ChmLzxcResetTable |
LZXC reset table For ensuring a decompression.
|
| ChmLzxState |
|
| ChmParser |
|
| ChmParsingException |
|
| ChmPmgiHeader |
Note: this header does not always exist. An index chunk has the following format:
0000: char[4] 'PMGI'
0004: DWORD Length of quickref/free area at end of directory chunk
0008: Directory index entries (to quickref/free area)
The quickref area in a PMGI is the same as in a PMGL. The format of a directory index entry is as follows:
BYTE: length of name
BYTEs: name (UTF-8 encoded)
ENCINT: directory listing chunk which starts with name
Encoded Integers aka ENCINT: An ENCINT is a variable-length integer.
|
| ChmPmglHeader |
There are two types of directory chunks: index chunks and listing chunks.
|
| ChmSection |
|
| ChmWrapper |
|
| ChunkingFactory |
This class is used to create instance of AbstractChunking.
|
| ChunkingMethod |
|
| Compact64bitInt |
A 9-byte encoding of values in the range 0x0002000000000000 through 0xFFFFFFFFFFFFFFFF
|
| CompactID |
This class is used to represent the CompactID structure.
|
| DataElement |
|
| DataElementData |
Base class of data element
|
| DataElementHash |
Specifies a data element hash stream object
|
| DataElementPackage |
|
| DataElementParseErrorException |
|
| DataElementType |
The enumeration of the data element type
|
| DataElementUtils |
|
| DataHashObject |
|
| DataNodeObjectData |
Data Node Object data
|
| DataSizeObject |
Data Size Object
|
| DirectoryListingEntry |
The format of a directory listing entry is as follows:
BYTE: length of name
BYTEs: name (UTF-8 encoded)
ENCINT: content section
ENCINT: offset
ENCINT: length
The offset is from the beginning of the content section the file is in, after the section has been decompressed (if appropriate).
|
| EightBytesOfData |
This class is used to represent a property that contains 8 bytes of data in the PropertySet.rgData stream field.
|
| EmailVisitor |
|
| EmbeddedPartMetadata |
This class records metadata about embedded parts that exist in the XML
of the main document.
|
| EMFParser |
Extracts files embedded in EMF and offers a
very rough capability to extract text if there
is text stored in the EMF.
|
| Error |
|
| ExcelExtractor |
Excel parser implementation which uses POI's Event API
to handle the contents of a Workbook.
|
| ExGuid |
|
| ExGUIDArray |
|
| ExtendedGUID |
|
| FormattingUtils |
|
| FormattingUtils.Tag |
|
| FourBytesOfData |
This class is used to represent a property that contains 4 bytes of data in the PropertySet.rgData stream field.
|
| GlobalIdTableEntry3FNDX |
|
| GlobalIdTableEntryFNDX |
|
| GUID |
|
| GuidUtil |
|
| HeaderCell |
|
| HSLFExtractor |
|
| IFSSHTTPBSerializable |
FSSHTTPB Serialize interface.
|
| IntermediateNodeObject |
|
| IntermediateNodeObject.RootNodeObjectBuilder |
The class is used to build a root node object.
|
| IProperty |
The interface of the property in OneNote file.
|
| JackcessParser |
Parser that handles Microsoft Access files via
Jackcess
|
| JCID |
This class is used to represent a JCID
|
| JCIDObject |
This class is used to represent the JCID object.
|
| LeafNodeObject |
|
| LeafNodeObject.IntermediateNodeObjectBuilder |
The class is used to build an intermediate node object.
|
| LibPstParser |
This is an optional PST parser that relies on the user installing
the GPL-3 libpst/readpst commandline tool and configuring
Tika to call this library via tika-config.xml
|
| LibPstParserConfig |
|
| LinkedCell |
Linked cell.
|
| ListDescriptor |
Contains the information for a single list in the list or list override tables.
|
| ListManager |
Computes the number text which goes at the beginning of each list paragraph
|
| LittleEndianBitConverter |
Implement a converter which converts to/from little-endian byte arrays
|
| MetadataExtractor |
OOXML metadata extractor.
|
| MSEmbeddedStreamTranslator |
|
| MSOneStorePackage |
|
| MSOneStoreParser |
|
| MSOwnerFileParser |
Parser for temporary MS Office files.
|
| NoData |
This class is used to represent a property that contains no data.
|
| NodeObject |
|
| NumberCell |
Number cell.
|
| ObjectGroupData |
The ObjectGroupData class.
|
| ObjectGroupDataElementData |
|
| ObjectGroupDataElementData.Builder |
The internal class for building a list of DataElement from a node object.
|
| ObjectGroupDeclarations |
Object Group Declarations
|
| ObjectGroupMetadata |
Specifies an object group metadata
|
| ObjectGroupMetadataDeclarations |
Object Metadata Declaration
|
| ObjectGroupObjectBLOBDataDeclaration |
object data BLOB declaration
|
| ObjectGroupObjectData |
|
| ObjectGroupObjectDataBLOBReference |
object data BLOB reference
|
| ObjectGroupObjectDeclare |
|
| ObjectSpaceObjectPropSet |
This class is used to represent an ObjectSpaceObjectPropSet.
|
| ObjectSpaceObjectPropSet |
|
| ObjectSpaceObjectStreamHeader |
|
| ObjectSpaceObjectStreamOfContextIDs |
This class is used to represent an ObjectSpaceObjectStreamOfContextIDs.
|
| ObjectSpaceObjectStreamOfOIDs |
This class is used to represent an ObjectSpaceObjectStreamOfOIDs.
|
| ObjectSpaceObjectStreamOfOSIDs |
This class is used to represent an ObjectSpaceObjectStreamOfOSIDs.
|
| OfficeParser |
Defines a Microsoft document content extractor.
|
| OfficeParser.POIFSDocumentType |
|
| OfficeParserConfig |
|
| OldExcelParser |
A POI-powered Tika Parser for very old versions of Excel, from
pre-OLE2 days, such as Excel 4.
|
| OneByteOfData |
This class is used to represent a property that contains 1 byte of data in the PropertySet.rgData stream field.
|
| OneNoteParser |
OneNote Tika parser capable of parsing Microsoft OneNote files.
|
| OneNotePropertyEnum |
|
| OneNoteTreeWalkerOptions |
Options when walking the one note tree.
|
| OOXMLExtractor |
Interface implemented by all Tika OOXML extractors.
|
| OOXMLExtractorFactory |
Figures out the correct OOXMLExtractor for the supplied document and
returns it.
|
| OOXMLParser |
Office Open XML (OOXML) parser.
|
| OOXMLTikaBodyPartHandler |
|
| OOXMLWordAndPowerPointTextHandler |
This class is intended to handle anything that might contain IBodyElements:
main document, headers, footers, notes, slides, etc.
|
| OOXMLWordAndPowerPointTextHandler.EditType |
|
| OOXMLWordAndPowerPointTextHandler.XWPFBodyContentsHandler |
|
| OPCPackageDetector |
|
| OPCPackageWrapper |
This is a wrapper around OPCPackage that calls revert() instead of close().
|
| OutlookExtractor |
Outlook Message Parser.
|
| OutlookExtractor.RECIPIENT_TYPE |
|
| OutlookPSTParser |
Parser for MS Outlook PST email storage files
|
| ParagraphProperties |
|
| POIFSContainerDetector |
A detector that works on a POIFS OLE2 document
to figure out exactly what the file is.
|
| POIXMLTextExtractorDecorator |
|
| PropertyID |
This class is used to represent a PropertyID.
|
| PropertySet |
This class is used to represent a PropertySet.
|
| PropertySetObject |
This class is used to represent the property set.
|
| PropertyType |
|
| PrtArrayOfPropertyValues |
The class is used to represent the prtArrayOfPropertyValues.
|
| PrtFourBytesOfLengthFollowedByData |
This class is used to represent the prtFourBytesOfLengthFollowedByData.
|
| PSTMailItemParser |
|
| RDCAnalysisChunking |
This class is used to process RDC analysis chunking
|
| RequestTypes |
The enumeration of request type.
|
| RevisionManifest |
|
| RevisionManifestDataElementData |
|
| RevisionManifestObjectGroupReferences |
Specifies a revision manifest object group references, each followed by object group extended GUIDs
|
| RevisionManifestRootDeclare |
Specifies a revision manifest root declare, each followed by root and object extended GUIDs
|
| RevisionStoreObject |
The class is used to represent the revision store object.
|
| RevisionStoreObjectGroup |
|
| RTFParser |
RTF parser
|
| RunProperties |
WARNING: This class is mutable.
|
| SequenceNumberGenerator |
|
| SerialNumber |
|
| SignatureObject |
Signature Object
|
| SimpleChunking |
|
| SpreadsheetMLParser |
Parses SpreadsheetML 2003 format Excel files.
|
| StorageIndexCellMapping |
Specifies the storage index cell mappings (with cell identifier, cell mapping extended GUID,
and cell mapping serial number)
|
| StorageIndexDataElementData |
|
| StorageIndexManifestMapping |
|
| StorageIndexRevisionMapping |
Specifies the storage index revision mappings (with revision and revision mapping
extended GUIDs, and revision mapping serial number)
|
| StorageManifestDataElementData |
|
| StorageManifestRootDeclare |
Specifies one or more storage manifest root declare.
|
| StorageManifestSchemaGUID |
Specifies a storage manifest schema GUID
|
| StreamObject |
|
| StreamObjectHeaderEnd |
|
| StreamObjectHeaderEnd16bit |
A 16-bit header for a compound object would indicate the end of a stream object
|
| StreamObjectHeaderEnd8bit |
An 8-bit header for a compound object would indicate the end of a stream object
|
| StreamObjectHeaderStart |
This class specifies the base class for 16-bit or 32-bit stream object header start
|
| StreamObjectHeaderStart16bit |
A 16-bit header for a compound object would indicate the start of a stream object
|
| StreamObjectHeaderStart32bit |
A 32-bit header for a compound object would indicate the start of a stream object
|
| StreamObjectParseErrorException |
|
| StreamObjectTypeHeaderEnd |
|
| StreamObjectTypeHeaderStart |
The enumeration of the stream object type header start
|
| SummaryExtractor |
Extractor for Common OLE2 (HPSF) metadata
|
| SXSLFPowerPointExtractorDecorator |
SAX/Streaming pptx extractor
|
| SXWPFWordExtractorDecorator |
This is an experimental, alternative extractor for docx files.
|
| TextCell |
Text cell.
|
| TikaExcelDataFormatter |
Overrides Excel's General format to include more
significant digits than the MS Spec allows.
|
| TikaExcelGeneralFormat |
A Format that allows up to 15 significant digits for integers.
|
| TNEFParser |
A POI-powered Tika Parser for TNEF (Transport Neutral
Encoding Format) messages, aka winmail.dat
|
| TwoBytesOfData |
This class is used to represent a property that contains 2 bytes of data in the PropertySet.rgData stream field.
|
| UByte |
The unsigned byte type
|
| UInteger |
The unsigned int type
|
| ULong |
The unsigned long type
|
| UMath |
|
| Unsigned |
A utility class for static access to unsigned number functionality.
|
| UNumber |
A base type for unsigned numbers.
|
| UShort |
The unsigned short type
|
| UuidUtils |
|
| WMFParser |
This parser offers a very rough capability to extract text if there
is text stored in the WMF files.
|
| Word2006MLParser |
|
| WordExtractor |
|
| WordExtractor.TagAndStyle |
|
| WordMLParser |
Parses WordML 2003 format Word files.
|
| XPSExtractorDecorator |
|
| XPSTextExtractor |
Currently, mostly a pass-through class to hold pkg and properties
and keep the general framework similar to our other POI-integrated
extractors.
|
| XSLFEventBasedPowerPointExtractor |
|
| XSLFPowerPointExtractorDecorator |
|
| XSSFBExcelExtractorDecorator |
|
| XSSFExcelExtractorDecorator |
|
| XSSFExcelExtractorDecorator.HeaderFooterFromString |
|
| XSSFExcelExtractorDecorator.SheetTextAsHTML |
Turns formatted sheet events into HTML
|
| XSSFExcelExtractorDecorator.XSSFSheetInterestingPartsCapturer |
Captures information on interesting tags, whilst
delegating the main work to the formatting handler
|
| XWPFEventBasedWordExtractor |
Experimental class that is based on POI's XSSFEventBasedExcelExtractor
|
| XWPFListManager |
|
| XWPFNumberingShim |
Stub class of POI's XWPFNumbering because onDocumentRead() is protected
|
| XWPFStylesShim |
For Tika, all we need (so far) is a mapping between styleId and a style's name.
|
| XWPFWordExtractorDecorator |
|
| ZipFilesChunking |
This class is used to process zip file chunking
|
| ZipHeader |
|
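
The DirectoryListingEntry and ChmPmgiHeader descriptions above give a byte layout (a length byte, a UTF-8 name, then ENCINT fields) without showing how such an entry might be read. The following is a minimal sketch of that layout, not Tika's implementation. The class and method names (`ChmEntrySketch`, `Cursor`, `Entry`, `readEncInt`, `readEntry`) are hypothetical. The ENCINT encoding used here (7 payload bits per byte, high bit set meaning "more bytes follow", most significant group first) is an assumption taken from commonly cited descriptions of the CHM format; the table itself only says that an ENCINT is a variable-length integer.

```java
import java.nio.charset.StandardCharsets;

public class ChmEntrySketch {

    /** Hypothetical read cursor over a byte array; not part of Tika. */
    static final class Cursor {
        final byte[] data;
        int pos;

        Cursor(byte[] data) {
            this.data = data;
        }
    }

    /** Fields of one directory listing entry, named after the layout in the table. */
    static final class Entry {
        final String name;
        final long section;
        final long offset;
        final long length;

        Entry(String name, long section, long offset, long length) {
            this.name = name;
            this.section = section;
            this.offset = offset;
            this.length = length;
        }
    }

    /**
     * Reads one ENCINT. Assumption: each byte carries 7 payload bits,
     * the high bit marks a continuation, and groups arrive most significant first.
     */
    static long readEncInt(Cursor c) {
        long value = 0;
        int b;
        do {
            b = c.data[c.pos++] & 0xFF;          // treat the byte as unsigned
            value = (value << 7) | (b & 0x7F);   // append the 7 payload bits
        } while ((b & 0x80) != 0);               // high bit set: more bytes follow
        return value;
    }

    /** Reads BYTE name length, UTF-8 name, then the three ENCINT fields. */
    static Entry readEntry(Cursor c) {
        int nameLength = c.data[c.pos++] & 0xFF;
        String name = new String(c.data, c.pos, nameLength, StandardCharsets.UTF_8);
        c.pos += nameLength;
        long section = readEncInt(c);
        long offset = readEncInt(c);
        long length = readEncInt(c);
        return new Entry(name, section, offset, length);
    }

    public static void main(String[] args) {
        // Made-up sample bytes: name "/a.h", section 1, offset 128 (0x81 0x00), length 5.
        byte[] raw = {4, '/', 'a', '.', 'h', 0x01, (byte) 0x81, 0x00, 0x05};
        Entry e = readEntry(new Cursor(raw));
        System.out.println("name=" + e.name + " section=" + e.section
                + " offset=" + e.offset + " length=" + e.length);
    }
}
```

The sample bytes are invented for illustration. A real parser would also need bounds checks and a guard against overlong ENCINT runs, which this sketch omits.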