All Classes
| Class |
Description |
| AbstractConfig |
|
| AbstractEmbeddedDocumentBytesHandler |
|
| AbstractEmitter |
|
| AbstractEncodingDetectorParser |
|
| AbstractExternalProcessParser |
Abstract base class for parsers that call external processes.
|
| AbstractFetcher |
|
| AbstractMultipleParser |
Abstract base class for parser wrappers which may / will
process a given stream multiple times, merging the results
of the various parsers used.
|
| AbstractMultipleParser.MetadataPolicy |
The various strategies for handling metadata emitted by
multiple parsers.
|
| AbstractParser |
Deprecated.
|
| AbstractRecursiveParserWrapperHandler |
|
| AccessPermissionException |
Exception to be thrown when a document does not allow content extraction.
|
| AccessPermissions |
Until we can find a common standard, we'll use these options.
|
| AnnotationUtils |
This class contains utilities for dealing with Tika annotations.
|
| AsyncConfig |
|
| AsyncEmitter |
Worker thread that takes EmitData off the queue, batches it
and tries to emit it as a batch
|
| AsyncProcessor |
This is the main class for handling async requests.
|
| AsyncStatus |
|
| AsyncStatus.ASYNC_STATUS |
|
| AttributeMatcher |
Final evaluation state of a .
|
| AutoDetectParser |
|
| AutoDetectParserConfig |
This config object can be used to tune how conservative we want to be
when parsing data that is extremely compressible and resembles a ZIP
bomb.
|
| AutoDetectParserFactory |
Factory for an AutoDetectParser
|
| AutoDetectReader |
An input stream reader that automatically detects the character encoding
to be used for converting bytes to characters.
|
| BasicContentHandlerFactory |
Basic factory for creating common types of ContentHandlers
|
| BasicContentHandlerFactory.HANDLER_TYPE |
Common handler types for content.
|
| BasicEmbeddedBytesSelector |
|
| BasicEmbeddedDocumentBytesHandler |
For now, this is an in-memory EmbeddedDocumentBytesHandler that stores
all the bytes in memory.
|
| BodyContentHandler |
Content handler decorator that only passes everything inside
the XHTML <body/> tag to the underlying handler.
|
| BoundedInputStream |
Very slight modification of Commons IO's BoundedInputStream
so that we can tell whether it hit the bound.
|
| CallablePipesIterator |
This is a simple wrapper around PipesIterator
that allows it to be called in its own thread.
|
| CaptureGroupMetadataFilter |
This filter runs a regex against the first value in the "sourceField".
|
| CharsetUtils |
|
| ChildMatcher |
Intermediate evaluation state of a .../*... XPath expression.
|
| CleanPhoneText |
Class to help de-obfuscate phone numbers in text.
|
| ClearByAttachmentTypeMetadataFilter |
This class clears the entire metadata object if the
attachment type matches one of the types.
|
| ClearByMimeMetadataFilter |
This class clears the entire metadata object if the
MIME type matches the MIME filter.
|
| ClimateForcast |
|
| CompareUtils |
|
| CompositeDetector |
Content type detector that combines multiple different detection mechanisms.
|
| CompositeDigester |
|
| CompositeEncodingDetector |
|
| CompositeExternalParser |
A Composite Parser that wraps up all the available External Parsers,
and provides an easy way to access them.
|
| CompositeMatcher |
Composite XPath evaluation state.
|
| CompositeMetadataFilter |
|
| CompositeMetadataListFilter |
|
| CompositeParser |
Composite parser that delegates parsing tasks to a component parser
based on the declared content type of the incoming document.
|
| CompositePipesReporter |
|
| CompositeRenderer |
|
| ConcurrentUtils |
Utility class for concurrency in Tika.
|
| ConfigBase |
|
| ConfigurableThreadPoolExecutor |
Allows the thread pool to be configurable.
|
| ContainerExtractor |
Tika container extractor interface.
|
| ContentHandlerDecorator |
|
| ContentHandlerDecoratorFactory |
|
| ContentHandlerFactory |
Interface to allow easier injection of code for getting a new ContentHandler
|
| CorruptedFileException |
This exception should be thrown when the parse absolutely, positively has to stop.
|
| CreativeCommons |
A collection of Creative Commons property names.
|
| CryptoParser |
Decrypts the incoming document stream and delegates further parsing to
another parser instance.
|
| Database |
|
| DateNormalizingMetadataFilter |
Some dates in some file formats do not have a timezone. This filter assumes such dates are UTC and normalizes them to carry a UTC timezone.
|
| DateUtils |
Date-related utility methods and constants.
|
| DefaultDetector |
|
| DefaultEmbeddedStreamTranslator |
Loads EmbeddedStreamTranslators via service loading.
|
| DefaultEncodingDetector |
|
| DefaultMetadataFilter |
|
| DefaultParser |
|
| DefaultProbDetector |
A version of DefaultDetector for probabilistic mime
detectors, which use statistical techniques to blend the
results of differing underlying detectors when attempting
to detect the type of a given file.
|
| DefaultTranslator |
|
| DelegatingParser |
Base class for parser implementations that want to delegate parts of the
task of parsing an input document to another parser.
|
| Detector |
Content type detector.
|
| DIFContentHandler |
|
| DigestingParser |
|
| DigestingParser.Digester |
Interface for digester.
|
| DigestingParser.DigesterFactory |
|
| DigestingParser.Encoder |
Encodes the byte array from a MessageDigest as a String.
|
| DocumentSelector |
Interface for different document selection strategies for purposes like
embedded document extraction by a ContainerExtractor instance.
|
| DublinCore |
A collection of Dublin Core metadata names.
|
| ElementMappingContentHandler |
Content handler decorator that maps element QNames using
a Map.
|
| ElementMappingContentHandler.TargetElement |
|
| ElementMatcher |
Final evaluation state of an XPath expression that targets an element.
|
| EmbeddedBytesSelector |
|
| EmbeddedBytesSelector.AcceptAll |
|
| EmbeddedContentHandler |
|
| EmbeddedDocumentBytesConfig |
|
| EmbeddedDocumentBytesConfig.SUFFIX_STRATEGY |
|
| EmbeddedDocumentBytesHandler |
|
| EmbeddedDocumentByteStoreExtractorFactory |
|
| EmbeddedDocumentExtractor |
|
| EmbeddedDocumentExtractorFactory |
|
| EmbeddedDocumentUtil |
Utility class to handle common issues with embedded documents.
|
| EmbeddedResourceHandler |
Tika container extractor callback interface.
|
| EmbeddedStreamTranslator |
Interface for different strategies for filtering embedded streams.
|
| Embedder |
Tika embedder interface
|
| EmitData |
|
| EmitKey |
|
| Emitter |
|
| EmitterManager |
Utility class to hold multiple emitters and select
the appropriate one by name.
|
| EmittingEmbeddedDocumentBytesHandler |
|
| EmptyDetector |
Dummy detector that returns application/octet-stream for all documents.
|
| EmptyEmitter |
|
| EmptyFetcher |
|
| EmptyParser |
Dummy parser that always produces an empty XHTML document without even
attempting to parse the given document stream.
|
| EmptyTranslator |
Dummy translator that always declines to give any text.
|
| EncodingDetector |
Character encoding detector.
|
| EncryptedDocumentException |
|
| EndDocumentShieldingContentHandler |
|
| EndianUtils |
General endian-related utilities.
|
| EndianUtils.BufferUnderrunException |
|
| Epub |
EPub properties collection.
|
| ErrorParser |
Dummy parser that always throws a TikaException without even
attempting to parse the given document stream.
|
| ExceptionUtils |
|
| ExcludeFieldMetadataFilter |
|
| ExpandedTitleContentHandler |
|
| ExternalEmbedder |
Embedder that uses an external program (like sed or exiftool) to embed text
content and metadata into a given document.
|
| ExternalParser |
Parser that uses an external program (like catdoc or pdf2txt) to extract
text content and metadata from a given document.
|
| ExternalParser |
This is a next generation external parser that uses some of the more
recent additions to Tika.
|
| ExternalParser.LineConsumer |
Consumer contract
|
| ExternalParsersConfigReader |
Builds up ExternalParser instances based on XML file(s)
which define what to run, for what, and how to process
any output metadata.
|
| ExternalParsersConfigReaderMetKeys |
|
| ExternalParsersFactory |
Creates instances of ExternalParser based on XML
configuration files.
|
| ExternalProcess |
|
| FailedToStartClientException |
This should be catastrophic
|
| FallbackParser |
Tries multiple parsers in turn, until one succeeds.
|
| FetchEmitTuple |
|
| FetchEmitTuple.ON_PARSE_EXCEPTION |
|
| Fetcher |
Interface for an object that will fetch an InputStream given
a fetch string.
|
| FetcherConfigContainer |
|
| FetcherManager |
Utility class to hold multiple fetchers.
|
| FetcherStringException |
Thrown if something goes wrong while parsing the fetcher string.
|
| FetchKey |
Pair of fetcherName (which fetcher to call) and the key
to send to that fetcher to retrieve a specific file.
|
| Field |
The Field annotation is a contract for binding a Param value from
the Tika configuration to an object.
|
| FieldNameMappingFilter |
|
| FileCommandDetector |
This runs the Linux 'file' command against a file.
|
| FileListPipesIterator |
Reads a list of file names/relative paths from a UTF-8 file.
|
| FilenameUtils |
|
| FileProcessResult |
|
| FileSystem |
A collection of metadata elements for file system level metadata
|
| FileSystemFetcher |
|
| FileSystemFetcherConfig |
|
| FileSystemPipesIterator |
|
| FileTooLongException |
|
| Font |
|
| ForkParser |
|
| ForkProxy |
|
| ForkResource |
|
| Geographic |
Geographic schema.
|
| GeoPointMetadataFilter |
|
| HandlerConfig |
|
| HandlerConfig.PARSE_MODE |
|
| HexCoDec |
A set of Hex encoding and decoding utility methods.
|
| HTML |
|
| HttpHeaders |
A collection of HTTP header names.
|
| IncludeFieldMetadataFilter |
|
| Initializable |
Components that must do special processing across multiple fields
at initialization time should implement this interface.
|
| InitializableProblemHandler |
This is to be used to handle potential recoverable problems that
might arise during initialization.
|
| InputStreamDigester |
|
| InputStreamFactory |
A factory which returns a fresh InputStream for the same
resource each time.
|
| IOUtils |
|
| IPTC |
IPTC photo metadata schema.
|
| LanguageConfidence |
|
| LanguageDetector |
|
| LanguageHandler |
SAX content handler that updates a language detector based on all the
received character content.
|
| LanguageNames |
Support for language tags (as defined by https://tools.ietf.org/html/bcp47)
|
| LanguageResult |
|
| LanguageWriter |
Writer that builds a language profile based on all the written content.
|
| Link |
|
| LinkContentHandler |
Content handler that collects links from an XHTML document.
|
| LoadErrorHandler |
Interface for error handling strategies in service class loading.
|
| LoggingPipesReporter |
Simple PipesReporter that logs everything at the debug level.
|
| LookaheadInputStream |
Stream wrapper that makes it easy to read up to n bytes ahead from
a stream that supports the mark feature.
|
| MachineMetadata |
Metadata for describing machines, such as their
architecture, type and endianness.
|
| MachineMetadata.Endian |
|
| MagicDetector |
Content type detection based on magic bytes, i.e. type-specific patterns
near the beginning of the document input stream.
|
| MAPI |
Properties that typically appear in MSG/PST message format files.
|
| Matcher |
XPath element matcher.
|
| MatchingContentHandler |
Content handler decorator that only passes the elements, attributes,
and text nodes that match the given XPath expression.
|
| MediaType |
Internet media type.
|
| MediaTypeRegistry |
Registry of known Internet media types.
|
| Message |
A collection of Message related property names.
|
| Metadata |
A multi-valued metadata container.
|
| MetadataFilter |
Filters the metadata in place after the parse
|
| MetadataListFilter |
|
| MetadataWriteFilter |
|
| MetadataWriteFilterFactory |
|
| MimeType |
Internet media type.
|
| MimeTypeException |
A class to encapsulate MimeType related exceptions.
|
| MimeTypes |
This class is a MimeType repository.
|
| MimeTypesFactory |
Creates instances of MimeTypes.
|
| MimeTypesReader |
A reader for XML files compliant with the freedesktop MIME-info DTD.
|
| MimeTypesReaderMetKeys |
|
| NamedAttributeMatcher |
Final evaluation state of a ...
|
| NamedElementMatcher |
Intermediate evaluation state of a ...
|
| NameDetector |
Content type detection based on the resource name.
|
| NetworkParser |
|
| NNExampleModelDetector |
|
| NNTrainedModel |
|
| NNTrainedModelBuilder |
|
| NodeMatcher |
Final evaluation state of a ...
|
| NonDetectingEncodingDetector |
Always returns the charset passed in via the initializer
|
| NoOpFilter |
This filter performs no operations on the metadata
and leaves it untouched.
|
| NoOpListFilter |
|
| OfferLargerThanQueueSize |
|
| Office |
Office Document properties collection.
|
| OfficeOpenXMLCore |
Core properties as defined in the Office Open XML specification part Two that are not
in the DublinCore namespace.
|
| OfficeOpenXMLExtended |
Extended properties as defined in the Office Open XML specification part Four.
|
| OfflineContentHandler |
|
| OverrideDetector |
Deprecated.
|
| PageBasedRenderResults |
|
| PagedText |
XMP Paged-text schema.
|
| PageRangeRequest |
The range of pages to render.
|
| Param<T> |
This is a serializable model class for parameters from configuration file.
|
| ParamField |
This class stores metadata for Field annotations, which are used to map them
to Param at runtime.
|
| ParentContentHandler |
Simple pointer class to allow parsers to pass the parent ContentHandler through
to the embedded document's parse.
|
| ParseContext |
Parse context.
|
| Parser |
Tika parser interface.
|
| ParserContainerExtractor |
|
| ParserDecorator |
Decorator base class for the Parser interface.
|
| ParseRecord |
Use this class to store exceptions, warnings and other information
during the parse.
|
| ParserFactory |
|
| ParserFactoryFactory |
Lightweight, easily serializable class that contains enough information
to build a ParserFactory
|
| ParserPostProcessor |
Parser decorator that post-processes the results from a decorated parser.
|
| ParserUtils |
Helper utility methods for parsers themselves.
|
| ParsingEmbeddedDocumentExtractor |
Helper class for parsers of package archives or other compound document
formats that support embedded or attached component documents.
|
| ParsingEmbeddedDocumentExtractorFactory |
|
| ParsingReader |
Reader for the text content from a given binary stream.
|
| PasswordProvider |
Interface for providing a password to a Parser for handling Encrypted
and Password Protected Documents.
|
| PDF |
PDF properties collection.
|
| PhoneExtractingContentHandler |
Class used to extract phone numbers while parsing.
|
| Photoshop |
XMP Photoshop metadata schema.
|
| PipesClient |
The PipesClient is designed to be single-threaded.
|
| PipesConfig |
|
| PipesConfigBase |
|
| PipesException |
Fatal exception that means that something went seriously wrong.
|
| PipesIterator |
Abstract class that handles the testing for timeouts/thread safety
issues.
|
| PipesParser |
|
| PipesReporter |
This is called asynchronously by the AsyncProcessor.
|
| PipesReporterBase |
|
| PipesResult |
|
| PipesResult.STATUS |
|
| PipesServer |
This server is forked from the PipesClient.
|
| PipesServer.STATUS |
|
| ProbabilisticMimeDetectionSelector |
Selector for combining different mime detection results
based on probability
|
| ProbabilisticMimeDetectionSelector.Builder |
Builder class for setting the probability parameters.
|
| ProcessUtils |
|
| Property |
XMP property definition.
|
| Property.PropertyType |
|
| Property.ValueType |
|
| PropertyTypeException |
XMP property definition violation exception.
|
| PST |
|
| QuattroPro |
QuattroPro properties collection.
|
| RangeFetcher |
This class extracts a range of bytes from a given fetch key.
|
| RecursiveParserWrapper |
This is a helper class that wraps a parser in a recursive handler.
|
| RecursiveParserWrapperHandler |
|
| RegexCaptureParser |
|
| RegexUtils |
Inspired by the Nutch class OutlinkExtractor.
|
| Renderer |
Interface for a renderer.
|
| Rendering |
|
| RenderingParser |
|
| RenderingState |
This is used to track state for each file (embedded or otherwise).
|
| RenderingTracker |
Use this in the ParseContext to keep track of unique ids for rendered
images in embedded docs.
|
| RenderRequest |
Empty interface for requests to a renderer.
|
| RenderResult |
|
| RenderResult.STATUS |
|
| RenderResults |
|
| RereadableInputStream |
Wraps an input stream, reading it only once, but making it available
for rereading an arbitrary number of times.
|
| RichTextContentHandler |
Content handler for rich text; it extracts the alt attribute of XHTML <img/>
tags and the name attribute of XHTML <a/> tags
into the output.
|
| RTFMetadata |
|
| RUnpackExtractor |
Recursive Unpacker and text and metadata extractor.
|
| RUnpackExtractorFactory |
|
| RuntimeSAXException |
Use this to throw a SAXException from subclassed methods that are not declared to throw SAXException.
|
| SafeContentHandler |
|
| SafeContentHandler.Output |
Internal interface that allows both character and
ignorable whitespace content to be filtered the same way.
|
| SecureContentHandler |
Content handler decorator that attempts to prevent denial of service
attacks against Tika parsers.
|
| ServiceLoader |
Internal utility class that Tika uses to look up service providers.
|
| ServiceLoaderUtils |
Service loading and ordering related utilities.
|
| SimpleThreadPoolExecutor |
Simple Thread Pool Executor
|
| StandardOrganizations |
This class provides a collection of the most important technical standards organizations.
|
| StandardReference |
Class that represents a standard reference.
|
| StandardReference.StandardReferenceBuilder |
|
| StandardsExtractingContentHandler |
StandardsExtractingContentHandler is a Content Handler used to extract
standard references while parsing.
|
| StandardsText |
StandardsText relies on regular expressions to extract standard references
from text.
|
| StandardWriteFilter |
This is to be used to limit the amount of metadata that a
parser can add based on the StandardWriteFilter.maxTotalEstimatedSize,
StandardWriteFilter.maxFieldSize, StandardWriteFilter.maxValuesPerField, and
StandardWriteFilter.maxKeySize.
|
| StandardWriteFilterFactory |
|
| StatefulParser |
The RecursiveParserWrapper wraps the parser sent
into the ParseContext and then uses that parser
to store state (among many other things).
|
| StoppingEarlyException |
Sentinel exception used to stop SAX parsing of XML
once the target is found.
|
| StreamEmitter |
|
| StreamGobbler |
|
| StringUtils |
|
| SubtreeMatcher |
Evaluation state of a ...//... XPath expression.
|
| SupplementingParser |
|
| SystemUtils |
Copied from commons-lang to avoid requiring the dependency
|
| TaggedContentHandler |
A content handler decorator that tags potential exceptions so that the
handler that caused the exception can easily be identified.
|
| TaggedSAXException |
A SAXException wrapper that tags the wrapped exception with
a given object reference.
|
| TailStream |
A specialized input stream implementation which records the last portion read
from an underlying stream.
|
| TeeContentHandler |
Content handler proxy that forwards the received SAX events to zero or
more underlying content handlers.
|
| TemporaryResources |
Utility class for tracking and ultimately closing or otherwise disposing
a collection of temporary resources.
|
| TextAndAttributeContentHandler |
|
| TextContentHandler |
|
| TextDetector |
Content type detection of plain text documents.
|
| TextMatcher |
Final evaluation state of a ...
|
| TextStatistics |
Utility class for computing a histogram of the bytes seen in a stream.
|
| TIFF |
XMP Exif TIFF schema.
|
| Tika |
Facade class for accessing Tika functionality.
|
| TikaActivator |
Bundle activator that adjusts the class loading mechanism of the
ServiceLoader class to work correctly in an OSGi environment.
|
| TikaConfig |
Parses an XML config file.
|
| TikaConfigException |
Exception thrown when there is an error in the Tika config file
and/or when one or more of the parsers fail to initialize
from that erroneous config.
|
| TikaConfigSerializer |
|
| TikaConfigSerializer.Mode |
|
| TikaCoreProperties |
Contains a core set of basic Tika metadata properties, which all parsers
will attempt to supply (where the file format permits).
|
| TikaCoreProperties.EmbeddedResourceType |
A file might contain different types of embedded documents.
|
| TikaEmitterException |
|
| TikaException |
Tika exception
|
| TikaInputStream |
Input stream with extended capabilities.
|
| TikaMemoryLimitException |
|
| TikaMimeKeys |
A collection of Tika metadata keys used in Mime Type resolution
|
| TikaPagedText |
Metadata properties for paged text, metadata appropriate
for an individual page (useful for embedded document handlers
called on individual pages).
|
| TikaTaskTimeout |
|
| TikaTimeoutException |
|
| ToHTMLContentHandler |
SAX event handler that serializes the HTML document to a character stream.
|
| TotalCounter |
Interface for PipesIterators that allow counting the total number
of documents.
|
| TotalCountResult |
|
| TotalCountResult.STATUS |
|
| ToTextContentHandler |
SAX event handler that writes all character content out to a character
stream.
|
| ToXMLContentHandler |
SAX event handler that serializes the XML document to a character stream.
|
| TrainedModel |
|
| TrainedModelDetector |
|
| Translator |
Interface for Translator services.
|
| TypeDetector |
Content type detection based on a content type hint.
|
| UnsupportedFormatException |
Parsers should throw this exception when they encounter
a file format that they do not support.
|
| UrlFetcher |
Simple fetcher for URLs.
|
| WARC |
|
| WordPerfect |
WordPerfect properties collection.
|
| WriteLimiter |
|
| WriteLimitReachedException |
|
| WriteOutContentHandler |
SAX event handler that writes content up to an optional write
limit out to a character stream or other decorated handler.
|
| XHTMLContentHandler |
Content handler decorator that simplifies the task of producing XHTML
events for Tika content parsers.
|
| XMLReaderUtils |
Utility functions for reading XML.
|
| XmlRootExtractor |
Utility class that uses a SAXParser to determine
the namespace URI and local name of the root element of an XML file.
|
| XMP |
Metadata keys for the XMP Basic Schema
|
| XMPContentHandler |
Content handler decorator that simplifies the task of producing XMP output.
|
| XMPDC |
Metadata keys for the XMP DublinCore schema.
|
| XMPDM |
XMP Dynamic Media schema.
|
| XMPDM.ChannelTypePropertyConverter |
Deprecated.
|
| XMPIdq |
|
| XMPMM |
|
| XMPPDF |
Metadata keys for the XMP PDF Schema
|
| XMPRights |
XMP Rights management schema.
|
| XPathParser |
Parser for a very simple XPath subset.
|
| ZeroByteFileException |
Exception thrown by the AutoDetectParser when a file contains zero bytes.
|
| ZeroByteFileException.IgnoreZeroByteFileException |
|
| ZeroSizeFileDetector |
Detector to identify zero-length files as application/x-zerovalue.
|
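MagicDetector is described above as detecting content types from magic bytes near the beginning of the stream, and LookaheadInputStream as a way to read ahead on a stream that supports mark. The Java sketch below illustrates that idea using only the JDK; it is not Tika's implementation or API. The class `MagicSketch`, its `detect` method, the two patterns and the 8-byte lookahead are hypothetical choices for illustration. It also reports empty input as application/x-zerovalue, mirroring what ZeroSizeFileDetector is described as doing.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical sketch of magic-byte detection using only the JDK.
 * Not Tika's MagicDetector; names, patterns and lookahead size are illustrative.
 */
public class MagicSketch {
    /** A leading byte pattern and the media type it indicates. */
    static final class Magic {
        final byte[] prefix;
        final String type;

        Magic(byte[] prefix, String type) {
            this.prefix = prefix;
            this.type = type;
        }
    }

    // Hypothetical lookahead: must be at least as long as the longest pattern.
    static final int LOOKAHEAD = 8;

    static final List<Magic> MAGICS = Arrays.asList(
            new Magic("%PDF-".getBytes(StandardCharsets.US_ASCII), "application/pdf"),
            new Magic(new byte[] {0x50, 0x4B, 0x03, 0x04}, "application/zip"));

    /** Peeks at the first bytes, then resets so a parser can still read from the start. */
    static String detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(LOOKAHEAD);
        byte[] head = new byte[LOOKAHEAD];
        int n = 0;
        try {
            int r;
            // Read until the lookahead buffer is full or the stream ends.
            while (n < LOOKAHEAD && (r = in.read(head, n, LOOKAHEAD - n)) != -1) {
                n += r;
            }
        } finally {
            in.reset();
        }
        if (n == 0) {
            // Empty input, reported the way ZeroSizeFileDetector is described.
            return "application/x-zerovalue";
        }
        for (Magic m : MAGICS) {
            if (n >= m.prefix.length
                    && Arrays.equals(Arrays.copyOf(head, m.prefix.length), m.prefix)) {
                return m.type;
            }
        }
        return "application/octet-stream"; // nothing matched
    }

    public static void main(String[] args) throws IOException {
        InputStream pdf = new BufferedInputStream(new ByteArrayInputStream(
                "%PDF-1.7\n".getBytes(StandardCharsets.US_ASCII)));
        System.out.println(detect(pdf));        // application/pdf
        System.out.println(pdf.read() == '%');  // true: the peek consumed nothing
    }
}
```

Resetting the stream after the peek matters because detection normally runs before parsing, and the parser still needs to see the document from its first byte.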