All Classes
Class |
Description |
AbstractEncodingDetectorParser |
|
AbstractParser |
Abstract base class for new parsers.
|
AbstractRecursiveParserWrapperHandler |
|
AccessPermissionException |
Exception to be thrown when a document does not allow content extraction.
|
AccessPermissions |
Until we can find a common standard, we'll use these options.
|
AnnotationUtils |
This class contains utilities for dealing with Tika annotations.
|
AttributeMatcher |
Final evaluation state of a .../@* XPath expression.
|
AutoDetectParser |
|
AutoDetectParserFactory |
Factory for an AutoDetectParser
|
AutoDetectReader |
An input stream reader that automatically detects the character encoding
to be used for converting bytes to characters.
|
BasicContentHandlerFactory |
Basic factory for creating common types of ContentHandlers
|
BasicContentHandlerFactory.HANDLER_TYPE |
Common handler types for content.
|
BodyContentHandler |
Content handler decorator that only passes everything inside
the XHTML <body/> tag to the underlying handler.
|
BoundedInputStream |
Very slight modification of Commons' BoundedInputStream
so that we can figure out if this hit the bound or not.
|
CharsetUtils |
|
ChildMatcher |
Intermediate evaluation state of a .../*... XPath expression.
|
CleanPhoneText |
Class to help de-obfuscate phone numbers in text.
|
ClimateForcast |
|
ClosedInputStream |
Closed input stream.
|
CloseShieldInputStream |
Proxy stream that prevents the underlying input stream from being closed.
|
CompositeDetector |
Content type detector that combines multiple different detection mechanisms.
|
CompositeDigester |
|
CompositeEncodingDetector |
|
CompositeExternalParser |
A Composite Parser that wraps up all the available External Parsers,
and provides an easy way to access them.
|
CompositeMatcher |
Composite XPath evaluation state.
|
CompositeParser |
Composite parser that delegates parsing tasks to a component parser
based on the declared content type of the incoming document.
|
ConcurrentUtils |
Utility Class for Concurrency in Tika
|
ConfigurableThreadPoolExecutor |
Allows the thread pool to be configurable.
|
ContainerExtractor |
Tika container extractor interface.
|
ContentHandlerDecorator |
|
ContentHandlerFactory |
Interface to allow easier injection of code for getting a new ContentHandler
|
CorruptedFileException |
This exception should be thrown when the parse absolutely, positively has to stop.
|
CountingInputStream |
A decorating input stream that counts the number of bytes that have passed
through the stream so far.
|
CreativeCommons |
A collection of Creative Commons property names.
|
CryptoParser |
Decrypts the incoming document stream and delegates further parsing to
another parser instance.
|
Database |
|
DateUtils |
Date related utility methods and constants
|
DefaultDetector |
|
DefaultEncodingDetector |
|
DefaultParser |
|
DefaultProbDetector |
A version of DefaultDetector for probabilistic mime
detectors, which use statistical techniques to blend the
results of differing underlying detectors when attempting
to detect the type of a given file.
|
DefaultTranslator |
|
DelegatingParser |
Base class for parser implementations that want to delegate parts of the
task of parsing an input document to another parser.
|
Detector |
Content type detector.
|
DIFContentHandler |
|
DigestingParser |
|
DigestingParser.Digester |
Interface for digester.
|
DigestingParser.Encoder |
Encodes byte array from a MessageDigest to String
|
DocumentSelector |
Interface for different document selection strategies for purposes like
embedded document extraction by a ContainerExtractor instance.
|
DublinCore |
A collection of Dublin Core metadata names.
|
ElementMappingContentHandler |
Content handler decorator that maps element QNames using
a Map.
|
ElementMappingContentHandler.TargetElement |
|
ElementMatcher |
Final evaluation state of an XPath expression that targets an element.
|
EmbeddedContentHandler |
|
EmbeddedDocumentExtractor |
|
EmbeddedDocumentUtil |
Utility class to handle common issues with embedded documents.
|
EmbeddedResourceHandler |
Tika container extractor callback interface.
|
Embedder |
Tika embedder interface
|
EmptyDetector |
Dummy detector that returns application/octet-stream for all documents.
|
EmptyParser |
Dummy parser that always produces an empty XHTML document without even
attempting to parse the given document stream.
|
EmptyTranslator |
Dummy translator that always declines to give any text.
|
EncodingDetector |
Character encoding detector.
|
EncryptedDocumentException |
|
EndDocumentShieldingContentHandler |
|
EndianUtils |
General endian-related utilities.
|
EndianUtils.BufferUnderrunException |
|
ErrorParser |
Dummy parser that always throws a TikaException without even
attempting to parse the given document stream.
|
ExceptionUtils |
|
ExpandedTitleContentHandler |
|
ExternalEmbedder |
Embedder that uses an external program (like sed or exiftool) to embed text
content and metadata into a given document.
|
ExternalParser |
Parser that uses an external program (like catdoc or pdf2txt) to extract
text content and metadata from a given document.
|
ExternalParser.LineConsumer |
Consumer contract
|
ExternalParsersConfigReader |
Builds up ExternalParser instances based on XML file(s)
which define what to run, for what, and how to process
any output metadata.
|
ExternalParsersConfigReaderMetKeys |
|
ExternalParsersFactory |
Creates instances of ExternalParser based on XML
configuration files.
|
Field |
Field annotation is a contract for binding Param value from
Tika Configuration to an object.
|
FilenameUtils |
|
Font |
|
ForkParser |
|
ForkProxy |
|
ForkResource |
|
Geographic |
Geographic schema.
|
HexCoDec |
A set of Hex encoding and decoding utility methods.
|
HTML |
|
HttpHeaders |
A collection of HTTP header names.
|
Initializable |
Components that must do special processing across multiple fields
at initialization time should implement this interface.
|
InitializableProblemHandler |
This is to be used to handle potential recoverable problems that
might arise during initialization.
|
InputStreamDigester |
|
IOExceptionWithCause |
Subclasses IOException with the Throwable constructors missing before Java 6.
|
IOUtils |
General IO stream manipulation utilities.
|
IPTC |
IPTC photo metadata schema.
|
LanguageConfidence |
|
LanguageDetector |
|
LanguageHandler |
SAX content handler that updates a language detector based on all the
received character content.
|
LanguageIdentifier |
Deprecated.
|
LanguageNames |
Support for language tags (as defined by https://tools.ietf.org/html/bcp47).
See https://en.wikipedia.org/wiki/List_of_ISO_639-3_codes for a list of
three-character language codes.
|
LanguageProfile |
Deprecated.
|
LanguageProfilerBuilder |
Deprecated.
|
LanguageResult |
|
LanguageWriter |
Writer that builds a language profile based on all the written content.
|
Link |
|
LinkContentHandler |
Content handler that collects links from an XHTML document.
|
LoadErrorHandler |
Interface for error handling strategies in service class loading.
|
LookaheadInputStream |
Stream wrapper that makes it easy to read up to n bytes ahead from
a stream that supports the mark feature.
|
MagicDetector |
Content type detection based on magic bytes.
|
MappedBufferCleaner |
Copied/pasted from the Apache Lucene/Solr project.
|
Matcher |
XPath element matcher.
|
MatchingContentHandler |
Content handler decorator that only passes the elements, attributes,
and text nodes that match the given XPath expression.
|
MediaType |
Internet media type.
|
MediaTypeRegistry |
Registry of known Internet media types.
|
Message |
A collection of Message related property names.
|
Metadata |
A multi-valued metadata container.
|
MimeType |
Internet media type.
|
MimeTypeException |
A class to encapsulate MimeType related exceptions.
|
MimeTypes |
This class is a MimeType repository.
|
MimeTypesFactory |
Creates instances of MimeTypes.
|
MimeTypesReader |
A reader for XML files compliant with the freedesktop MIME-info DTD.
|
MimeTypesReaderMetKeys |
|
MSOffice |
A collection of Microsoft Office and Open Document property names.
|
NamedAttributeMatcher |
Final evaluation state of a .../@name XPath expression.
|
NamedElementMatcher |
Intermediate evaluation state of a .../name... XPath
expression.
|
NameDetector |
Content type detection based on the resource name.
|
NetworkParser |
|
NNExampleModelDetector |
|
NNTrainedModel |
|
NNTrainedModelBuilder |
|
NodeMatcher |
Final evaluation state of a .../node() XPath expression.
|
NonDetectingEncodingDetector |
Always returns the charset passed in via the initializer
|
NullInputStream |
A functional, light weight InputStream that emulates
a stream of a specified size.
|
NullOutputStream |
This OutputStream writes all data to the famous /dev/null.
|
Office |
Office Document properties collection.
|
OfficeOpenXMLCore |
Core properties as defined in the Office Open XML specification part Two that are not
in the DublinCore namespace.
|
OfficeOpenXMLExtended |
Extended properties as defined in the Office Open XML specification part Four.
|
OfflineContentHandler |
|
OverrideDetector |
|
PagedText |
XMP Paged-text schema.
|
Param<T> |
This is a serializable model class for parameters from configuration file.
|
ParamField |
This class stores metadata for Field annotations, which is used to map them
to Param at runtime.
|
ParseContext |
Parse context.
|
Parser |
Tika parser interface.
|
ParserContainerExtractor |
|
ParserDecorator |
Decorator base class for the Parser interface.
|
ParserFactory |
|
ParserFactoryFactory |
Lightweight, easily serializable class that contains enough information
to build a ParserFactory
|
ParserPostProcessor |
Parser decorator that post-processes the results from a decorated parser.
|
ParserUtils |
Helper util methods for Parsers themselves.
|
ParsingEmbeddedDocumentExtractor |
Helper class for parsers of package archives or other compound document
formats that support embedded or attached component documents.
|
ParsingReader |
Reader for the text content from a given binary stream.
|
PasswordProvider |
Interface for providing a password to a Parser for handling Encrypted
and Password Protected Documents.
|
PDF |
PDF properties collection.
|
PhoneExtractingContentHandler |
Class used to extract phone numbers while parsing.
|
Photoshop |
XMP Photoshop metadata schema.
|
ProbabilisticMimeDetectionSelector |
Selector for combining different mime detection results
based on probability
|
ProbabilisticMimeDetectionSelector.Builder |
Builder class for setting probability parameters.
|
ProcessUtils |
|
ProfilingHandler |
Deprecated.
|
ProfilingWriter |
Deprecated.
|
Property |
XMP property definition.
|
Property.PropertyType |
|
Property.ValueType |
|
PropertyTypeException |
XMP property definition violation exception.
|
ProxyInputStream |
A proxy stream that acts as expected: it passes method calls on to
the proxied stream and doesn't change which methods are
being called.
|
QuattroPro |
QuattroPro properties collection.
|
RecursiveParserWrapper |
This is a helper class that wraps a parser in a recursive handler.
|
RecursiveParserWrapperHandler |
|
RegexUtils |
Inspired by the Nutch class OutlinkExtractor.
|
RereadableInputStream |
Wraps an input stream, reading it only once, but making it available
for rereading an arbitrary number of times.
|
RichTextContentHandler |
Content handler for Rich Text. It extracts the alt attribute of
XHTML <img/> tags and the name attribute of XHTML <a/> tags
into the output.
|
RTFMetadata |
|
SafeContentHandler |
|
SafeContentHandler.Output |
Internal interface that allows both character and
ignorable whitespace content to be filtered the same way.
|
SecureContentHandler |
Content handler decorator that attempts to prevent denial of service
attacks against Tika parsers.
|
ServiceLoader |
Internal utility class that Tika uses to look up service providers.
|
ServiceLoaderUtils |
Service Loading and Ordering related utils
|
SimpleThreadPoolExecutor |
Simple Thread Pool Executor
|
StandardOrganizations |
This class provides a collection of the most important technical standard organizations.
|
StandardReference |
Class that represents a standard reference.
|
StandardReference.StandardReferenceBuilder |
|
StandardsExtractingContentHandler |
StandardsExtractingContentHandler is a Content Handler used to extract
standard references while parsing.
|
StandardsText |
StandardsText relies on regular expressions to extract standard references
from text.
|
SubtreeMatcher |
Evaluation state of a ...//... XPath expression.
|
SystemUtils |
Copied from commons-lang to avoid requiring the dependency
|
TaggedContentHandler |
A content handler decorator that tags potential exceptions so that the
handler that caused the exception can easily be identified.
|
TaggedInputStream |
An input stream decorator that tags potential exceptions so that the
stream that caused the exception can easily be identified.
|
TaggedIOException |
An IOException wrapper that tags the wrapped exception with
a given object reference.
|
TaggedSAXException |
A SAXException wrapper that tags the wrapped exception with
a given object reference.
|
TailStream |
A specialized input stream implementation which records the last portion read
from an underlying stream.
|
TeeContentHandler |
Content handler proxy that forwards the received SAX events to zero or
more underlying content handlers.
|
TemporaryResources |
Utility class for tracking and ultimately closing or otherwise disposing
a collection of temporary resources.
|
TextContentHandler |
|
TextDetector |
Content type detection of plain text documents.
|
TextMatcher |
Final evaluation state of a .../text() XPath expression.
|
TextStatistics |
Utility class for computing a histogram of the bytes seen in a stream.
|
TIFF |
XMP Exif TIFF schema.
|
Tika |
Facade class for accessing Tika functionality.
|
TikaActivator |
Bundle activator that adjusts the class loading mechanism of the
ServiceLoader class to work correctly in an OSGi environment.
|
TikaConfig |
Parses the XML config file.
|
TikaConfigException |
Exception thrown when there is an error in the Tika config file
and/or one or more of the parsers failed to initialize
from that erroneous config.
|
TikaConfigSerializer |
|
TikaConfigSerializer.Mode |
|
TikaCoreProperties |
Contains a core set of basic Tika metadata properties, which all parsers
will attempt to supply (where the file format permits).
|
TikaCoreProperties.EmbeddedResourceType |
A file might contain different types of embedded documents.
|
TikaException |
Tika exception
|
TikaInputStream |
Input stream with extended capabilities.
|
TikaMemoryLimitException |
|
TikaMetadataKeys |
Contains keys to properties in Metadata instances.
|
TikaMimeKeys |
A collection of Tika metadata keys used in Mime Type resolution
|
ToHTMLContentHandler |
SAX event handler that serializes the HTML document to a character stream.
|
ToTextContentHandler |
SAX event handler that writes all character content out to a character
stream.
|
ToXMLContentHandler |
SAX event handler that serializes the XML document to a character stream.
|
TrainedModel |
|
TrainedModelDetector |
|
Translator |
Interface for Translator services.
|
TypeDetector |
Content type detection based on a content type hint.
|
UnsupportedFormatException |
Parsers should throw this exception when they encounter
a file format that they do not support.
|
WordPerfect |
WordPerfect properties collection.
|
WriteOutContentHandler |
SAX event handler that writes content up to an optional write
limit out to a character stream or other decorated handler.
|
XHTMLContentHandler |
Content handler decorator that simplifies the task of producing XHTML
events for Tika content parsers.
|
XMLReaderUtils |
Utility functions for reading XML.
|
XmlRootExtractor |
Utility class that uses a SAXParser to determine
the namespace URI and local name of the root element of an XML file.
|
XMP |
|
XMPContentHandler |
Content handler decorator that simplifies the task of producing XMP output.
|
XMPDM |
XMP Dynamic Media schema.
|
XMPDM.ChannelTypePropertyConverter |
Deprecated.
|
XMPIdq |
|
XMPMM |
|
XMPRights |
XMP Rights management schema.
|
XPathParser |
Parser for a very simple XPath subset.
|
ZeroByteFileException |
Exception thrown by the AutoDetectParser when a file contains zero bytes.
|
ZeroSizeFileDetector |
Detector to identify zero length files as application/x-zerovalue
|
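
Several entries in the table (CountingInputStream, BoundedInputStream, CloseShieldInputStream, ProxyInputStream) describe decorators around an input stream. As a rough illustration of that pattern only, the sketch below counts bytes using nothing but java.io. The class name ByteCountingStream and its getCount() method are hypothetical and chosen for this example; this is not Tika's implementation, and the real classes may behave differently (for instance around mark/reset, which this sketch does not track).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration of a counting stream decorator (not a Tika class).
class ByteCountingStream extends FilterInputStream {
    // Number of bytes read or skipped through this wrapper so far.
    private long count;

    ByteCountingStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            count++; // only count a byte that was actually returned
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            count += n; // n is -1 at end of stream, so guard the addition
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = super.skip(n);
        count += skipped; // skipped bytes also passed through the stream
        return skipped;
    }

    long getCount() {
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {1, 2, 3, 4, 5};
        try (ByteCountingStream in =
                 new ByteCountingStream(new ByteArrayInputStream(data))) {
            in.read();            // one byte
            in.read(new byte[3]); // three more bytes
            System.out.println(in.getCount()); // prints 4
        }
    }
}
```

The wrapper leaves the data untouched and only observes it, which is why decorators of this kind can be stacked around any InputStream without the consumer noticing.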