Package org.apache.tika.metadata
Interface TikaCoreProperties
public interface TikaCoreProperties
Contains a core set of basic Tika metadata properties, which all parsers
will attempt to supply (where the file format permits). These are all
defined in terms of other standard namespaces.
Users of Tika who want consistent metadata across file formats can rely
on these properties: where present, they have the same semantic meaning
regardless of file format. For example, if one file format calls a field
Title, another Long-Title and another Long-Name, but all of them mean the
same thing as defined by DublinCore.TITLE, then all of them are reported
under that property.
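The idea of one shared key for several format-specific names can be illustrated without Tika itself. The following is a minimal sketch, not Tika's implementation: the class `TitleNormalizer`, its alias set, and the use of the string "dc:title" as the common key are assumptions made for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Tika: folds format-specific title keys
// into one common key so that callers only need to look in one place.
final class TitleNormalizer {
    // Assumed common key, chosen for illustration.
    static final String COMMON_TITLE_KEY = "dc:title";

    // The format-specific names used as examples in the text above.
    private static final Set<String> TITLE_ALIASES =
            Set.of("Title", "Long-Title", "Long-Name");

    static Map<String, String> normalize(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            String key = TITLE_ALIASES.contains(e.getKey())
                    ? COMMON_TITLE_KEY
                    : e.getKey();
            // If several aliases are present, keep the first value seen.
            out.putIfAbsent(key, e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints {dc:title=Annual Report}
        System.out.println(normalize(Map.of("Long-Title", "Annual Report")));
    }
}
```

With real Tika classes, the equivalent lookup would go through the Metadata object using the TITLE property rather than a plain map.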
For now, most of these properties are composite ones including the deprecated
non-prefixed String properties from the Metadata class. In Tika 2.0, most
of these will revert back to simple assignments.
Since:
Apache Tika 1.2
Nested Class Summary
Nested Classes
Modifier and Type      Description
static enum            A file might contain different types of embedded documents.
Field Summary
Fields
Modifier and Type      Field
static final Property  ALTITUDE
static final Property  COMMENTS
static final Property  CONTENT_TYPE_HINT
    This is currently used to identify a Content-Type that may be included within a document, such as in html documents, or the value might come from outside the document.
static final Property  CONTENT_TYPE_OVERRIDE
static final Property  CONTRIBUTOR
static final Property  COVERAGE
static final Property  CREATED
static final Property  CREATOR
static final Property  CREATOR_TOOL
static final Property  DESCRIPTION
static final Property  EMBEDDED_RESOURCE_TYPE
    Embedded resource type property
static final String    EMBEDDED_RESOURCE_TYPE_KEY
static final Property  FORMAT
static final Property  HAS_SIGNATURE
static final Property  IDENTIFIER
static final Property  KEYWORDS
    DublinCore.SUBJECT; should include both subject and keywords if a document format has both.
static final Property  LANGUAGE
static final Property  LATITUDE
static final Property  LONGITUDE
static final Property  METADATA_DATE
static final Property  MODIFIED
static final Property  MODIFIER
static final Property  ORIGINAL_RESOURCE_NAME
    Some file formats can store information about their original file name/location or about their attachment's original file name/location.
static final Property  PRINT_DATE
static final Property  PUBLISHER
static final Property  RATING
static final Property  RELATION
static final Property  RIGHTS
static final Property  SOURCE
static final Property  TIKA_META_EXCEPTION_EMBEDDED_STREAM
    Use this to store exceptions caught while trying to read the stream of an embedded resource.
static final String    TIKA_META_EXCEPTION_PREFIX
    Use this to store parse exception information in the Metadata object.
static final Property  TIKA_META_EXCEPTION_WARNING
    Use this to store exceptions caught during a parse that are non-fatal.
static final String    TIKA_META_PREFIX
    Use this to prefix metadata properties that store information about the parsing process.
static final Property  TITLE
static final Property  TRANSITION_KEYWORDS_TO_DC_SUBJECT
    Deprecated. Use TikaCoreProperties#KEYWORDS
static final Property  TRANSITION_SUBJECT_TO_DC_DESCRIPTION
    Deprecated. Use TikaCoreProperties#DESCRIPTION
static final Property  TRANSITION_SUBJECT_TO_DC_TITLE
    Deprecated. Use TikaCoreProperties#TITLE
static final Property  TRANSITION_SUBJECT_TO_OO_SUBJECT
    Deprecated. Use OfficeOpenXMLCore#SUBJECT
static final Property  TYPE
Field Details
TIKA_META_PREFIX
Use this to prefix metadata properties that store information about the parsing process. Users should be able to distinguish between metadata that was contained within the document and metadata about the parsing process. In Tika 2.0 (or earlier?), let's change X-ParsedBy to X-TIKA-Parsed-By.
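The distinction between document metadata and parse-process metadata can be sketched as a simple prefix filter. This is an illustration only: the class `ParseMetadataSplitter` is hypothetical, a plain map stands in for the Metadata object, and the literal prefix value "X-TIKA:" is an assumption that should be checked against TIKA_META_PREFIX in the Tika version in use.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of Tika: separates entries that describe
// the parsing process from entries that came from the document itself.
final class ParseMetadataSplitter {
    // Assumed prefix value for illustration; prefer the constant
    // TIKA_META_PREFIX over a hard-coded string in real code.
    static final String PARSE_PREFIX = "X-TIKA:";

    // Returns the entries whose key does (or does not) start with the prefix.
    static Map<String, String> select(Map<String, String> all, boolean aboutParsing) {
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : all.entrySet()) {
            if (e.getKey().startsWith(PARSE_PREFIX) == aboutParsing) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }
}
```

Calling `select(all, true)` yields only the parse-process entries, and `select(all, false)` yields what the document itself contained.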
TIKA_META_EXCEPTION_PREFIX
Use this to store parse exception information in the Metadata object.
TIKA_META_EXCEPTION_WARNING
Use this to store exceptions caught during a parse that are non-fatal, e.g. if a parser is in lenient mode and more content can be extracted if we ignore an exception thrown by a dependency.
TIKA_META_EXCEPTION_EMBEDDED_STREAM
Use this to store exceptions caught while trying to read the stream of an embedded resource. Do not use this if there is a parse exception on the embedded resource.
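The lenient-mode behaviour described for non-fatal exceptions might look roughly like the sketch below. It does not use Tika classes: `LenientExtraction` is a hypothetical name, the `warnings` list stands in for a multi-valued warning entry in a Metadata object, and each `Supplier` stands in for one extraction step that may fail inside a dependency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch, not part of Tika: keeps extracting content when a
// step fails and records the failure instead of aborting the parse.
final class LenientExtraction {
    // Stand-in for a multi-valued warning entry in a metadata object.
    final List<String> warnings = new ArrayList<>();
    final StringBuilder content = new StringBuilder();

    void extract(List<Supplier<String>> steps) {
        for (Supplier<String> step : steps) {
            try {
                content.append(step.get());
            } catch (RuntimeException e) {
                // Non-fatal: remember what went wrong and continue with
                // the next step so that more content can be extracted.
                warnings.add(e.toString());
            }
        }
    }
}
```

A caller can then inspect `warnings` after the run to see which steps were skipped, while still receiving the content that was extracted.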
EMBEDDED_RESOURCE_TYPE_KEY
ORIGINAL_RESOURCE_NAME
Some file formats can store information about their original file name/location or about their attachment's original file name/location.
CONTENT_TYPE_HINT
This is currently used to identify a Content-Type that may be included within a document, such as in html documents, or the value might come from outside the document. This information may be faulty and should be treated only as a hint.
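One way to treat such a value "only as a hint" is to use it as a fallback when detection yields nothing specific. The sketch below is an assumed policy, not how Tika's detectors behave: `ContentTypeChooser`, its method, and the choice of "application/octet-stream" as the "unknown" marker are all introduced here for illustration.

```java
// Hypothetical policy, not part of Tika: a detected type always wins over
// a declared hint, and the hint is used only when detection was inconclusive.
final class ContentTypeChooser {
    // Assumed marker for "detection found nothing specific".
    static final String GENERIC = "application/octet-stream";

    static String choose(String detected, String hint) {
        boolean detectionUseful = detected != null && !detected.equals(GENERIC);
        if (detectionUseful) {
            return detected;
        }
        if (hint != null && !hint.isBlank()) {
            // The hint may be faulty, so it is only a last resort.
            return hint.trim();
        }
        return GENERIC;
    }
}
```

For example, `choose("application/pdf", "text/html")` keeps the detected type, while `choose(null, "text/html")` falls back to the hint.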
CONTENT_TYPE_OVERRIDE
FORMAT
IDENTIFIER
CONTRIBUTOR
COVERAGE
CREATOR
MODIFIER
CREATOR_TOOL
LANGUAGE
PUBLISHER
RELATION
RIGHTS
SOURCE
TYPE
TITLE
DESCRIPTION
KEYWORDS
DublinCore.SUBJECT; should include both subject and keywords if a document format has both. See also Office.KEYWORDS and OfficeOpenXMLCore.SUBJECT.
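For a format that stores subjects and keywords separately, combining both into one multi-valued field could be sketched as follows. This is not Tika code: `KeywordMerger` is a hypothetical helper, and trimming values and dropping duplicates are assumptions of this sketch rather than documented behaviour.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Tika: builds one value list that
// contains both the subject values and the keyword values of a document.
final class KeywordMerger {
    static List<String> merge(List<String> subjects, List<String> keywords) {
        // LinkedHashSet keeps first-seen order and drops exact duplicates.
        Set<String> merged = new LinkedHashSet<>();
        addNonEmpty(merged, subjects);
        addNonEmpty(merged, keywords);
        return new ArrayList<>(merged);
    }

    private static void addNonEmpty(Set<String> target, List<String> values) {
        for (String v : values) {
            if (v != null && !v.trim().isEmpty()) {
                target.add(v.trim());
            }
        }
    }
}
```

The resulting list could then be stored as the values of a single multi-valued property.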
CREATED
MODIFIED
PRINT_DATE
METADATA_DATE
LATITUDE
LONGITUDE
ALTITUDE
RATING
COMMENTS
TRANSITION_KEYWORDS_TO_DC_SUBJECT
Deprecated. Use TikaCoreProperties#KEYWORDS
TRANSITION_SUBJECT_TO_DC_DESCRIPTION
Deprecated. Use TikaCoreProperties#DESCRIPTION
TRANSITION_SUBJECT_TO_DC_TITLE
Deprecated. Use TikaCoreProperties#TITLE
TRANSITION_SUBJECT_TO_OO_SUBJECT
Deprecated. Use OfficeOpenXMLCore#SUBJECT
EMBEDDED_RESOURCE_TYPE
Embedded resource type property
HAS_SIGNATURE