Package com.basistech.rosette.dm
Class AnnotatedText
java.lang.Object
com.basistech.rosette.dm.AnnotatedText
- All Implemented Interfaces:
Serializable
The root of the data model. An AnnotatedText is a blob of text and its attributes. The attributes are available from getAttributes(), as well as from some convenience accessors, such as getTokens() or getEntities().
Generally, offsets used in the data model are character (UTF-16 elements) offsets into the original text. Offset ranges are always half-open. For example:
    012345678901
    Hello world

The token "Hello" has start offset 0 and end offset 5.
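The half-open, UTF-16-based convention matches the one java.lang.String itself uses, so it can be illustrated with plain JDK calls. The following is a minimal sketch; OffsetDemo and slice are hypothetical names used for illustration and are not part of the data model.

```java
final class OffsetDemo {
    /** Returns the text covered by the half-open range [start, end). */
    static String slice(CharSequence text, int start, int end) {
        return text.subSequence(start, end).toString();
    }

    public static void main(String[] args) {
        String text = "Hello world";
        // "Hello" starts at offset 0 and ends at offset 5; the end is exclusive.
        System.out.println(slice(text, 0, 5));

        // Offsets count UTF-16 code units, so a character outside the
        // Basic Multilingual Plane (here U+1F600) occupies two offsets.
        String withSurrogates = "a\uD83D\uDE00z";
        System.out.println(withSurrogates.length());
        // 'z' therefore sits at the range [3, 4), not [2, 3).
        System.out.println(slice(withSurrogates, 3, 4));
    }
}
```

With half-open ranges, the length of a span is simply end minus start, and adjacent spans share a boundary offset without overlapping.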
A note on serialization: due to the internal structure of this class and the classes that make up the model, we do not recommend that applications serialize this to JSON (or XML or other representations) by applying a reflection-based toolkit 'as-is'. For JSON in Java, the 'adm-json' module provides the supported serialization.
-
Method Summary
Method - Description
getAttributes() - Returns all of the annotations on this text.
getBaseNounPhrases() - Returns the list of base noun phrases.
getCategorizerResults() - Returns the list of categorizer results.
getData() - Returns the character data for this text.
getDependencies() - Returns the list of dependencies.
getDocumentMetadata() - Returns document-level metadata.
getEmbeddings() - Returns the embeddings associated with this text.
getEntities() - Returns the list of entities.
getEntityMentions() - Deprecated. This constructs a list of the old objects for compatibility; the supported item is Mention.
getLanguageDetectionRegions() - Returns the list of language regions.
getLayoutRegions() - Returns the list of layout regions.
getRelationshipMentions() - Returns the list of relationship mentions.
getResolvedEntities() - Deprecated. This constructs a list of the old objects for compatibility; the supported item is Entity.
getScriptRegions() - Returns the list of script regions.
getSentences() - Returns the list of sentences.
getSentimentResults() - Returns the list of sentiment results.
MapAttribute<com.basistech.util.LanguageCode, ListAttribute<SimilarTerm>> getSimilarTerms() - Returns the map of similar terms.
getSimilarTerms(com.basistech.util.LanguageCode languageCode) - Convenience accessor for a language's list of similar terms.
getTokens() - Returns the list of tokens.
getTranslatedData() - Returns the translations for the text.
getTranslatedTokens() - Returns the translated tokens.
getWholeTextLanguageDetection() - Returns the language results for the entire text.
toString() - toString is a convenience for accessing the textual data, if any, in this annotated text.
-
Method Details
-
getData
Returns the character data for this text.
- Returns:
- the character data for this text
-
getDocumentMetadata
Returns document-level metadata. Metadata keys are simple strings; values are lists of strings.
- Returns:
- map of metadata associated with the document
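As a rough illustration of that shape (string keys, list-of-string values), the sketch below builds and reads such a map using only the JDK. MetadataDemo, firstValue, sample, and the key names are hypothetical; the exact return type of getDocumentMetadata is not shown on this page and is assumed here to behave like a java.util.Map.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MetadataDemo {
    /** Returns the first value stored under key, or null if the key is absent or its list is empty. */
    static String firstValue(Map<String, List<String>> metadata, String key) {
        List<String> values = metadata.getOrDefault(key, Collections.<String>emptyList());
        return values.isEmpty() ? null : values.get(0);
    }

    /** Builds a small stand-in metadata map; the keys and values are made up for illustration. */
    static Map<String, List<String>> sample() {
        Map<String, List<String>> metadata = new LinkedHashMap<>();
        metadata.put("title", Collections.singletonList("Example"));
        // A key may carry several values, which is why values are lists.
        metadata.put("author", Arrays.asList("A. Writer", "B. Editor"));
        return metadata;
    }

    public static void main(String[] args) {
        Map<String, List<String>> metadata = sample();
        System.out.println(firstValue(metadata, "title"));
        System.out.println(metadata.get("author").size());
        System.out.println(firstValue(metadata, "missing"));
    }
}
```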
-
getAttributes
Returns all of the annotations on this text. For the defined attributes, the keys will be values from AttributeKey.key(). The values are polymorphic; the subclass of BaseAttribute depends on the attribute. Applications should usually prefer to use the convenience accessors (e.g. getTokens) instead, to avoid the need for a cast. Note that this map will not return EntityMention or ResolvedEntity objects, which are deprecated; they are only available from the specific accessors.
- Returns:
- all of the annotations on this text
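The trade-off between the generic map and the convenience accessors can be sketched with stand-in types. Nothing below is the real API: Attr, TokenList, Doc, and the "token" key are hypothetical placeholders that only mimic the pattern of a polymorphic map alongside a typed accessor.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class AttributeAccessDemo {
    /** Stand-in for the polymorphic attribute base type. */
    static class Attr { }

    /** Stand-in for one concrete attribute subclass. */
    static final class TokenList extends Attr {
        final int size;
        TokenList(int size) { this.size = size; }
    }

    /** Stand-in for the annotated text: a generic map plus one typed accessor. */
    static final class Doc {
        private final Map<String, Attr> attributes = new HashMap<>();

        Doc(TokenList tokens) { attributes.put("token", tokens); }

        Map<String, Attr> getAttributes() { return Collections.unmodifiableMap(attributes); }

        // The typed accessor performs the cast once, so callers do not have to.
        TokenList getTokens() { return (TokenList) attributes.get("token"); }
    }

    public static void main(String[] args) {
        Doc doc = new Doc(new TokenList(2));
        // Generic access: the caller must know the subclass and cast to it.
        TokenList viaMap = (TokenList) doc.getAttributes().get("token");
        // Convenience access: no cast at the call site.
        TokenList viaAccessor = doc.getTokens();
        System.out.println(viaMap == viaAccessor);
    }
}
```

Both paths reach the same object; the accessor simply moves the cast, and the knowledge of which subclass to expect, out of application code.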
-
getTokens
Returns the list of tokens.
- Returns:
- the list of tokens
-
getTranslatedTokens
Returns the translated tokens. This API allows for multiple translations. For example, element 0 may contain the TranslatedTokens for Simplified Chinese, and element 1 may contain the TranslatedTokens for Japanese. Usually only element 0 will be populated.
- Returns:
- the list of translated tokens
-
getTranslatedData
Returns the translations for the text. This API allows multiple translations. For example, element 0 may contain the TranslatedData for Simplified Chinese, and element 1 may contain the TranslatedData for Japanese. Usually only element 0 will be populated.
- Returns:
- the translations for the text
-
getLanguageDetectionRegions
Returns the list of language regions.
- Returns:
- the list of language regions
-
getWholeTextLanguageDetection
Returns the language results for the entire text.
- Returns:
- the language results for the entire text
-
getEntityMentions
Deprecated. This constructs a list of the old objects for compatibility; the supported item is Mention.
Returns the list of entity mentions.
- Returns:
- the list of entity mentions
-
getEntities
Returns the list of entities. Entities are ordered by the document order of their head mentions.
- Returns:
- the list of entities
-
getEvents
-
getSimilarTerms
Returns the map of similar terms.
- Returns:
- the map of similar terms
-
getSimilarTerms
Convenience accessor for a language's list of similar terms.
- Parameters:
- languageCode - the language code whose similar terms to retrieve
- Returns:
- the list of similar terms
-
getRelationshipMentions
Returns the list of relationship mentions.
- Returns:
- the list of relationship mentions
-
getResolvedEntities
Deprecated. This constructs a list of the old objects for compatibility; the supported item is Entity.
Returns the list of resolved entities.
- Returns:
- the list of resolved entities
-
getScriptRegions
Returns the list of script regions.
- Returns:
- the list of script regions
-
getSentences
Returns the list of sentences.
- Returns:
- the list of sentences
-
getLayoutRegions
Returns the list of layout regions.
- Returns:
- the list of layout regions
-
getBaseNounPhrases
Returns the list of base noun phrases.
- Returns:
- the list of base noun phrases
-
getCategorizerResults
Returns the list of categorizer results.
- Returns:
- the list of categorizer results
-
getSentimentResults
Returns the list of sentiment results.
- Returns:
- the list of sentiment results
-
getDependencies
Returns the list of dependencies.
- Returns:
- the list of dependencies.
-
getTopicResults
-
getEmbeddings
Returns the embeddings associated with this text. Embeddings, sometimes known as text vectors, are arrays of floating point numbers calculated from the entire text or subsets such as tokens or entities.
- Returns:
- the embeddings.
-
getConcepts
-
getKeyphrases
-
getTransliteration
-
toString
toString is a convenience for accessing the textual data, if any, in this annotated text.
- Overrides:
- toString in class Object
- Returns:
- the data for this AnnotatedText as a String. If the data is null, this returns null rather than throwing a NullPointerException.
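Because toString may return null here, callers that need a non-null string have to check the result. The sketch below mimics the documented behavior with a stand-in class; TextHolder and textOrDefault are hypothetical names and do not come from the library.

```java
final class NullSafeToStringDemo {
    /** Stand-in for a text holder whose character data may be null. */
    static final class TextHolder {
        private final CharSequence data;

        TextHolder(CharSequence data) { this.data = data; }

        @Override
        public String toString() {
            // Return null instead of dereferencing a null data reference.
            return data == null ? null : data.toString();
        }
    }

    /** Returns the holder's text, or fallback when toString() yields null. */
    static String textOrDefault(TextHolder holder, String fallback) {
        String text = holder.toString();
        return text == null ? fallback : text;
    }

    public static void main(String[] args) {
        System.out.println(textOrDefault(new TextHolder("Hello world"), ""));
        System.out.println(textOrDefault(new TextHolder(null), "(no data)"));
    }
}
```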