Class AnnotatedText

java.lang.Object
com.basistech.rosette.dm.AnnotatedText
All Implemented Interfaces:
Serializable

public class AnnotatedText extends Object implements Serializable
The root of the data model. An AnnotatedText is a blob of text and its attributes. The attributes are available from getAttributes(), as well as from some convenience accessors, such as getTokens() or getEntities().
Generally, offsets used in the data model are character offsets (UTF-16 code units) into the original text. Offset ranges are always half-open: the start offset is included and the end offset is excluded. For example:
 012345678901
 Hello world
 
The token "Hello" has start offset 0 and end offset 5.
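The offset convention above can be illustrated with plain Java strings, which are also indexed by UTF-16 code units. This is only a sketch: the class HalfOpenOffsets and the method covered are hypothetical names for illustration and are not part of the data model.

```java
public class HalfOpenOffsets {
    /** Returns the text covered by the half-open range [start, end). Hypothetical helper. */
    public static String covered(CharSequence text, int start, int end) {
        return text.subSequence(start, end).toString();
    }

    public static void main(String[] args) {
        String text = "Hello world";
        // "Hello" spans [0, 5): offset 0 is included, offset 5 (the space) is excluded.
        System.out.println(covered(text, 0, 5)); // prints "Hello"

        // Offsets count UTF-16 code units, so a supplementary character
        // such as U+1F600 occupies two offsets rather than one.
        String withEmoji = "\uD83D\uDE00 ok";
        System.out.println(withEmoji.length());       // prints 5
        System.out.println(covered(withEmoji, 3, 5)); // prints "ok"
    }
}
```

With half-open ranges the length of a span is simply end minus start, and adjacent spans share a boundary offset without overlapping.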
A note on serialization: because of the internal structure of this class and the classes that make up the model, we do not recommend that applications serialize this to JSON (or XML or other representations) by applying a reflection-based toolkit 'as-is'. For JSON in Java, the 'adm-json' module provides the supported serialization.
  • Method Details

    • getData

      public CharSequence getData()
      Returns the character data for this text.
      Returns:
      the character data for this text
    • getDocumentMetadata

      public Map<String,List<String>> getDocumentMetadata()
      Returns document-level metadata. Metadata keys are simple strings; values are lists of strings.
      Returns:
      map of metadata associated with the document
    • getAttributes

      public Map<String,BaseAttribute> getAttributes()
      Returns all of the annotations on this text. For the defined attributes, the keys will be values from AttributeKey.key(). The values are polymorphic; the subclass of BaseAttribute depends on the attribute. Applications should usually prefer to use the convenience accessors (e.g. getTokens) instead, to avoid the need for a cast. Note that this map will not return EntityMention or ResolvedEntity objects, which are deprecated; they are only available from the specific accessors.
      Returns:
      all of the annotations on this text
    • getTokens

      public ListAttribute<Token> getTokens()
      Returns the list of tokens.
      Returns:
      the list of tokens
    • getTranslatedTokens

      public ListAttribute<TranslatedTokens> getTranslatedTokens()
      Returns the translated tokens. This API allows for multiple translations. For example, element 0 may contain the TranslatedTokens for Simplified Chinese, and element 1 may contain the TranslatedTokens for Japanese. Usually only element 0 will be populated.
      Returns:
      the list of translated tokens
    • getTranslatedData

      public ListAttribute<TranslatedData> getTranslatedData()
      Returns the translations for the text. This API allows multiple translations. For example, element 0 may contain the TranslatedData for Simplified Chinese, and element 1 may contain the TranslatedData for Japanese. Usually only element 0 will be populated.
      Returns:
      the translations for the text
    • getLanguageDetectionRegions

      public ListAttribute<LanguageDetection> getLanguageDetectionRegions()
      Returns the list of language regions.
      Returns:
      the list of language regions
    • getWholeTextLanguageDetection

      public LanguageDetection getWholeTextLanguageDetection()
      Returns the language results for the entire text.
      Returns:
      the language results for the entire text
    • getEntityMentions

      @Deprecated public ListAttribute<EntityMention> getEntityMentions()
      Deprecated.
      This constructs a list of the old objects for compatibility; the supported item is Mention.
      Returns the list of entity mentions.
      Returns:
      the list of entity mentions
    • getEntities

      public ListAttribute<Entity> getEntities()
      Returns the list of entities. Entities are ordered by the document order of their head mentions.
      Returns:
      the list of entities
    • getEvents

      public ListAttribute<Event> getEvents()
    • getSimilarTerms

      public MapAttribute<com.basistech.util.LanguageCode,ListAttribute<SimilarTerm>> getSimilarTerms()
      Returns the map of similar terms.
      Returns:
      the map of similar terms
    • getSimilarTerms

      public ListAttribute<SimilarTerm> getSimilarTerms(com.basistech.util.LanguageCode languageCode)
      Convenience accessor for a language's list of similar terms.
      Parameters:
      languageCode - the language code whose similar terms to retrieve
      Returns:
      the list of similar terms
    • getRelationshipMentions

      public ListAttribute<RelationshipMention> getRelationshipMentions()
      Returns the list of relationship mentions.
      Returns:
      the list of relationship mentions
    • getResolvedEntities

      @Deprecated public ListAttribute<ResolvedEntity> getResolvedEntities()
      Deprecated.
      This constructs a list of the old objects for compatibility; the supported item is Entity.
      Returns the list of resolved entities.
      Returns:
      the list of resolved entities
    • getScriptRegions

      public ListAttribute<ScriptRegion> getScriptRegions()
      Returns the list of script regions.
      Returns:
      the list of script regions
    • getSentences

      public ListAttribute<Sentence> getSentences()
      Returns the list of sentences.
      Returns:
      the list of sentences
    • getLayoutRegions

      public ListAttribute<LayoutRegion> getLayoutRegions()
      Returns the list of layout regions.
      Returns:
      the list of layout regions
    • getBaseNounPhrases

      public ListAttribute<BaseNounPhrase> getBaseNounPhrases()
      Returns the list of base noun phrases.
      Returns:
      the list of base noun phrases
    • getCategorizerResults

      public ListAttribute<CategorizerResult> getCategorizerResults()
      Returns the list of categorizer results.
      Returns:
      the list of categorizer results
    • getSentimentResults

      public ListAttribute<CategorizerResult> getSentimentResults()
      Returns the list of sentiment results.
      Returns:
      the list of sentiment results
    • getDependencies

      public ListAttribute<Dependency> getDependencies()
      Returns the list of dependencies.
      Returns:
      the list of dependencies
    • getTopicResults

      public ListAttribute<CategorizerResult> getTopicResults()
    • getEmbeddings

      public Embeddings getEmbeddings()
      Returns the embeddings associated with this text. Embeddings, sometimes known as text vectors, are arrays of floating-point numbers calculated from the entire text or subsets such as tokens or entities.
      Returns:
      the embeddings
    • getConcepts

      public ListAttribute<Concept> getConcepts()
    • getKeyphrases

      public ListAttribute<Keyphrase> getKeyphrases()
    • getTransliteration

      public TransliterationResults getTransliteration()
    • toString

      public String toString()
      toString is a convenience for accessing the textual data, if any, in this annotated text.
      Overrides:
      toString in class Object
      Returns:
      the data for this AnnotatedText as a String. If the data is null, this returns null rather than throwing a NullPointerException.
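
As noted under getDocumentMetadata, metadata keys are simple strings and each value is a list of strings. The following is a minimal sketch of reading such a map defensively. It uses an ordinary java.util.Map as a stand-in for the returned value; the class MetadataLookup, the method firstValue, and the sample keys are hypothetical and not part of this API.

```java
import java.util.List;
import java.util.Map;

public class MetadataLookup {
    /**
     * Returns the first value stored under key, or fallback when the key
     * is absent or its value list is null or empty. Hypothetical helper.
     */
    public static String firstValue(Map<String, List<String>> metadata,
                                    String key, String fallback) {
        List<String> values = metadata.get(key);
        return (values == null || values.isEmpty()) ? fallback : values.get(0);
    }

    public static void main(String[] args) {
        // Stand-in for the map returned by getDocumentMetadata();
        // the keys and values here are invented for illustration.
        Map<String, List<String>> metadata = Map.of(
                "author", List.of("A. Writer", "B. Editor"),
                "source", List.of());

        System.out.println(firstValue(metadata, "author", "unknown"));  // prints "A. Writer"
        System.out.println(firstValue(metadata, "source", "unknown"));  // prints "unknown"
        System.out.println(firstValue(metadata, "missing", "unknown")); // prints "unknown"
    }
}
```

Because a key may carry several values, callers that need all of them would iterate the list rather than take only the first element.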