Interface Token

All Known Implementing Classes:
SimpleToken

public interface Token
A single token produced by the tokenizer.
Author:
Mathias Mølster Lidal
  • Method Summary

    Modifier and Type    Method                Description
    Token                getComponent(int i)   Returns a component token of this token.
    int                  getNumComponents()    Returns the number of components if this token is a compound word (e.g. German "kommunikationsfehler"), otherwise 0.
    int                  getNumStems()         Returns the number of stem forms available for this token.
    long                 getOffset()           Returns the offset position of this token.
    String               getOrig()             Returns the original form of this token.
    TokenScript          getScript()           Returns the script of this token.
    String               getStem(int i)        Returns the stem at position i.
    String               getTokenString()      Returns the token string in a form suitable for indexing.
    TokenType            getType()             Returns the type of this token: word, space, punctuation, etc.
    boolean              isIndexable()         Returns whether this token should be indexed.
    boolean              isSpecialToken()      Returns whether this is an instance of a declared special token (e.g. c++).
  • Method Details

    • getType

      TokenType getType()
      Returns the type of this token: word, space, punctuation, etc.
    • getOrig

      String getOrig()
      Returns the original form of this token
    • getNumStems

      int getNumStems()
      Returns the number of stem forms available for this token.
    • getStem

      String getStem(int i)
      Returns the stem at position i
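
The two stem accessors are naturally used together as a counted loop. The sketch below is illustrative only: it declares a local stand-in interface with just the two method signatures documented here rather than depending on the real library, it assumes stem positions are zero-based (the page does not say), and the fixedStems helper is a hypothetical stub.

```java
import java.util.ArrayList;
import java.util.List;

public class StemListing {

    /** Local stand-in declaring only the two stem methods documented above. */
    interface Token {
        int getNumStems();
        String getStem(int i);
    }

    /** Collects every stem of a token in position order (assumes zero-based positions). */
    static List<String> stemsOf(Token token) {
        List<String> stems = new ArrayList<>();
        for (int i = 0; i < token.getNumStems(); i++) {
            stems.add(token.getStem(i));
        }
        return stems;
    }

    /** Hypothetical stub that serves a fixed list of stems, for illustration only. */
    static Token fixedStems(String... stems) {
        return new Token() {
            public int getNumStems() { return stems.length; }
            public String getStem(int i) { return stems[i]; }
        };
    }

    public static void main(String[] args) {
        System.out.println(stemsOf(fixedStems("run", "running")));
    }
}
```

A token with no stem forms simply yields an empty list, so callers need no special case.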
    • getNumComponents

      int getNumComponents()
      Returns the number of components if this token is a compound word (e.g. German "kommunikationsfehler"). Otherwise, returns 0.
      Returns:
      number of components, or 0 if none
    • getComponent

      Token getComponent(int i)
      Returns the component token at position i of this compound token.
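
Because getComponent returns a Token, a compound can be walked recursively. The following is a minimal sketch under stated assumptions: the interface is a local stand-in carrying only the signatures documented here, component positions are assumed to be zero-based, components are assumed to be possibly compound themselves, and the split of "kommunikationsfehler" into two parts is made up for illustration rather than taken from a real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

public class ComponentWalk {

    /** Local stand-in declaring only the methods this example needs. */
    interface Token {
        String getOrig();
        int getNumComponents();
        Token getComponent(int i);
    }

    /** Appends the original form of every leaf component, depth first. */
    static void collectLeaves(Token token, List<String> out) {
        if (token.getNumComponents() == 0) {
            // Not a compound: the token is its own leaf.
            out.add(token.getOrig());
            return;
        }
        for (int i = 0; i < token.getNumComponents(); i++) {
            collectLeaves(token.getComponent(i), out);
        }
    }

    /** Hypothetical in-memory token, for illustration only. */
    static Token token(String orig, Token... components) {
        return new Token() {
            public String getOrig() { return orig; }
            public int getNumComponents() { return components.length; }
            public Token getComponent(int i) { return components[i]; }
        };
    }

    public static void main(String[] args) {
        // Illustrative split; a real tokenizer may segment the word differently.
        Token compound = token("kommunikationsfehler", token("kommunikations"), token("fehler"));
        List<String> leaves = new ArrayList<>();
        collectLeaves(compound, leaves);
        System.out.println(leaves);
    }
}
```

Checking getNumComponents() == 0 first relies on the documented contract that non-compound tokens report zero components.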
    • getOffset

      long getOffset()
      Returns the offset position of this token
    • getScript

      TokenScript getScript()
      Returns the script of this token
    • getTokenString

      String getTokenString()
      Returns the token string in a form suitable for indexing: the lowercased variant of the most processed token form available. If called on a compound token, this returns a lowercased form of the entire word. If this is a special token with a configured replacement, this returns the replacement token.
    • isSpecialToken

      boolean isSpecialToken()
      Returns whether this is an instance of a declared special token (e.g. c++)
    • isIndexable

      boolean isIndexable()
      Whether this token should be indexed
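
One plausible way a caller might combine isIndexable, getTokenString and getOffset is to filter a token stream down to positioned index terms. This is a sketch, not the library's own indexing code: the interface is a local stand-in with only the signatures documented here, and the token helper and the "offset:term" output format are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexTerms {

    /** Local stand-in declaring only the methods this example needs. */
    interface Token {
        String getTokenString();
        long getOffset();
        boolean isIndexable();
    }

    /** Returns "offset:term" for every token that reports itself as indexable. */
    static List<String> indexTerms(List<Token> tokens) {
        List<String> terms = new ArrayList<>();
        for (Token token : tokens) {
            if (!token.isIndexable()) {
                continue; // skip tokens such as whitespace or punctuation
            }
            terms.add(token.getOffset() + ":" + token.getTokenString());
        }
        return terms;
    }

    /** Hypothetical in-memory token, for illustration only. */
    static Token token(String tokenString, long offset, boolean indexable) {
        return new Token() {
            public String getTokenString() { return tokenString; }
            public long getOffset() { return offset; }
            public boolean isIndexable() { return indexable; }
        };
    }

    public static void main(String[] args) {
        List<Token> tokens = List.of(
                token("hello", 0, true),
                token(" ", 5, false),
                token("world", 6, true));
        System.out.println(indexTerms(tokens));
    }
}
```

The example uses getTokenString() rather than getOrig() because the former is documented as the form suitable for indexing.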