Package com.yahoo.language.process
Interface Token
- All Known Implementing Classes:
SimpleToken
public interface Token
A single token produced by the tokenizer.
- Author:
- Mathias Mølster Lidal
-
Method Summary
Modifier and Type  Method               Description
Token              getComponent(int i)  Returns a component token of this token.
int                getNumComponents()   Returns the number of components, if this token is a compound word (e.g. German "kommunikationsfehler").
int                getNumStems()        Returns the number of stem forms available for this token.
long               getOffset()          Returns the offset position of this token.
String             getOrig()            Returns the original form of this token.
TokenScript        getScript()          Returns the script of this token.
String             getStem(int i)       Returns the stem at position i.
String             getTokenString()     Returns the token string in a form suitable for indexing: the most lowercased variant of the most processed token form available. If called on a compound token, this returns a lowercased form of the entire word.
TokenType          getType()            Returns the type of this token: word, space, punctuation, etc.
boolean            isIndexable()        Whether this token should be indexed.
boolean            isSpecialToken()     Returns whether this is an instance of a declared special token (e.g. c++).
-
Method Details
-
getType
TokenType getType()
Returns the type of this token: word, space, punctuation, etc.
getOrig
String getOrig()
Returns the original form of this token.
getNumStems
int getNumStems()
Returns the number of stem forms available for this token.
getStem
String getStem(int i)
Returns the stem at position i.
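As a rough illustration of how getNumStems() and getStem(int i) fit together, the sketch below collects every stem of a token into a list. It does not depend on the Vespa library: the nested `Token` interface is a hypothetical stand-in that declares only the two stem accessors, `tokenWithStems` is a made-up demo helper, and zero-based stem positions are an assumption, not something this page states.

```java
import java.util.ArrayList;
import java.util.List;

public class StemListing {

    // Hypothetical stand-in declaring only the two stem accessors documented above.
    // Real code would use com.yahoo.language.process.Token instead.
    interface Token {
        int getNumStems();
        String getStem(int i);
    }

    // Collects the stems at positions 0 .. getNumStems() - 1 (zero-based indexing is assumed).
    static List<String> stems(Token token) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < token.getNumStems(); i++) {
            result.add(token.getStem(i));
        }
        return result;
    }

    // Hypothetical fixed-stem token, used only for demonstration.
    static Token tokenWithStems(String... stems) {
        return new Token() {
            @Override public int getNumStems() { return stems.length; }
            @Override public String getStem(int i) { return stems[i]; }
        };
    }

    public static void main(String[] args) {
        // The stem values here are invented sample data.
        System.out.println(stems(tokenWithStems("run", "running")));
    }
}
```

Bounding the loop by getNumStems() means a token without any stems simply yields an empty list.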
getNumComponents
int getNumComponents()
Returns the number of components if this token is a compound word (e.g. German "kommunikationsfehler"); otherwise returns 0.
- Returns:
- number of components, or 0 if none
-
getComponent
Token getComponent(int i)
Returns a component token of this token.
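The following sketch shows one way getNumComponents() and getComponent(int i) might be combined to walk a compound token. It is self-contained: the nested `Token` interface is a hypothetical stand-in with only the three methods used, the `word` helper is invented for the demo, and the split of "kommunikationsfehler" into two parts is illustrative sample data. Zero-based component indices and the possibility of nested compounds are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

public class CompoundComponents {

    // Hypothetical stand-in with only the methods this example needs.
    // Real code would use com.yahoo.language.process.Token instead.
    interface Token {
        String getOrig();
        int getNumComponents();
        Token getComponent(int i);
    }

    // Returns the original forms of the leaf components of a token.
    // A token with 0 components is treated as its own single leaf.
    static List<String> leafForms(Token token) {
        List<String> out = new ArrayList<>();
        collect(token, out);
        return out;
    }

    private static void collect(Token token, List<String> out) {
        if (token.getNumComponents() == 0) {
            out.add(token.getOrig());
            return;
        }
        // Assumes component indices run from 0 to getNumComponents() - 1.
        for (int i = 0; i < token.getNumComponents(); i++) {
            collect(token.getComponent(i), out);
        }
    }

    // Hypothetical demo token: a word with zero or more component tokens.
    static Token word(String orig, Token... components) {
        return new Token() {
            @Override public String getOrig() { return orig; }
            @Override public int getNumComponents() { return components.length; }
            @Override public Token getComponent(int i) { return components[i]; }
        };
    }

    public static void main(String[] args) {
        // The component split below is illustrative, not actual tokenizer output.
        Token compound = word("kommunikationsfehler", word("kommunikations"), word("fehler"));
        System.out.println(leafForms(compound));
    }
}
```

Checking for 0 components first follows the documented contract that non-compound tokens report 0.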
getOffset
long getOffset()
Returns the offset position of this token.
getScript
TokenScript getScript()
Returns the script of this token.
getTokenString
String getTokenString()
Returns the token string in a form suitable for indexing: the most lowercased variant of the most processed token form available. If called on a compound token, this returns a lowercased form of the entire word. If this is a special token with a configured replacement, this returns the replacement token.
isSpecialToken
boolean isSpecialToken()
Returns whether this is an instance of a declared special token (e.g. c++).
isIndexable
boolean isIndexable()
Whether this token should be indexed.
-
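A common pattern suggested by isIndexable() and getTokenString() is to keep only the indexable tokens and take their index-ready strings. The sketch below is a minimal version of that idea under stated assumptions: the nested `Token` interface is a hypothetical stand-in with just those two methods, and the `token` helper and its sample values are invented for the demo.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexableTerms {

    // Hypothetical stand-in with only the two methods this example needs.
    // Real code would use com.yahoo.language.process.Token instead.
    interface Token {
        boolean isIndexable();
        String getTokenString();
    }

    // Keeps the token string of every token that reports itself as indexable.
    static List<String> indexTerms(Iterable<Token> tokens) {
        List<String> terms = new ArrayList<>();
        for (Token token : tokens) {
            if (token.isIndexable()) {
                terms.add(token.getTokenString());
            }
        }
        return terms;
    }

    // Hypothetical demo token with a fixed token string and indexable flag.
    static Token token(String tokenString, boolean indexable) {
        return new Token() {
            @Override public boolean isIndexable() { return indexable; }
            @Override public String getTokenString() { return tokenString; }
        };
    }

    public static void main(String[] args) {
        // Sample data: a word, a space that is not indexed, and a special token.
        List<Token> tokens = List.of(token("hello", true), token(" ", false), token("c++", true));
        System.out.println(indexTerms(tokens));
    }
}
```

Using getTokenString() instead of getOrig() here reflects the description above, which presents it as the form suitable for indexing.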