Package com.yahoo.language.process
Interface Token
-
public interface Token
A single token produced by the tokenizer.
Author:
- Mathias Mølster Lidal
Method Summary
Modifier and Type   Method                Description
Token               getComponent(int i)   Returns a component token of this token.
int                 getNumComponents()    Returns the number of components, if this token is a compound word (e.g. German "kommunikationsfehler").
int                 getNumStems()         Returns the number of stem forms available for this token.
long                getOffset()           Returns the offset position of this token.
java.lang.String    getOrig()             Returns the original form of this token.
TokenScript         getScript()           Returns the script of this token.
java.lang.String    getStem(int i)        Returns the stem at position i.
java.lang.String    getTokenString()      Returns the token string in a form suitable for indexing: the most lowercased variant of the most processed token form available. If called on a compound token, this returns a lowercased form of the entire word.
TokenType           getType()             Returns the type of this token: word, space, punctuation, etc.
boolean             isIndexable()         Whether this token should be indexed.
boolean             isSpecialToken()      Returns whether this is an instance of a declared special token (e.g. c++).
Method Detail
-
getType
TokenType getType()
Returns the type of this token: word, space, punctuation, etc.
-
getOrig
java.lang.String getOrig()
Returns the original form of this token
-
getNumStems
int getNumStems()
Returns the number of stem forms available for this token.
-
getStem
java.lang.String getStem(int i)
Returns the stem at position i
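The two stem accessors are naturally used together. The sketch below shows one way to collect every stem form of a token. `StemSource` is a hypothetical stand-in that copies only the `getNumStems()` and `getStem(int)` signatures so the example compiles without the library; real code would take a `com.yahoo.language.process.Token`. It also assumes positions are zero-based and run up to `getNumStems() - 1`, which this page does not state.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in with the two stem accessors of Token (not the real interface).
interface StemSource {
    int getNumStems();
    String getStem(int i);
}

final class StemExample {

    /** Collects every stem form in index order (assumes zero-based positions). */
    static List<String> allStems(StemSource token) {
        List<String> stems = new ArrayList<>(token.getNumStems());
        for (int i = 0; i < token.getNumStems(); i++) {
            stems.add(token.getStem(i));
        }
        return stems;
    }

    /** Fixed-data implementation used only for illustration. */
    static StemSource fixed(String... stems) {
        return new StemSource() {
            public int getNumStems() { return stems.length; }
            public String getStem(int i) { return stems[i]; }
        };
    }
}
```

A token with no stem forms simply yields an empty list, so callers do not need a separate check before looping.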
-
getNumComponents
int getNumComponents()
Returns the number of components if this token is a compound word (e.g. German "kommunikationsfehler"); otherwise returns 0.
Returns:
- number of components, or 0 if none
-
getComponent
Token getComponent(int i)
Returns a component token of this token.
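Because a component is itself a token, a compound can be walked recursively. The following is a minimal sketch of that traversal, not the library's own code. `CompoundToken` is a hypothetical stand-in that copies the `getOrig()`, `getNumComponents()` and `getComponent(int)` signatures; zero-based component indices and the way the sample word is split are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in with the component accessors of Token (not the real interface).
interface CompoundToken {
    String getOrig();
    int getNumComponents();
    CompoundToken getComponent(int i);
}

final class ComponentExample {

    /** Returns the original forms of all leaf tokens, left to right. */
    static List<String> leafForms(CompoundToken token) {
        List<String> out = new ArrayList<>();
        collect(token, out);
        return out;
    }

    private static void collect(CompoundToken token, List<String> out) {
        if (token.getNumComponents() == 0) {   // not a compound: the token is a leaf
            out.add(token.getOrig());
            return;
        }
        for (int i = 0; i < token.getNumComponents(); i++) {
            collect(token.getComponent(i), out);
        }
    }

    /** Fixed-data implementation used only for illustration. */
    static CompoundToken of(String orig, CompoundToken... parts) {
        return new CompoundToken() {
            public String getOrig() { return orig; }
            public int getNumComponents() { return parts.length; }
            public CompoundToken getComponent(int i) { return parts[i]; }
        };
    }
}
```

Checking `getNumComponents() == 0` first relies on the documented contract that non-compound tokens report zero components.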
-
getOffset
long getOffset()
Returns the offset position of this token
-
getScript
TokenScript getScript()
Returns the script of this token
-
getTokenString
java.lang.String getTokenString()
Returns the token string in a form suitable for indexing: the most lowercased variant of the most processed token form available. If called on a compound token, this returns a lowercased form of the entire word. If this is a special token with a configured replacement, this returns the replacement token.
-
isSpecialToken
boolean isSpecialToken()
Returns whether this is an instance of a declared special token (e.g. c++)
-
isIndexable
boolean isIndexable()
Whether this token should be indexed
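A typical consumer combines `isIndexable()` with `getTokenString()` to build the list of terms to index. The sketch below illustrates that pattern under stated assumptions: `IndexableToken` is a hypothetical stand-in that copies only those two signatures, and the sample tokens are made up rather than produced by a real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in with two accessors of Token (not the real interface).
interface IndexableToken {
    boolean isIndexable();
    String getTokenString();
}

final class IndexingExample {

    /** Keeps the token strings of indexable tokens, preserving input order. */
    static List<String> indexTerms(List<? extends IndexableToken> tokens) {
        List<String> terms = new ArrayList<>();
        for (IndexableToken token : tokens) {
            if (token.isIndexable()) {
                terms.add(token.getTokenString());
            }
        }
        return terms;
    }

    /** Fixed-data implementation used only for illustration. */
    static IndexableToken of(String tokenString, boolean indexable) {
        return new IndexableToken() {
            public boolean isIndexable() { return indexable; }
            public String getTokenString() { return tokenString; }
        };
    }
}
```

Using `getTokenString()` rather than `getOrig()` here follows the description above, which names it as the form suitable for indexing.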