Package com.yahoo.language.process
Interface Tokenizer
- All Known Implementing Classes:
SimpleTokenizer
public interface Tokenizer
Language-sensitive tokenization of a text string.
- Author:
- Mathias Mølster Lidal
Method Details
tokenize
Returns the tokens produced from an input string under the rules of the given Language and additional options
- Parameters:
input
- the string to tokenize. May be arbitrarily large.
language
- the language of the input string.
stemMode
- the stem mode applied on the returned tokens
removeAccents
- whether to normalize accents and similar
- Returns:
- the tokens of the input String
- Throws:
ProcessingException
- If the underlying library throws an Exception.
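- Example (illustrative sketch):
The following is a minimal, self-contained sketch of how an implementation might honor the four documented parameters. It does not use the actual com.yahoo.language types: SketchLanguage, SketchStemMode, SketchTokenizer and WhitespaceSketchTokenizer are hypothetical stand-ins, tokens are assumed to be plain strings, and the whitespace splitting and trailing-"s" stemming are toy rules chosen only for illustration.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration only; NOT the real com.yahoo.language types.
enum SketchLanguage { ENGLISH, UNKNOWN }
enum SketchStemMode { NONE, ALL }

// Mirrors the four documented parameters. The String token type is an assumption.
interface SketchTokenizer {
    Iterable<String> tokenize(String input, SketchLanguage language,
                              SketchStemMode stemMode, boolean removeAccents);
}

class WhitespaceSketchTokenizer implements SketchTokenizer {
    @Override
    public Iterable<String> tokenize(String input, SketchLanguage language,
                                     SketchStemMode stemMode, boolean removeAccents) {
        List<String> tokens = new ArrayList<>();
        // Toy rule: split on runs of whitespace.
        for (String token : input.trim().split("\\s+")) {
            if (token.isEmpty()) continue; // empty or blank input yields no tokens
            if (removeAccents) {
                // Decompose accented characters, then drop the combining marks.
                token = Normalizer.normalize(token, Normalizer.Form.NFD)
                                  .replaceAll("\\p{M}+", "");
            }
            // Toy stemming: strip a trailing "s" from longer English tokens.
            if (stemMode == SketchStemMode.ALL && language == SketchLanguage.ENGLISH
                    && token.length() > 3 && token.endsWith("s")) {
                token = token.substring(0, token.length() - 1);
            }
            tokens.add(token);
        }
        return tokens;
    }
}

class TokenizerSketchDemo {
    public static void main(String[] args) {
        SketchTokenizer tokenizer = new WhitespaceSketchTokenizer();
        for (String token : tokenizer.tokenize("Caf\u00e9 tables",
                SketchLanguage.ENGLISH, SketchStemMode.ALL, true)) {
            System.out.println(token);
        }
    }
}
```

A real implementation would delegate to a linguistics library and, per the Throws clause above, surface library failures as ProcessingException; that wrapping is omitted from this sketch.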