Package com.yahoo.language.process
Interface Tokenizer
- All Known Implementing Classes:
OpenNlpTokenizer
,SimpleTokenizer
public interface Tokenizer
Language-sensitive tokenization of a text string.
- Author:
- Mathias Mølster Lidal
-
Method Summary
Modifier and TypeMethodDescriptiondefault String
getReplacementTerm
(String tokenString) Deprecated.replacements are already applied in tokens returned by tokenizeReturns the tokens produced from an input string under the rules of the given Language and additional options
-
Method Details
-
tokenize
Returns the tokens produced from an input string under the rules of the given Language and additional options- Parameters:
input
- the string to tokenize. May be arbitrarily large.language
- the language of the input string.stemMode
- the stem mode applied on the returned tokensremoveAccents
- if true accents and similar are removed from the returned tokens- Returns:
- the tokens of the input String.
- Throws:
ProcessingException
- If the underlying library throws an Exception.
-
getReplacementTerm
Deprecated.replacements are already applied in tokens returned by tokenizeNot used.
-