Package com.yahoo.language.opennlp
Class OpenNlpTokenizer
java.lang.Object
com.yahoo.language.opennlp.OpenNlpTokenizer
- All Implemented Interfaces:
Tokenizer
A Tokenizer implementation that uses OpenNLP.
- Author:
- matskin
Constructor Summary
Constructor
OpenNlpTokenizer()
OpenNlpTokenizer(Normalizer normalizer, Transformer transformer)
OpenNlpTokenizer(Normalizer normalizer, Transformer transformer, SpecialTokenRegistry specialTokenRegistry)
Method Summary
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface com.yahoo.language.process.Tokenizer
getReplacementTerm
Constructor Details
OpenNlpTokenizer
public OpenNlpTokenizer()
OpenNlpTokenizer
public OpenNlpTokenizer(Normalizer normalizer, Transformer transformer)
OpenNlpTokenizer
public OpenNlpTokenizer(Normalizer normalizer, Transformer transformer, SpecialTokenRegistry specialTokenRegistry)
Method Details
tokenize
public Iterable<Token> tokenize(String input, Language language, StemMode stemMode, boolean removeAccents)
Description copied from interface: Tokenizer
Returns the tokens produced from an input string under the rules of the given Language and additional options.
- Specified by:
- tokenize in interface Tokenizer
- Parameters:
- input - the string to tokenize. May be arbitrarily large.
- language - the language of the input string.
- stemMode - the stem mode applied on the returned tokens.
- removeAccents - if true, accents and similar are removed from the returned tokens.
- Returns:
- the tokens of the input String.
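The tokenize contract can be illustrated with a small self-contained sketch. It does not use OpenNlpTokenizer or any class from this package: the Language, StemMode and Token types below are hypothetical stand-ins that only mirror the shape of the documented signature, and the splitting and accent-stripping logic is an assumption chosen for illustration, not the library's implementation. It shows how a caller might iterate the returned Iterable and what removeAccents is expected to change.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: every type here is a local stand-in, not the real library class.
class TokenizeSketch {

    // Hypothetical stand-ins for the enum-like parameters in the documented signature.
    enum Language { ENGLISH }
    enum StemMode { NONE, ALL }

    // Hypothetical minimal token: the original text and the processed token string.
    static final class Token {
        private final String orig;
        private final String tokenString;

        Token(String orig, String tokenString) {
            this.orig = orig;
            this.tokenString = tokenString;
        }

        String getOrig() { return orig; }
        String getTokenString() { return tokenString; }
    }

    // Same parameter list as the documented method. This toy version ignores
    // language and stemMode and simply splits on anything that is not a letter,
    // combining mark or digit.
    static Iterable<Token> tokenize(String input, Language language, StemMode stemMode, boolean removeAccents) {
        List<Token> tokens = new ArrayList<>();
        for (String word : input.split("[^\\p{L}\\p{M}\\p{N}]+")) {
            if (word.isEmpty()) continue; // split() yields an empty first element for leading separators
            String processed = removeAccents ? stripAccents(word) : word;
            tokens.add(new Token(word, processed));
        }
        return tokens;
    }

    // One common way to drop accents: decompose to NFD, then delete combining marks.
    static String stripAccents(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}+", "");
    }

    // Caller-side helper: consume the Iterable and collect the processed token strings.
    static List<String> tokenStrings(String input, boolean removeAccents) {
        List<String> out = new ArrayList<>();
        for (Token token : tokenize(input, Language.ENGLISH, StemMode.NONE, removeAccents)) {
            out.add(token.getTokenString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenStrings("Caf\u00e9 au lait", true));  // [Cafe, au, lait]
        System.out.println(tokenStrings("Caf\u00e9 au lait", false)); // accented form is kept
    }
}
```

Because the return type is Iterable rather than List, a caller that needs indexed access or a size has to copy the tokens into a collection first, as tokenStrings does here.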