Package com.yahoo.language.opennlp
Class OpenNlpTokenizer
- java.lang.Object
  - com.yahoo.language.opennlp.OpenNlpTokenizer
-
Constructor Summary
Constructors:
- OpenNlpTokenizer()
- OpenNlpTokenizer(Normalizer normalizer, Transformer transformer)
- OpenNlpTokenizer(Normalizer normalizer, Transformer transformer, SpecialTokenRegistry specialTokenRegistry)
-
Method Summary
Instance methods:
- Iterable<Token> tokenize(String input, Language language, StemMode stemMode, boolean removeAccents)
  Returns the tokens produced from an input string under the rules of the given Language and additional options.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface com.yahoo.language.process.Tokenizer
getReplacementTerm
-
Constructor Detail
-
OpenNlpTokenizer
public OpenNlpTokenizer()
-
OpenNlpTokenizer
public OpenNlpTokenizer(Normalizer normalizer, Transformer transformer)
-
OpenNlpTokenizer
public OpenNlpTokenizer(Normalizer normalizer, Transformer transformer, SpecialTokenRegistry specialTokenRegistry)
-
Method Detail
-
tokenize
public Iterable<Token> tokenize(String input, Language language, StemMode stemMode, boolean removeAccents)
Description copied from interface: Tokenizer
Returns the tokens produced from an input string under the rules of the given Language and additional options.
Specified by:
tokenize in interface Tokenizer
Parameters:
input - the string to tokenize. May be arbitrarily large.
language - the language of the input string.
stemMode - the stem mode applied to the returned tokens.
removeAccents - if true, accents and similar are removed from the returned tokens.
Returns:
the tokens of the input String.
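This page does not say how accents are removed when removeAccents is true. As a rough illustration of what such an option usually means, the sketch below strips diacritics with the JDK alone: it decomposes each character (Unicode NFD) and drops the combining marks. The class AccentStrippingSketch and its stripAccents helper are hypothetical names invented for this example; they are not part of com.yahoo.language, and the approach is an assumption rather than OpenNlpTokenizer's actual implementation. Note also that java.text.Normalizer used here is the JDK class, not the Normalizer type accepted by the constructors above.

```java
import java.text.Normalizer;

public class AccentStrippingSketch {

    // Hypothetical helper, not part of the Vespa linguistics API.
    // Decomposes characters into base letter + combining marks (NFD),
    // then removes every combining mark (Unicode category M).
    public static String stripAccents(String token) {
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Unicode escapes keep the source file ASCII-only.
        System.out.println(stripAccents("caf\u00e9"));   // prints cafe
        System.out.println(stripAccents("na\u00efve"));  // prints naive
    }
}
```

Characters that have no canonical decomposition (for example the letter o with stroke, U+00F8) pass through this sketch unchanged, so a real tokenizer's "accents and similar" handling may cover more cases than shown here.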