public interface TextTokenizer
A stop-word based string tokenizer.
See Also: TextIndexingService, EnglishTextTokenizer, NitriteBuilder.textTokenizer(TextTokenizer)
Modifier and Type | Method and Description |
---|---|
java.util.Set<java.lang.String> | stopWords() Gets all stop-words for a language. |
java.util.Set<java.lang.String> | tokenize(java.lang.String text) Tokenizes a text and discards all stop-words from it. |
java.util.Set<java.lang.String> tokenize(java.lang.String text) throws java.io.IOException
Tokenizes a text and discards all stop-words from it.
Parameters:
text - the text to tokenize
Throws:
java.io.IOException - if a low-level I/O error occurs.
java.util.Set<java.lang.String> stopWords()
Gets all stop-words for a language.
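The contract above can be illustrated with a minimal sketch of a custom implementation. This is not the library's EnglishTextTokenizer. The nested TextTokenizer interface is a local stand-in that only mirrors the two documented signatures so the example compiles on its own. The SimpleTokenizer class, its six-word stop-word list, and the lowercase-and-split rule are assumptions made for illustration.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

public class SimpleTokenizerSketch {

    // Local stand-in mirroring the two documented method signatures,
    // so the sketch compiles without the library on the classpath.
    interface TextTokenizer {
        Set<String> tokenize(String text) throws IOException;
        Set<String> stopWords();
    }

    // Hypothetical implementation with a tiny, illustrative stop-word list.
    static class SimpleTokenizer implements TextTokenizer {
        private static final Set<String> STOP_WORDS = Collections.unmodifiableSet(
                new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "is")));

        @Override
        public Set<String> tokenize(String text) throws IOException {
            // LinkedHashSet keeps first-seen order and drops duplicate tokens.
            Set<String> tokens = new LinkedHashSet<>();
            if (text == null) {
                return tokens;
            }
            // Assumption: lowercase the input, then split on runs of
            // characters that are neither letters nor digits.
            for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                    tokens.add(word);
                }
            }
            return tokens;
        }

        @Override
        public Set<String> stopWords() {
            return STOP_WORDS;
        }
    }

    public static void main(String[] args) throws IOException {
        TextTokenizer tokenizer = new SimpleTokenizer();
        // prints [quick, brown, fox, lazy, dog]
        System.out.println(tokenizer.tokenize("The quick brown fox and the lazy dog"));
    }
}
```

A class written against the real interface would presumably be registered through NitriteBuilder.textTokenizer(TextTokenizer), listed in the See Also entries above. That wiring is left out here so the sketch stays self-contained.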