Package org.dizitart.no2.index.fulltext
Interface TextTokenizer
All Known Implementing Classes:
BaseTextTokenizer, EnglishTextTokenizer, UniversalTextTokenizer
public interface TextTokenizer
An interface representing a stop-word based text tokenizer.
Since: 1.0
Author: Anindya Chatterjee
See Also: EnglishTextTokenizer
Method Summary
Modifier and Type    Method                   Description
Languages            getLanguage()            Gets the language for the tokenizer.
Set<String>          stopWords()              Gets all stop-words for a language.
Set<String>          tokenize(String text)    Tokenizes a text and discards all stop-words from it.
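The following is a minimal sketch of how the three methods in the summary might fit together. To stay free of library dependencies it does not implement the real TextTokenizer; it uses a hypothetical stand-in interface, SketchTextTokenizer, whose getLanguage() returns a String rather than Languages. The class names, the tiny stop-word list, the lower-casing, and the splitting rule are all assumptions made for illustration, not the behavior of BaseTextTokenizer or EnglishTextTokenizer.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in mirroring the three methods in the summary.
// The real interface returns a Languages value from getLanguage();
// a String is used here only to keep the sketch self-contained.
interface SketchTextTokenizer {
    String getLanguage();
    Set<String> stopWords();
    Set<String> tokenize(String text);
}

// Illustrative implementation; the stop-word list is a small made-up sample.
class SimpleEnglishTokenizer implements SketchTextTokenizer {
    private static final Set<String> STOP_WORDS =
            new LinkedHashSet<>(Arrays.asList("a", "an", "and", "the", "of", "is"));

    @Override
    public String getLanguage() {
        return "English";
    }

    @Override
    public Set<String> stopWords() {
        return STOP_WORDS;
    }

    @Override
    public Set<String> tokenize(String text) {
        Set<String> tokens = new LinkedHashSet<>();
        if (text == null) {
            return tokens;
        }
        // Assumption: lower-case the input and split on runs of
        // characters that are neither letters nor digits.
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // Skip empty fragments and anything in the stop-word set.
            if (!word.isEmpty() && !STOP_WORDS.contains(word)) {
                tokens.add(word);
            }
        }
        return tokens;
    }
}

class TokenizerSketchDemo {
    public static void main(String[] args) {
        SketchTextTokenizer tokenizer = new SimpleEnglishTokenizer();
        // Prints: [quick, fox, lazy, dog]
        System.out.println(tokenizer.tokenize("The quick fox and the lazy dog"));
    }
}
```

A LinkedHashSet is used so that tokens come back in first-seen order, which makes the output easy to read; the Set<String> return type in the summary does not by itself promise any ordering.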