Interface | Description |
---|---|
TextProcessor | TextProcessor allows applying pre-processing to input tokens for natural language applications. |
Tokenizer | The Tokenizer interface provides the ability to break down sentences into embeddable tokens. |
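
To illustrate how the two interfaces relate, here is a minimal sketch of a tokenizer feeding a chain of token processors. The interface names `SketchTokenizer` and `SketchTextProcessor`, their method signatures, and the `InterfaceSketch` helper are all assumptions made for illustration; they are not the package's actual API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical stand-ins for the two interfaces; the names and method
// signatures are assumptions, not the package's actual API.
interface SketchTokenizer {
    List<String> tokenize(String sentence);
}

interface SketchTextProcessor {
    List<String> preprocess(List<String> tokens);
}

class InterfaceSketch {
    // Splits a sentence on a literal delimiter, similar in spirit to SimpleTokenizer.
    static SketchTokenizer delimiterTokenizer(String delimiter) {
        return sentence ->
                new ArrayList<>(Arrays.asList(sentence.split(Pattern.quote(delimiter))));
    }

    // Lower-cases every token, similar in spirit to LowerCaseConvertor.
    static SketchTextProcessor lowerCase() {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String token : tokens) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
            return out;
        };
    }

    // Tokenizes the sentence first, then applies each processor in order.
    static List<String> run(
            String sentence, SketchTokenizer tokenizer, List<SketchTextProcessor> processors) {
        List<String> tokens = tokenizer.tokenize(sentence);
        for (SketchTextProcessor processor : processors) {
            tokens = processor.preprocess(tokens);
        }
        return tokens;
    }
}
```

Keeping tokenization separate from token-level processing lets processors be reordered or swapped without touching how the sentence is split.
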
Class | Description |
---|---|
HyphenNormalizer | Normalizes "exotic" hyphens, which Unicode normalization does not take care of and which we normally do not want in NLP input. |
LowerCaseConvertor | LowerCaseConvertor converts every character of the input tokens to its respective lower case character. |
PunctuationSeparator | PunctuationSeparator separates punctuation into a separate token. |
SimpleTokenizer | SimpleTokenizer is an implementation of the Tokenizer interface that converts sentences into tokens by splitting them by a given delimiter. |
TextTerminator | A TextProcessor that adds a beginning-of-string and an end-of-string token. |
TextTruncator | A TextProcessor that truncates text to a maximum size. |
UnicodeNormalizer | Applies Unicode normalization to input strings. |
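
The operations described in the class table can be sketched with the standard library alone. Everything below is hypothetical: the `PreprocessSketch` class and its method names are illustrative, the set of hyphen code points, the choice of NFKC, and the `<bos>`/`<eos>` marker strings are assumptions, and the real classes may behave differently.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers; none of these names come from the package itself.
class PreprocessSketch {
    // Drops soft hyphens and maps a few Unicode dashes to ASCII '-'.
    // The exact set of code points handled here is an assumption.
    static String normalizeHyphens(String s) {
        return s.replace("\u00AD", "")
                .replaceAll("[\\u2010\\u2011\\u2012\\u2013\\u2014\\u2212]", "-");
    }

    // Applies Unicode normalization; NFKC is chosen here as an assumption.
    static String normalizeUnicode(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // Emits every non-alphanumeric character as its own token.
    static List<String> separatePunctuation(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            StringBuilder word = new StringBuilder();
            for (char c : token.toCharArray()) {
                if (Character.isLetterOrDigit(c)) {
                    word.append(c);
                } else {
                    if (word.length() > 0) {
                        out.add(word.toString());
                        word.setLength(0);
                    }
                    out.add(String.valueOf(c));
                }
            }
            if (word.length() > 0) {
                out.add(word.toString());
            }
        }
        return out;
    }

    // Keeps at most maxSize tokens from the start of the list.
    static List<String> truncate(List<String> tokens, int maxSize) {
        return new ArrayList<>(tokens.subList(0, Math.min(maxSize, tokens.size())));
    }

    // Wraps the tokens in placeholder begin and end markers.
    static List<String> terminate(List<String> tokens) {
        List<String> out = new ArrayList<>();
        out.add("<bos>");
        out.addAll(tokens);
        out.add("<eos>");
        return out;
    }
}
```

In this sketch, truncation runs before termination so that the end marker is never cut off; whether the library's processors are meant to be ordered that way is not stated in the table.
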