Interface | Description |
---|---|
TextProcessor | TextProcessor applies pre-processing to input tokens for natural language applications. |
Tokenizer | The Tokenizer interface provides the ability to break down sentences into embeddable tokens. |
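As a rough illustration of the two contracts above, the sketch below declares simplified stand-ins for TextProcessor (a list of tokens in, a list of tokens out) and Tokenizer (a sentence split into tokens and rebuilt). The method names `preprocess`, `tokenize`, and `buildSentence` and the class `DelimiterTokenizer` are assumptions made for illustration; the package's actual declarations may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Assumed shape of a token pre-processor: tokens in, tokens out.
// Illustrative stand-in, not the package's actual declaration.
interface TextProcessor {
    List<String> preprocess(List<String> tokens);
}

// Assumed shape of a tokenizer: break a sentence into tokens and rebuild it.
interface Tokenizer {
    List<String> tokenize(String sentence);

    String buildSentence(List<String> tokens);
}

// Hypothetical delimiter-based tokenizer in the spirit of SimpleTokenizer.
class DelimiterTokenizer implements Tokenizer {
    private final String delimiter;

    DelimiterTokenizer(String delimiter) {
        this.delimiter = delimiter;
    }

    @Override
    public List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        // Pattern.quote makes delimiters such as "." or "|" match literally.
        for (String part : sentence.split(Pattern.quote(delimiter))) {
            if (!part.isEmpty()) { // drop empty pieces caused by repeated delimiters
                tokens.add(part);
            }
        }
        return tokens;
    }

    @Override
    public String buildSentence(List<String> tokens) {
        return String.join(delimiter, tokens);
    }
}

class TokenizerDemo {
    public static void main(String[] args) {
        Tokenizer tokenizer = new DelimiterTokenizer(" ");
        List<String> tokens = tokenizer.tokenize("Hello  World");
        // TextProcessor has a single abstract method, so a lambda can implement it.
        TextProcessor upper = in -> {
            List<String> out = new ArrayList<>();
            for (String t : in) {
                out.add(t.toUpperCase(Locale.ROOT));
            }
            return out;
        };
        System.out.println(tokens);                          // [Hello, World]
        System.out.println(upper.preprocess(tokens));        // [HELLO, WORLD]
        System.out.println(tokenizer.buildSentence(tokens)); // Hello World
    }
}
```

Treating the processor as a single-method interface is what makes a lambda-based processor (the idea behind LambdaProcessor) natural to write.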
Class | Description |
---|---|
HyphenNormalizer | Normalizes "exotic" hyphens, which Unicode normalization does not handle and which are usually unwanted in NLP input. |
LambdaProcessor | TextProcessor that applies a user-defined lambda function to input tokens. |
LowerCaseConvertor | LowerCaseConvertor converts every character of the input tokens to its respective lowercase character. |
PunctuationSeparator | PunctuationSeparator splits punctuation off into separate tokens. |
SimpleTokenizer | SimpleTokenizer is an implementation of the Tokenizer interface that converts sentences into tokens by splitting them on a given delimiter. |
TextCleaner | Removes or replaces certain characters based on a condition. |
TextTerminator | A TextProcessor that adds beginning-of-string and end-of-string tokens. |
TextTruncator | TextProcessor that truncates text to a maximum size. |
UnicodeNormalizer | Applies Unicode normalization to input strings. |
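Several of the classes above are processors meant to be applied one after another. The sketch below chains hypothetical stand-ins for LowerCaseConvertor, PunctuationSeparator, TextTruncator, and TextTerminator. The class names, the ASCII punctuation set, treating "size" as a token count, and the `<bos>`/`<eos>` marker strings are all assumptions for illustration, not the package's actual behavior or API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Assumed shape of a token pre-processor (illustrative stand-in, not the real declaration).
interface TextProcessor {
    List<String> preprocess(List<String> tokens);
}

// Lower-cases every token, roughly what LowerCaseConvertor is described as doing.
class LowerCaseStep implements TextProcessor {
    @Override
    public List<String> preprocess(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }
}

// Splits ASCII punctuation off into its own tokens (the punctuation set is an assumption).
class PunctuationStep implements TextProcessor {
    private static final String PUNCT = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    @Override
    public List<String> preprocess(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            StringBuilder word = new StringBuilder();
            for (char c : t.toCharArray()) {
                if (PUNCT.indexOf(c) >= 0) {
                    if (word.length() > 0) { // flush the word seen so far
                        out.add(word.toString());
                        word.setLength(0);
                    }
                    out.add(String.valueOf(c));
                } else {
                    word.append(c);
                }
            }
            if (word.length() > 0) {
                out.add(word.toString());
            }
        }
        return out;
    }
}

// Keeps at most maxSize tokens ("size" is assumed to mean token count).
class TruncateStep implements TextProcessor {
    private final int maxSize;

    TruncateStep(int maxSize) {
        this.maxSize = maxSize;
    }

    @Override
    public List<String> preprocess(List<String> tokens) {
        return new ArrayList<>(tokens.subList(0, Math.min(maxSize, tokens.size())));
    }
}

// Wraps the sequence in begin/end markers; "<bos>" and "<eos>" are hypothetical strings.
class TerminateStep implements TextProcessor {
    @Override
    public List<String> preprocess(List<String> tokens) {
        List<String> out = new ArrayList<>();
        out.add("<bos>");
        out.addAll(tokens);
        out.add("<eos>");
        return out;
    }
}

class PipelineDemo {
    // Feeds the output of each processor into the next one, in list order.
    static List<String> run(List<TextProcessor> steps, List<String> tokens) {
        List<String> current = tokens;
        for (TextProcessor step : steps) {
            current = step.preprocess(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<TextProcessor> steps = Arrays.asList(
                new LowerCaseStep(), new PunctuationStep(), new TruncateStep(5), new TerminateStep());
        List<String> tokens = Arrays.asList("Hello,", "World!", "How", "are", "you");
        System.out.println(run(steps, tokens)); // [<bos>, hello, ,, world, !, how, <eos>]
    }
}
```

Order matters: truncating before adding the terminators guarantees the begin and end markers survive, at the cost of the output being up to two tokens longer than the truncation limit.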
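The HyphenNormalizer entry notes that Unicode normalization leaves "exotic" hyphens in place. The sketch below shows that two-step idea with the JDK's `java.text.Normalizer`: NFKC first, then a separate pass that maps dash-like code points to ASCII hyphen-minus (U+002D). The choice of NFKC, the set of code points treated as exotic, and the helper class `NormalizeSketch` are assumptions; UnicodeNormalizer and HyphenNormalizer may use a different form or character set.

```java
import java.text.Normalizer;

// Hypothetical helper illustrating the ideas behind UnicodeNormalizer and HyphenNormalizer.
class NormalizeSketch {
    // Assumed set of "exotic" hyphens/dashes: U+2010..U+2015 and MINUS SIGN U+2212.
    private static final String EXOTIC_HYPHENS = "\u2010\u2011\u2012\u2013\u2014\u2015\u2212";

    // Step 1: Unicode normalization (NFKC is an assumption; it composes accents
    // and expands compatibility characters such as the "fi" ligature).
    static String unicodeNormalize(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC);
    }

    // Step 2: map each exotic hyphen to ASCII hyphen-minus, which NFKC does not do.
    static String normalizeHyphens(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            sb.append(EXOTIC_HYPHENS.indexOf(c) >= 0 ? '-' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "e" + combining acute, the "fi" ligature, and a U+2010 HYPHEN.
        String raw = "e\u0301 \uFB01le well\u2010known";
        System.out.println(normalizeHyphens(unicodeNormalize(raw)));
    }
}
```

Keeping hyphen handling as its own pass mirrors the split between the two classes in the table: normalization alone would leave `well\u2010known` and `well-known` as distinct tokens.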