ai.djl.modality.nlp.preprocess (Deep Java Library 0.25.0 API specification)

Contains utility classes for natural language pre-processing tasks.

Interface Summary
Interface	Description
TextProcessor	`TextProcessor` allows applying pre-processing to input tokens for natural language applications.
Tokenizer	The `Tokenizer` interface provides the ability to break down sentences into embeddable tokens.
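The two interfaces fit together: a tokenizer turns a raw sentence into tokens, and a processor maps a token list to a new token list. The sketch below mirrors that contract in plain Java so it runs without DJL on the classpath. The names `TokenProcessor`, `SentenceTokenizer`, `DelimiterTokenizer`, `process`, `tokenize`, and `buildSentence` are hypothetical stand-ins chosen for illustration, not the DJL signatures; consult the individual interface pages for the real methods.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TokenizerSketch {

    // Hypothetical stand-in for a token processor: maps a token list to a new token list.
    interface TokenProcessor {
        List<String> process(List<String> tokens);
    }

    // Hypothetical stand-in for a tokenizer: splits a sentence and can rebuild it.
    interface SentenceTokenizer extends TokenProcessor {
        List<String> tokenize(String sentence);

        String buildSentence(List<String> tokens);

        // Treat each input string as a sentence and flatten the resulting tokens.
        @Override
        default List<String> process(List<String> sentences) {
            List<String> out = new ArrayList<>();
            for (String s : sentences) {
                out.addAll(tokenize(s));
            }
            return out;
        }
    }

    // Splits on a fixed delimiter, in the spirit of a delimiter-based tokenizer.
    static final class DelimiterTokenizer implements SentenceTokenizer {
        private final String delimiter;

        DelimiterTokenizer(String delimiter) {
            this.delimiter = delimiter;
        }

        @Override
        public List<String> tokenize(String sentence) {
            List<String> tokens = new ArrayList<>();
            // Pattern.quote treats the delimiter literally, not as a regex.
            for (String part : sentence.split(Pattern.quote(delimiter))) {
                if (!part.isEmpty()) { // drop empty pieces produced by repeated delimiters
                    tokens.add(part);
                }
            }
            return tokens;
        }

        @Override
        public String buildSentence(List<String> tokens) {
            return String.join(delimiter, tokens);
        }
    }

    public static void main(String[] args) {
        SentenceTokenizer tokenizer = new DelimiterTokenizer(" ");
        List<String> tokens = tokenizer.tokenize("the quick  brown fox");
        System.out.println(tokens);                                      // [the, quick, brown, fox]
        System.out.println(tokenizer.buildSentence(tokens));             // the quick brown fox
        System.out.println(tokenizer.process(Arrays.asList("a b", "c"))); // [a, b, c]
    }
}
```

Making the tokenizer itself a processor (via the default `process` method) lets it sit at the head of a processing chain just like any other step; whether DJL uses exactly this arrangement should be checked against the `Tokenizer` page.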

Class Summary
Class	Description
HyphenNormalizer	Normalizes "exotic" hyphen characters, which Unicode normalization does not take care of and which are usually unwanted in NLP input.
LambdaProcessor	A `TextProcessor` that applies a user-defined lambda function to input tokens.
LowerCaseConvertor	`LowerCaseConvertor` converts every character of the input tokens to its lower-case equivalent.
PunctuationSeparator	`PunctuationSeparator` splits punctuation off into separate tokens.
SimpleTokenizer	`SimpleTokenizer` is an implementation of the `Tokenizer` interface that converts sentences into tokens by splitting them on a given delimiter.
TextCleaner	Removes or replaces characters that match a given condition.
TextTerminator	A `TextProcessor` that adds beginning-of-string and end-of-string tokens.
TextTruncator	`TextProcessor` that truncates text to a maximum size.
UnicodeNormalizer	Applies Unicode normalization to input strings.
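Because each class above transforms a token list, a typical pipeline tokenizes first and then applies processors in order. The self-contained sketch below re-creates a few of the described behaviors (Unicode normalization, lower-casing, punctuation separation, truncation, and beginning/end-of-string markers) using only JDK calls such as `java.text.Normalizer`. The `Step` type, the NFKC form, the `<bos>`/`<eos>` marker strings, the ASCII-only punctuation rule, and the step order are all assumptions for illustration; this is not DJL's implementation.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

public class PipelineSketch {

    // Hypothetical stand-in type: each step maps a token list to a new token list.
    interface Step extends UnaryOperator<List<String>> {}

    // Apply Unicode NFKC normalization to each token (e.g. the "fi" ligature becomes "fi").
    static final Step NFKC = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(Normalizer.normalize(t, Normalizer.Form.NFKC));
        }
        return out;
    };

    // Lower-case every token with a locale-independent rule.
    static final Step LOWER_CASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    // Split ASCII punctuation characters off into their own tokens.
    static final Step SEPARATE_PUNCTUATION = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            StringBuilder word = new StringBuilder();
            for (char c : t.toCharArray()) {
                if (isPunctuation(c)) {
                    if (word.length() > 0) {
                        out.add(word.toString());
                        word.setLength(0);
                    }
                    out.add(String.valueOf(c));
                } else {
                    word.append(c);
                }
            }
            if (word.length() > 0) {
                out.add(word.toString());
            }
        }
        return out;
    };

    static boolean isPunctuation(char c) {
        return String.valueOf(c).matches("\\p{Punct}"); // POSIX ASCII punctuation class
    }

    // Keep at most maxSize tokens.
    static Step truncate(int maxSize) {
        return tokens -> new ArrayList<>(tokens.subList(0, Math.min(maxSize, tokens.size())));
    }

    // Wrap the token list in beginning and end markers.
    static Step terminate(String bos, String eos) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            out.add(bos);
            out.addAll(tokens);
            out.add(eos);
            return out;
        };
    }

    // Apply the steps in order, feeding each output into the next step.
    static List<String> run(List<String> tokens, List<Step> steps) {
        List<String> current = tokens;
        for (Step step : steps) {
            current = step.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Hello,", "WORLD!", "\uFB01ne", "day");
        List<Step> steps = Arrays.asList(
                NFKC, LOWER_CASE, SEPARATE_PUNCTUATION, truncate(5), terminate("<bos>", "<eos>"));
        System.out.println(run(tokens, steps)); // [<bos>, hello, ,, world, !, fine, <eos>]
    }
}
```

Truncating before adding the markers means the output can hold up to `maxSize + 2` tokens; running the terminator first would instead let truncation cut off the end marker, so the order of steps matters when combining processors like these.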
