ai.djl.modality.nlp.preprocess (Deep Java Library 0.7.0 API specification)

Interface Summary
Interface	Description
TextProcessor	`TextProcessor` allows applying pre-processing to input tokens for natural language applications.
Tokenizer	The `Tokenizer` interface provides the ability to break down sentences into embeddable tokens.
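
The two interfaces above suggest a simple flow: a tokenizer turns a sentence into tokens, and one or more processors then transform that token list in order. The sketch below illustrates the idea with the Java standard library only. `SketchTokenizer`, `SketchProcessor`, and `PipelineSketch` are hypothetical stand-ins invented for this example; they are not the DJL types, and their method names and signatures are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical sketch; none of these types belong to DJL.
final class PipelineSketch {

    /** Assumed shape of a tokenizer: sentence in, tokens out. */
    interface SketchTokenizer {
        List<String> tokenize(String sentence);
    }

    /** Assumed shape of a text processor: tokens in, tokens out. */
    interface SketchProcessor {
        List<String> preprocess(List<String> tokens);
    }

    /** Tokenizes the sentence, then applies each processor in list order. */
    static List<String> run(
            String sentence, SketchTokenizer tokenizer, List<SketchProcessor> processors) {
        List<String> tokens = tokenizer.tokenize(sentence);
        for (SketchProcessor processor : processors) {
            tokens = processor.preprocess(tokens);
        }
        return tokens;
    }

    /** Splits on spaces, lower-cases every token, and keeps at most three tokens. */
    static List<String> example() {
        SketchTokenizer bySpace = s -> new ArrayList<>(Arrays.asList(s.split(" ")));
        SketchProcessor lower = tokens -> {
            List<String> out = new ArrayList<>();
            for (String token : tokens) {
                out.add(token.toLowerCase(Locale.ROOT));
            }
            return out;
        };
        SketchProcessor truncate =
                tokens -> new ArrayList<>(tokens.subList(0, Math.min(3, tokens.size())));
        return run("Deep Java Library Rocks", bySpace, Arrays.asList(lower, truncate));
    }

    public static void main(String[] args) {
        System.out.println(example()); // prints [deep, java, library]
    }
}
```

Because every processor maps a token list to a token list, the order of the list passed to `run` fixes the order of the steps, and steps can be added or removed without touching the tokenizer.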

Class Summary
Class	Description
HyphenNormalizer	`HyphenNormalizer` normalizes the "exotic" hyphens that Unicode normalization does not take care of and that we normally do not want in NLP input.
LowerCaseConvertor	`LowerCaseConvertor` converts every character of the input tokens to its respective lower case character.
PunctuationSeparator	`PunctuationSeparator` separates punctuation into a separate token.
SimpleTokenizer	`SimpleTokenizer` is an implementation of the `Tokenizer` interface that converts sentences into tokens by splitting them on a given delimiter.
TextTerminator	A `TextProcessor` that adds a beginning-of-string token and an end-of-string token.
TextTruncator	A `TextProcessor` that truncates text to a maximum size.
UnicodeNormalizer	Applies Unicode normalization to input strings.
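
To make the class descriptions above more concrete, here is a minimal standard-library sketch of four of the transformations: Unicode normalization, hyphen normalization, punctuation separation, and adding terminator tokens. It is not the DJL implementation. The class name `NormalizeSketch`, the choice of the NFKC form, the set of hyphen characters, the rule that any character other than a letter or digit counts as punctuation, and the `<bos>`/`<eos>` token strings are all assumptions made for illustration.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the DJL classes may behave differently in detail.
final class NormalizeSketch {

    /** Applies Unicode normalization; NFKC is an assumed choice of form. */
    static String normalizeUnicode(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    /**
     * Maps a few dash-like characters (U+2010 to U+2015 and U+2212) to '-'
     * and drops the soft hyphen (U+00AD). The character set is an assumption.
     */
    static String normalizeHyphens(String text) {
        return text.replaceAll("[\u2010-\u2015\u2212]", "-").replace("\u00AD", "");
    }

    /** Emits every character that is not a letter or digit as its own token. */
    static List<String> separatePunctuation(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            StringBuilder word = new StringBuilder();
            for (int i = 0; i < token.length(); i++) {
                char c = token.charAt(i);
                if (Character.isLetterOrDigit(c)) {
                    word.append(c);
                } else {
                    if (word.length() > 0) {
                        out.add(word.toString());
                        word.setLength(0);
                    }
                    out.add(String.valueOf(c));
                }
            }
            if (word.length() > 0) {
                out.add(word.toString());
            }
        }
        return out;
    }

    /** Wraps the tokens in assumed begin and end markers. */
    static List<String> terminate(List<String> tokens) {
        List<String> out = new ArrayList<>();
        out.add("<bos>");
        out.addAll(tokens);
        out.add("<eos>");
        return out;
    }
}
```

For example, `NormalizeSketch.normalizeUnicode("\uFB01ne")` returns `"fine"`, because NFKC replaces the "fi" ligature with the two plain letters, and `NormalizeSketch.normalizeHyphens("well\u2013known")` returns `"well-known"`.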

Package ai.djl.modality.nlp.preprocess Description

Contains utility classes for natural language pre-processing tasks.