ai.djl.modality.nlp.preprocess (Deep Java Library 0.33.0 API specification)

package ai.djl.modality.nlp.preprocess

Contains utility classes for natural language pre-processing tasks.

Related Packages

Package

Description

ai.djl.modality.nlp

Contains utility classes for natural language processing tasks.

ai.djl.modality.nlp.bert

Contains classes that deal with BERT for natural language pre-processing tasks.

ai.djl.modality.nlp.embedding

Contains classes that deal with word embeddings for natural language pre-processing tasks.

ai.djl.modality.nlp.generate

Contains utility classes for text generation.

ai.djl.modality.nlp.qa

Contains utility classes for question and answer processing.

ai.djl.modality.nlp.translator

Contains utility classes for each of the predefined translators.

Class

Description

HyphenNormalizer

Replaces "exotic" hyphens, which Unicode normalization does not handle and which are normally unwanted in NLP input, with the ASCII hyphen-minus.

LambdaProcessor

A TextProcessor that applies a user-defined lambda function to input tokens.

LowerCaseConvertor

LowerCaseConvertor converts every character of the input tokens to its respective lower-case character.

PunctuationSeparator

PunctuationSeparator separates punctuation into a separate token.

SimpleTokenizer

SimpleTokenizer is an implementation of the Tokenizer interface that converts sentences into tokens by splitting them on a given delimiter.

TextCleaner

Removes or replaces certain characters based on a condition.

TextProcessor

TextProcessor allows applying pre-processing to input tokens for natural language applications.

TextTerminator

A TextProcessor that adds beginning-of-string and end-of-string tokens.

TextTruncator

A TextProcessor that truncates text to a maximum size.

Tokenizer

The Tokenizer interface provides the ability to break down sentences into embeddable tokens.

UnicodeNormalizer

Applies Unicode normalization to input strings.
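
The classes listed above are typically chained: a tokenizer produces a list of tokens, and each processor then maps one token list to another. The following is a minimal plain-Java sketch of that idea, not the DJL implementation. The class name PreprocessSketch, its static helper methods, and the "<bos>"/"<eos>" marker strings are assumptions made for illustration, and the punctuation rule (any character that is not a letter or digit) is a simplification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Illustrative sketch only: hypothetical helpers that mimic the roles of a
// delimiter tokenizer and a chain of token processors. Not the DJL classes.
public class PreprocessSketch {

    // Split a sentence on a literal delimiter, dropping empty pieces.
    static List<String> tokenize(String sentence, String delimiter) {
        List<String> tokens = new ArrayList<>();
        for (String piece : sentence.split(Pattern.quote(delimiter))) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Lower-case every token (locale-independent).
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Emit each non-letter, non-digit character as its own token.
    // Assumption: "punctuation" here means anything that is not a letter or digit.
    static List<String> separatePunctuation(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            StringBuilder word = new StringBuilder();
            for (char c : t.toCharArray()) {
                if (Character.isLetterOrDigit(c)) {
                    word.append(c);
                } else {
                    if (word.length() > 0) {
                        out.add(word.toString());
                        word.setLength(0);
                    }
                    out.add(String.valueOf(c));
                }
            }
            if (word.length() > 0) {
                out.add(word.toString());
            }
        }
        return out;
    }

    // Keep at most maxSize tokens.
    static List<String> truncate(List<String> tokens, int maxSize) {
        return new ArrayList<>(tokens.subList(0, Math.min(maxSize, tokens.size())));
    }

    // Surround the tokens with begin and end markers.
    static List<String> terminate(List<String> tokens, String bos, String eos) {
        List<String> out = new ArrayList<>();
        out.add(bos);
        out.addAll(tokens);
        out.add(eos);
        return out;
    }

    // Tokenize, then apply the processing steps in order.
    static List<String> run(String sentence, int maxTokens) {
        List<String> tokens = tokenize(sentence, " ");
        tokens = lowerCase(tokens);
        tokens = separatePunctuation(tokens);
        tokens = truncate(tokens, maxTokens);
        return terminate(tokens, "<bos>", "<eos>");
    }

    public static void main(String[] args) {
        // prints [<bos>, hello, ,, world, !, how, <eos>]
        System.out.println(run("Hello, World! How are you?", 5));
    }
}
```

The order of the steps matters in this sketch: truncation runs before the markers are added, so the markers are never cut off, and punctuation is separated before truncation, so the size limit counts punctuation tokens too.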

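
The HyphenNormalizer and UnicodeNormalizer entries in the class table rest on the observation that standard Unicode normalization leaves many dash characters unchanged. The sketch below shows this with java.text.Normalizer from the JDK, followed by a hand-written replacement pass. The class name HyphenSketch and the short HYPHEN_LIKE list are assumptions made for illustration. They are not the set of code points that any library actually uses.

```java
import java.text.Normalizer;

// Illustrative sketch only: a hypothetical hyphen clean-up pass, not the DJL class.
public class HyphenSketch {

    // Assumed, deliberately small set of hyphen-like characters:
    // hyphen, figure dash, en dash, em dash, horizontal bar, minus sign.
    private static final String HYPHEN_LIKE = "\u2010\u2012\u2013\u2014\u2015\u2212";

    // Replace every character from HYPHEN_LIKE with the ASCII hyphen-minus.
    static String normalizeHyphens(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(HYPHEN_LIKE.indexOf(c) >= 0 ? '-' : c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The separators in this string are en dashes.
        String input = "state\u2013of\u2013the\u2013art";

        // NFKC folds compatibility characters but has no mapping for the en dash.
        String nfkc = Normalizer.normalize(input, Normalizer.Form.NFKC);
        System.out.println(nfkc.equals(input));     // prints true

        // The explicit pass is what turns the dashes into plain hyphens.
        System.out.println(normalizeHyphens(nfkc)); // prints state-of-the-art
    }
}
```

Running Unicode normalization first and the hyphen pass second keeps the two concerns separate, which is one reasonable ordering but not the only one.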