Package ai.djl.modality.nlp.preprocess
Contains utility classes for natural language pre-processing tasks.
Class                  Description
HyphenNormalizer       Unicode normalization does not take care of "exotic" hyphens that we normally do not want in NLP input.
LambdaProcessor        TextProcessor that applies a user-defined lambda function to input tokens.
LowerCaseConvertor     LowerCaseConvertor converts every character of the input tokens to its respective lower-case character.
PunctuationSeparator   PunctuationSeparator separates punctuation into a separate token.
SimpleTokenizer        SimpleTokenizer is an implementation of the Tokenizer interface that converts sentences into tokens by splitting them by a given delimiter.
TextCleaner            Removes or replaces certain characters based on a condition.
TextProcessor          TextProcessor allows applying pre-processing to input tokens for natural language applications.
TextTerminator         A TextProcessor that adds a beginning-of-string and an end-of-string token.
TextTruncator          A TextProcessor that truncates text to a maximum size.
Tokenizer              Tokenizer interface provides the ability to break down sentences into embeddable tokens.
UnicodeNormalizer      Applies Unicode normalization to input strings.
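
The steps described above can be chained into a simple pre-processing pipeline. The following is a minimal sketch that uses only the Java standard library, not the classes of this package: the class name PreprocessSketch, its method names, the choice of NFKC as the normalization form, and the "<bos>"/"<eos>" marker strings are all assumptions made for illustration. It shows Unicode normalization, delimiter-based tokenization, lower-casing, punctuation separation, truncation, and termination applied in sequence.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical stand-alone sketch; it does not use or reproduce the package's classes. */
final class PreprocessSketch {

    private PreprocessSketch() {}

    /** Applies Unicode normalization (NFKC is an assumed choice of form). */
    public static String normalize(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    /** Splits a sentence on a literal delimiter and drops empty tokens. */
    public static List<String> tokenize(String sentence, String delimiter) {
        List<String> tokens = new ArrayList<>();
        for (String token : sentence.split(Pattern.quote(delimiter))) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    /** Converts every token to lower case. */
    public static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    /** Emits each character that is not a letter or digit as its own token. */
    public static List<String> separatePunctuation(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            StringBuilder word = new StringBuilder();
            for (char c : token.toCharArray()) {
                if (Character.isLetterOrDigit(c)) {
                    word.append(c);
                } else {
                    if (word.length() > 0) {
                        out.add(word.toString());
                        word.setLength(0);
                    }
                    out.add(String.valueOf(c));
                }
            }
            if (word.length() > 0) {
                out.add(word.toString());
            }
        }
        return out;
    }

    /** Keeps at most maxSize tokens. */
    public static List<String> truncate(List<String> tokens, int maxSize) {
        return new ArrayList<>(tokens.subList(0, Math.min(maxSize, tokens.size())));
    }

    /** Wraps the tokens in assumed begin and end markers. */
    public static List<String> terminate(List<String> tokens) {
        List<String> out = new ArrayList<>();
        out.add("<bos>");
        out.addAll(tokens);
        out.add("<eos>");
        return out;
    }

    /** Runs all steps in order on a single input string. */
    public static List<String> run(String text, int maxTokens) {
        List<String> tokens = tokenize(normalize(text), " ");
        tokens = lowerCase(tokens);
        tokens = separatePunctuation(tokens);
        tokens = truncate(tokens, maxTokens);
        return terminate(tokens);
    }

    public static void main(String[] args) {
        System.out.println(run("Hello, World!", 10));
    }
}
```

Truncation is applied before termination in this sketch so that the begin and end markers are never cut off; a real pipeline may order the steps differently depending on the model's input requirements.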