Package ai.djl.modality.nlp.preprocess


Contains utility classes for natural language pre-processing tasks.
  • Class: Description
    HyphenNormalizer: Normalizes the "exotic" hyphens that Unicode normalization does not handle and that are normally unwanted in NLP input.
    LambdaProcessor: A TextProcessor that applies a user-defined lambda function to the input tokens.
    LowerCaseConvertor: Converts every character of the input tokens to its lower-case equivalent.
    PunctuationSeparator: Separates punctuation into its own token.
    SimpleTokenizer: An implementation of the Tokenizer interface that converts sentences into tokens by splitting them on a given delimiter.
    TextCleaner: Removes or replaces certain characters based on a condition.
    TextProcessor: Allows applying pre-processing to input tokens for natural language applications.
    TextTerminator: A TextProcessor that adds beginning-of-string and end-of-string tokens.
    TextTruncator: A TextProcessor that truncates text to a maximum size.
    Tokenizer: Interface that provides the ability to break down sentences into embeddable tokens.
    UnicodeNormalizer: Applies Unicode normalization to input strings.
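
The steps described above (normalize, tokenize, lower-case, separate punctuation, truncate, add terminators) can be illustrated with a minimal standalone sketch. It deliberately does not call the DJL classes: the class name PreprocessSketch, its method names, the hyphen range, and the "<bos>"/"<eos>" markers are all assumptions made for illustration, and only JDK APIs are used. The real package may order or implement these steps differently.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical stand-in for the kind of pipeline this package describes.
public class PreprocessSketch {

    // Unicode-normalize (NFKC), then map dash-like characters to ASCII '-'.
    // The character range is an assumption; NFKC alone leaves e.g. U+2013 as is.
    static String normalize(String text) {
        String nfkc = Normalizer.normalize(text, Normalizer.Form.NFKC);
        return nfkc.replaceAll("[\\u2010-\\u2015\\u2212]", "-");
    }

    // Split a sentence on a literal delimiter, dropping empty pieces.
    static List<String> tokenize(String sentence, String delimiter) {
        List<String> tokens = new ArrayList<>();
        for (String piece : sentence.split(Pattern.quote(delimiter))) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Lower-case every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Emit each non-alphanumeric, non-whitespace character as its own token.
    static List<String> separatePunctuation(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            StringBuilder word = new StringBuilder();
            for (char c : t.toCharArray()) {
                if (Character.isLetterOrDigit(c) || Character.isWhitespace(c)) {
                    word.append(c);
                } else {
                    if (word.length() > 0) {
                        out.add(word.toString());
                        word.setLength(0);
                    }
                    out.add(String.valueOf(c));
                }
            }
            if (word.length() > 0) {
                out.add(word.toString());
            }
        }
        return out;
    }

    // Keep at most maxSize tokens.
    static List<String> truncate(List<String> tokens, int maxSize) {
        return new ArrayList<>(tokens.subList(0, Math.min(maxSize, tokens.size())));
    }

    // Wrap the tokens in begin/end markers (marker strings are assumptions).
    static List<String> terminate(List<String> tokens) {
        List<String> out = new ArrayList<>();
        out.add("<bos>");
        out.addAll(tokens);
        out.add("<eos>");
        return out;
    }

    // Chain the steps; truncation happens before the markers are added,
    // so the result holds at most maxSize + 2 tokens.
    static List<String> run(String sentence, int maxSize) {
        List<String> tokens = tokenize(normalize(sentence), " ");
        tokens = separatePunctuation(lowerCase(tokens));
        return terminate(truncate(tokens, maxSize));
    }

    public static void main(String[] args) {
        System.out.println(run("Hello, World!", 3));
    }
}
```

In this sketch the order of the steps matters: truncating before adding the terminators keeps both markers in the output even when the text is cut short.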