Package ai.djl.modality.nlp.preprocess
Interface TextProcessor
- All Known Subinterfaces:
Tokenizer
- All Known Implementing Classes:
BertFullTokenizer, BertTokenizer, HyphenNormalizer, LambdaProcessor, LowerCaseConvertor, PunctuationSeparator, SimpleTokenizer, TextCleaner, TextTerminator, TextTruncator, UnicodeNormalizer, WordpieceTokenizer
public interface TextProcessor
TextProcessor applies a pre-processing step to input tokens for natural language
applications. Multiple TextProcessor implementations can be applied to the same input,
and the order in which they are applied can change the final output.
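The effect of ordering can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the nested `TextProcessor` interface is a local stand-in that mirrors the documented `preprocess` method so the example runs without DJL on the classpath, and `LOWER_CASE`, `DROP_THE`, and `applyAll` are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class ProcessorOrderSketch {

    /** Local stand-in mirroring the documented method; not the DJL type itself. */
    interface TextProcessor {
        List<String> preprocess(List<String> tokens);
    }

    // Hypothetical processor: lower-cases every token.
    static final TextProcessor LOWER_CASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            out.add(token.toLowerCase(Locale.ROOT));
        }
        return out;
    };

    // Hypothetical processor: drops tokens that are exactly "the" (case-sensitive).
    static final TextProcessor DROP_THE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!token.equals("the")) {
                out.add(token);
            }
        }
        return out;
    };

    // Applies each processor in list order, feeding each output into the next.
    static List<String> applyAll(List<TextProcessor> pipeline, List<String> tokens) {
        List<String> current = tokens;
        for (TextProcessor processor : pipeline) {
            current = processor.preprocess(current);
        }
        return current;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("The", "cat", "sat");
        // Lower-casing first turns "The" into "the", so the filter removes it.
        System.out.println(applyAll(List.of(LOWER_CASE, DROP_THE), tokens)); // [cat, sat]
        // Filtering first leaves "The" in place, and it is lower-cased afterwards.
        System.out.println(applyAll(List.of(DROP_THE, LOWER_CASE), tokens)); // [the, cat, sat]
    }
}
```

The same two steps produce different token lists depending only on the order in which they run, which is why a pipeline of processors should be assembled deliberately.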
Method Summary
Modifier and Type | Method | Description
List<String> | preprocess(List<String> tokens) | Applies the preprocessing defined to the given input tokens.
Method Details
preprocess
Applies the preprocessing defined to the given input tokens.
- Parameters:
tokens - the tokens created after the input text is tokenized
- Returns:
the preprocessed tokens
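As a rough sketch of what an implementation of this method might look like, the class below takes a token list and returns a processed one. The nested `TextProcessor` interface is again a local stand-in with the documented method shape (an assumption, so the example runs without DJL), and `MinLengthFilter` is a hypothetical implementation that does not exist in the library. Returning a new list instead of modifying the argument is a choice made in this sketch; the documentation above does not say whether implementations may mutate their input.

```java
import java.util.ArrayList;
import java.util.List;

public class PreprocessSketch {

    /** Local stand-in with the documented method shape; not the DJL type itself. */
    interface TextProcessor {
        List<String> preprocess(List<String> tokens);
    }

    /** Hypothetical implementation: keeps only tokens of at least minLength characters. */
    static final class MinLengthFilter implements TextProcessor {
        private final int minLength;

        MinLengthFilter(int minLength) {
            this.minLength = minLength;
        }

        @Override
        public List<String> preprocess(List<String> tokens) {
            // Build a fresh list so the caller's tokens are left untouched.
            List<String> kept = new ArrayList<>(tokens.size());
            for (String token : tokens) {
                if (token.length() >= minLength) {
                    kept.add(token);
                }
            }
            return kept;
        }
    }

    public static void main(String[] args) {
        TextProcessor filter = new MinLengthFilter(2);
        System.out.println(filter.preprocess(List.of("a", "quick", "fox"))); // [quick, fox]
    }
}
```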