Package ai.djl.modality.nlp.preprocess
Interface TextProcessor
- All Known Subinterfaces:
Tokenizer
- All Known Implementing Classes:
BertFullTokenizer, BertTokenizer, HyphenNormalizer, LambdaProcessor, LowerCaseConvertor, PunctuationSeparator, SimpleTokenizer, TextCleaner, TextTerminator, TextTruncator, UnicodeNormalizer, WordpieceTokenizer
public interface TextProcessor
TextProcessor allows applying pre-processing to input tokens for natural language applications. Multiple implementations of TextProcessor can be applied to the same input. The order in which the different implementations are applied can change the final output.
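To illustrate why the order of application matters, here is a minimal sketch. It does not use the DJL classes: the nested `TextProcessor` interface is a local stand-in that mirrors the documented method shape so the example compiles on its own, and `LOWER_CASE`, `DROP_THE`, and `applyAll` are hypothetical names introduced only for this illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class ProcessorOrderSketch {

    // Local stand-in mirroring the documented interface shape (an assumption,
    // so the sketch compiles without DJL on the classpath).
    interface TextProcessor {
        List<String> preprocess(List<String> tokens);
    }

    // Hypothetical processor: lower-cases every token.
    static final TextProcessor LOWER_CASE = tokens -> tokens.stream()
            .map(t -> t.toLowerCase(Locale.ROOT))
            .collect(Collectors.toList());

    // Hypothetical processor: drops tokens that exactly equal "the" (case-sensitive).
    static final TextProcessor DROP_THE = tokens -> tokens.stream()
            .filter(t -> !t.equals("the"))
            .collect(Collectors.toList());

    // Applies the processors left to right, feeding each output into the next.
    static List<String> applyAll(List<String> tokens, TextProcessor... processors) {
        List<String> result = tokens;
        for (TextProcessor processor : processors) {
            result = processor.preprocess(result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("The", "Quick", "Fox");
        // Lower-casing first turns "The" into "the", which the filter then removes.
        System.out.println(applyAll(tokens, LOWER_CASE, DROP_THE)); // [quick, fox]
        // Filtering first leaves "The" untouched, so it survives as "the".
        System.out.println(applyAll(tokens, DROP_THE, LOWER_CASE)); // [the, quick, fox]
    }
}
```

The same two processors produce different token lists depending on which runs first, so a pipeline built from real implementations should fix its ordering deliberately.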
Method Summary
Modifier and Type: List<String>
Method: preprocess(List<String> tokens)
Description: Applies the preprocessing defined to the given input tokens.
Method Details
preprocess
Applies the preprocessing defined to the given input tokens.
- Parameters:
tokens - the tokens created after the input text is tokenized
- Returns:
the preprocessed tokens
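As a rough sketch of what an implementation of preprocess might look like, the class below takes a list of tokens and returns a new, processed list. The nested `TextProcessor` interface is again a local stand-in assumed to match the documented method shape; with DJL on the classpath the class would implement `ai.djl.modality.nlp.preprocess.TextProcessor` instead. `BlankTokenFilter` and `demo` are hypothetical names and are not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class BlankTokenFilterSketch {

    // Local stand-in for the documented interface (an assumption for this sketch).
    interface TextProcessor {
        List<String> preprocess(List<String> tokens);
    }

    // Hypothetical implementation: trims each token and drops those left empty.
    static class BlankTokenFilter implements TextProcessor {
        @Override
        public List<String> preprocess(List<String> tokens) {
            List<String> result = new ArrayList<>(tokens.size());
            for (String token : tokens) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) {
                    result.add(trimmed);
                }
            }
            // A new list is returned; the input list is left unmodified.
            return result;
        }
    }

    // Runs the filter on a small sample input.
    static List<String> demo() {
        return new BlankTokenFilter().preprocess(Arrays.asList(" hello", "", "  ", "world "));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [hello, world]
    }
}
```

Returning a fresh list rather than mutating the argument is a design choice of this sketch; it keeps the processor safe to use with immutable input lists.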