Package ai.djl.modality.nlp.preprocess
Interface Tokenizer
- All Superinterfaces:
TextProcessor
- All Known Implementing Classes:
BertFullTokenizer, BertTokenizer, SimpleTokenizer, WordpieceTokenizer
The Tokenizer interface provides the ability to break down sentences into embeddable tokens.
Method Summary
Method: Description
- buildSentence(List<String> tokens): Combines a list of tokens to form a sentence.
- preprocess(List<String> tokens): Applies the preprocessing defined to the given input tokens.
- tokenize(String sentence): Breaks down the given sentence into a list of tokens that can be represented by embeddings.
Method Details
preprocess
Applies the preprocessing defined to the given input tokens.
- Specified by:
preprocess in interface TextProcessor
- Parameters:
tokens - the tokens created after the input text is tokenized
- Returns:
the preprocessed tokens
tokenize
Breaks down the given sentence into a list of tokens that can be represented by embeddings.
- Parameters:
sentence - the sentence to tokenize
- Returns:
a List of tokens
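As a rough illustration of the tokenize contract (a sentence in, a List of tokens out), the sketch below splits on runs of whitespace. The class WhitespaceTokenizeSketch is hypothetical and is not part of DJL; real implementations such as BertTokenizer or WordpieceTokenizer use their own splitting rules, which this page does not describe.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in, not a DJL class: splits a sentence on whitespace. */
final class WhitespaceTokenizeSketch {

    private WhitespaceTokenizeSketch() {}

    /** Breaks the sentence into tokens at runs of whitespace. */
    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        // trim() drops leading/trailing blanks; "\\s+" treats repeated blanks as one separator
        for (String piece : sentence.trim().split("\\s+")) {
            if (!piece.isEmpty()) { // a blank input yields one empty piece, which is skipped
                tokens.add(piece);
            }
        }
        return tokens;
    }
}
```

Calling WhitespaceTokenizeSketch.tokenize("Hello   DJL world") yields the three tokens Hello, DJL, and world; a blank sentence yields an empty list.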
buildSentence
Combines a list of tokens to form a sentence.
- Parameters:
tokens - the List of tokens
- Returns:
the sentence built from the given tokens
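To show how the three methods might fit together, here is a minimal sketch that runs without DJL on the classpath. SketchTokenizer is a hypothetical local mirror of the method signatures listed on this page, and SpaceJoiningTokenizer is a hypothetical implementation. Two things here are assumptions rather than documented behavior: buildSentence joins tokens with a single space, and preprocess rebuilds the sentence and tokenizes it again.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical local mirror of the three documented methods; not the DJL type. */
interface SketchTokenizer {

    List<String> tokenize(String sentence);

    String buildSentence(List<String> tokens);

    /** Assumption: preprocessing means rebuilding the sentence and tokenizing it again. */
    default List<String> preprocess(List<String> tokens) {
        return tokenize(buildSentence(tokens));
    }
}

/** Hypothetical implementation: whitespace splitting and single-space joining. */
final class SpaceJoiningTokenizer implements SketchTokenizer {

    @Override
    public List<String> tokenize(String sentence) {
        String trimmed = sentence.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<>(); // no tokens in a blank sentence
        }
        return new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
    }

    @Override
    public String buildSentence(List<String> tokens) {
        // Assumption: one space between tokens; real tokenizers may restore text differently
        return String.join(" ", tokens);
    }
}
```

With this sketch, buildSentence on the tokens deep, java, library returns "deep java library", and preprocess normalizes a coarse token such as "deep java" into two separate tokens. A subword tokenizer would need a different buildSentence, since joining word pieces with spaces does not restore the original words.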