Package ai.djl.modality.nlp.preprocess
Interface Tokenizer
- All Superinterfaces:
TextProcessor
- All Known Implementing Classes:
BertFullTokenizer, BertTokenizer, SimpleTokenizer, WordpieceTokenizer
The Tokenizer interface provides the ability to break down sentences into embeddable tokens.
Method Summary
Method and Description
- buildSentence(List<String> tokens): Combines a list of tokens to form a sentence.
- preprocess(List<String> tokens): Applies the preprocessing defined to the given input tokens.
- tokenize(String sentence): Breaks down the given sentence into a list of tokens that can be represented by embeddings.
Method Details
preprocess
Applies the preprocessing defined to the given input tokens.
- Specified by:
preprocess in interface TextProcessor
- Parameters:
tokens - the tokens created after the input text is tokenized
- Returns:
the preprocessed tokens
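
As a rough illustration of how preprocess can relate to the other two methods, the sketch below uses a stand-in interface named SketchTokenizer and an implementation named SpaceTokenizer. Both names are hypothetical and are declared here only so the example compiles without DJL on the classpath. The default preprocess shown, which rebuilds the sentence and tokenizes it again, is an assumption made for the example and is not confirmed by this page.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in mirroring the three documented methods; not the DJL interface itself.
interface SketchTokenizer {
    List<String> tokenize(String sentence);

    String buildSentence(List<String> tokens);

    // Assumed default: join the tokens back into a sentence, then tokenize again.
    default List<String> preprocess(List<String> tokens) {
        return tokenize(buildSentence(tokens));
    }
}

// Hypothetical implementation that splits on single spaces.
class SpaceTokenizer implements SketchTokenizer {
    @Override
    public List<String> tokenize(String sentence) {
        return Arrays.asList(sentence.split(" "));
    }

    @Override
    public String buildSentence(List<String> tokens) {
        return String.join(" ", tokens);
    }
}

class PreprocessDemo {
    public static void main(String[] args) {
        SketchTokenizer tokenizer = new SpaceTokenizer();
        // A coarse token that still contains a space is split further by preprocess.
        List<String> coarse = Arrays.asList("deep learning", "rocks");
        System.out.println(tokenizer.preprocess(coarse)); // prints [deep, learning, rocks]
    }
}
```

Under this assumption, an implementation only has to supply tokenize and buildSentence to be usable wherever a list-in, list-out preprocessing step is expected.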

tokenize
Breaks down the given sentence into a list of tokens that can be represented by embeddings.
- Parameters:
sentence - the sentence to tokenize
- Returns:
a List of tokens
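
To make the contract concrete, here is a minimal sketch of a tokenize method that splits on runs of whitespace. The class WhitespaceSketch is hypothetical and does not reproduce any of the implementing classes listed on this page, which may apply quite different rules.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DJL: splits a sentence on runs of whitespace.
final class WhitespaceSketch {
    private WhitespaceSketch() {}

    static List<String> tokenize(String sentence) {
        List<String> tokens = new ArrayList<>();
        // trim() avoids a leading empty token; \\s+ treats repeated spaces as one separator.
        for (String piece : sentence.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("  Deep   Java Library ")); // prints [Deep, Java, Library]
    }
}
```

The isEmpty check means an empty or all-whitespace sentence yields an empty list rather than a list containing one empty token.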

buildSentence
Combines a list of tokens to form a sentence.
- Parameters:
tokens - the List of tokens
- Returns:
the sentence built from the given tokens
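
buildSentence goes in the opposite direction from tokenize. The sketch below uses a hypothetical JoinSketch class, not part of DJL, and joins tokens with a single space; the separator is an assumption chosen for the example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of DJL: joins tokens with an assumed single-space separator.
final class JoinSketch {
    private JoinSketch() {}

    static String buildSentence(List<String> tokens) {
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Deep", "Java", "Library");
        System.out.println(buildSentence(tokens)); // prints Deep Java Library
    }
}
```

Joining with a fixed separator does not necessarily restore the original text: any whitespace that was collapsed or dropped when the tokens were produced is not recovered.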