Package ai.djl.modality.nlp.preprocess
Class SimpleTokenizer
java.lang.Object
ai.djl.modality.nlp.preprocess.SimpleTokenizer
All Implemented Interfaces:
TextProcessor, Tokenizer
Direct Known Subclasses:
BertTokenizer, WordpieceTokenizer

SimpleTokenizer is an implementation of the Tokenizer interface that converts
sentences into tokens by splitting them on a given delimiter.

Constructor Summary

Constructor                          Description
SimpleTokenizer()                    Creates an instance of SimpleTokenizer with the default delimiter (" ").
SimpleTokenizer(String delimiter)    Creates an instance of SimpleTokenizer with the given delimiter.

Method Summary

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface ai.djl.modality.nlp.preprocess.Tokenizer
preprocess

Constructor Details

SimpleTokenizer
public SimpleTokenizer(String delimiter)
Creates an instance of SimpleTokenizer with the given delimiter.
Parameters:
delimiter - the delimiter

SimpleTokenizer
public SimpleTokenizer()
Creates an instance of SimpleTokenizer with the default delimiter (" ").

Method Details

tokenize
Breaks down the given sentence into a list of tokens that can be represented by embeddings.
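
The splitting step described for tokenize can be illustrated without the library. The following is a minimal sketch, not the DJL implementation: the class DelimiterSplitSketch and its static tokenize helper are hypothetical names, and it assumes the delimiter should be matched literally (hence Pattern.quote), which this page does not state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-alone sketch; it does not use or reproduce DJL code.
final class DelimiterSplitSketch {

    // Splits a sentence on a literal delimiter.
    // Assumption: the delimiter is plain text, so it is quoted to keep
    // characters such as "." or "|" from being read as regex syntax.
    static List<String> tokenize(String sentence, String delimiter) {
        return Arrays.asList(sentence.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("Deep Java Library", " ")); // [Deep, Java, Library]
        System.out.println(tokenize("a.b.c", "."));             // [a, b, c]
    }
}
```

In this sketch, consecutive delimiters yield empty tokens and trailing empty tokens are dropped, because that is how String.split behaves. Whether SimpleTokenizer handles those cases the same way is not covered by this page.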
buildSentence
Combines a list of tokens to form a sentence.
Specified by:
buildSentence in interface Tokenizer
Parameters:
tokens - the List of tokens
Returns:
the sentence built from the given tokens
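
Combining tokens back into a sentence is the inverse of splitting. The sketch below shows one plausible way to do it with String.join. It assumes the tokens are joined with the same delimiter used for splitting, which this page does not confirm; JoinTokensSketch and its static buildSentence helper are hypothetical names, not part of DJL.

```java
import java.util.List;

// Hypothetical stand-alone sketch; it does not use or reproduce DJL code.
final class JoinTokensSketch {

    // Joins tokens into one sentence.
    // Assumption: the delimiter placed between tokens is the one used to split.
    static String buildSentence(List<String> tokens, String delimiter) {
        return String.join(delimiter, tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("Deep", "Java", "Library");
        System.out.println(buildSentence(tokens, " ")); // Deep Java Library
    }
}
```

Under that assumption, splitting a sentence and then joining the result returns the original sentence whenever it does not end with the delimiter.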