Package ai.djl.modality.nlp.preprocess
Class SimpleTokenizer
java.lang.Object
ai.djl.modality.nlp.preprocess.SimpleTokenizer
- All Implemented Interfaces:
 TextProcessor, Tokenizer
- Direct Known Subclasses:
 BertTokenizer, WordpieceTokenizer
SimpleTokenizer is an implementation of the Tokenizer interface that converts
 sentences into tokens by splitting them on a given delimiter.
Constructor Summary
Constructors
Constructor                          Description
SimpleTokenizer()                    Creates an instance of SimpleTokenizer with the default delimiter (" ").
SimpleTokenizer(String delimiter)    Creates an instance of SimpleTokenizer with the given delimiter.
Method Summary
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ai.djl.modality.nlp.preprocess.Tokenizer
preprocess
Constructor Details
SimpleTokenizer
public SimpleTokenizer(String delimiter)
Creates an instance of SimpleTokenizer with the given delimiter.
- Parameters:
 delimiter - the delimiter
SimpleTokenizer
public SimpleTokenizer()
Creates an instance of SimpleTokenizer with the default delimiter (" ").
Method Details
tokenize
Breaks down the given sentence into a list of tokens that can be represented by embeddings.
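The delimiter-based splitting described here can be illustrated with a small stand-alone sketch. It does not use DJL and is not the library's source: the class DelimiterTokenizerSketch and its static tokenize helper are hypothetical names chosen for this example. This page does not say whether the delimiter is read literally or as a regular expression; the sketch assumes a literal delimiter.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for illustration only; not the DJL SimpleTokenizer source.
class DelimiterTokenizerSketch {

    // Splits a sentence into tokens on the given delimiter.
    // Assumption: the delimiter is a literal string, so it is quoted
    // before being handed to String.split, which expects a regex.
    static List<String> tokenize(String sentence, String delimiter) {
        return Arrays.asList(sentence.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        // Default-style delimiter: a single space.
        System.out.println(tokenize("DJL is fun", " "));   // [DJL, is, fun]
        // Custom delimiter that would be special in a regex.
        System.out.println(tokenize("a|b|c", "|"));        // [a, b, c]
    }
}
```

Quoting the delimiter is a design choice of this sketch: without it, a delimiter such as "|" or "." would be interpreted as regex syntax by String.split.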
buildSentence
Combines a list of tokens to form a sentence.
- Specified by:
 buildSentence in interface Tokenizer
- Parameters:
 tokens - the List of tokens
- Returns:
 the sentence built from the given tokens
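As a rough illustration of combining tokens back into a sentence, the sketch below joins a list of tokens with a delimiter. It is self-contained and does not call DJL; SentenceBuilderSketch and its static buildSentence helper are hypothetical names, and joining with the same delimiter used for splitting is an assumption of this example rather than something this page states.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for illustration only; not the DJL SimpleTokenizer source.
class SentenceBuilderSketch {

    // Joins the tokens into one sentence, placing the delimiter between
    // consecutive tokens (assumed behavior for this sketch).
    static String buildSentence(List<String> tokens, String delimiter) {
        return String.join(delimiter, tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("DJL", "is", "fun");
        System.out.println(buildSentence(tokens, " ")); // DJL is fun
    }
}
```

Under these assumptions, splitting a sentence on a delimiter and then joining the tokens with the same delimiter reproduces the input whenever the sentence has no trailing delimiters, since String.split drops trailing empty strings.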
 
 