Package ai.djl.modality.nlp.preprocess
Class SimpleTokenizer
java.lang.Object
ai.djl.modality.nlp.preprocess.SimpleTokenizer
All Implemented Interfaces:
TextProcessor, Tokenizer
Direct Known Subclasses:
BertTokenizer, WordpieceTokenizer
SimpleTokenizer is an implementation of the Tokenizer interface that converts
sentences into tokens by splitting them on a given delimiter.
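To illustrate what splitting on a delimiter means, the following is a minimal, self-contained sketch. It does not call DJL: the class DelimiterSplitSketch and its tokenize(String, String) helper are hypothetical names introduced here, and the sketch assumes the delimiter is treated as literal text. The actual SimpleTokenizer may handle regular-expression characters or empty tokens differently.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for illustration only; not part of DJL. */
final class DelimiterSplitSketch {

    /** Splits a sentence into tokens on a literal delimiter (assumption: not a regex). */
    static List<String> tokenize(String sentence, String delimiter) {
        // Pattern.quote makes String.split treat the delimiter as plain text.
        return Arrays.asList(sentence.split(Pattern.quote(delimiter)));
    }

    public static void main(String[] args) {
        // A single space, matching the documented default delimiter (" ").
        System.out.println(tokenize("Hello DJL world", " ")); // [Hello, DJL, world]
        // A custom delimiter, as with the one-argument constructor.
        System.out.println(tokenize("a.b.c", "."));           // [a, b, c]
    }
}
```

With the real class, the equivalent calls would presumably go through an instance created by one of the constructors listed below.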
Constructor Summary
Constructors
Constructor - Description
SimpleTokenizer() - Creates an instance of SimpleTokenizer with the default delimiter (" ").
SimpleTokenizer(String delimiter) - Creates an instance of SimpleTokenizer with the given delimiter.
Method Summary
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ai.djl.modality.nlp.preprocess.Tokenizer
preprocess
Constructor Details
SimpleTokenizer
public SimpleTokenizer(String delimiter)
Creates an instance of SimpleTokenizer with the given delimiter.
Parameters:
delimiter - the delimiter
SimpleTokenizer
public SimpleTokenizer()
Creates an instance of SimpleTokenizer with the default delimiter (" ").
Method Details
tokenize
Breaks down the given sentence into a list of tokens that can be represented by embeddings.
buildSentence
Combines a list of tokens to form a sentence.
Specified by:
buildSentence in interface Tokenizer
Parameters:
tokens - the List of tokens
Returns:
the sentence built from the given tokens
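As a rough picture of the reverse operation, the sketch below joins tokens back into a sentence. It is an assumption-laden illustration, not DJL code: SentenceJoinSketch and its two-argument buildSentence helper are hypothetical, and the sketch assumes the tokens are rejoined with the same delimiter that was used for splitting.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for illustration only; not part of DJL. */
final class SentenceJoinSketch {

    /** Joins tokens into one sentence, placing the delimiter between neighbours. */
    static String buildSentence(List<String> tokens, String delimiter) {
        // Assumption: the delimiter used for joining matches the one used for splitting.
        return String.join(delimiter, tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("Hello", "DJL", "world");
        System.out.println(buildSentence(tokens, " ")); // Hello DJL world
    }
}
```

Under these assumptions, splitting a sentence and then rebuilding it with the same delimiter yields the original text whenever the sentence has no leading, trailing, or repeated delimiters.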