Package ai.djl.modality.nlp.preprocess
Class SimpleTokenizer
- java.lang.Object
  - ai.djl.modality.nlp.preprocess.SimpleTokenizer
- All Implemented Interfaces: TextProcessor, Tokenizer
- Direct Known Subclasses: BertTokenizer, WordpieceTokenizer
public class SimpleTokenizer extends java.lang.Object implements Tokenizer
SimpleTokenizer is an implementation of the Tokenizer interface that converts sentences into tokens by splitting them on a given delimiter.
Constructor Summary
- SimpleTokenizer(): Creates an instance of SimpleTokenizer with the default delimiter.
- SimpleTokenizer(java.lang.String delimiter): Creates an instance of SimpleTokenizer with the given delimiter.
Method Summary
- java.lang.String buildSentence(java.util.List<java.lang.String> tokens): Combines a list of tokens to form a sentence.
- java.util.List<java.lang.String> tokenize(java.lang.String sentence): Breaks down the given sentence into a list of tokens that can be represented by embeddings.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ai.djl.modality.nlp.preprocess.Tokenizer
preprocess
Method Detail
tokenize
public java.util.List<java.lang.String> tokenize(java.lang.String sentence)
Breaks down the given sentence into a list of tokens that can be represented by embeddings.
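As a rough illustration of delimiter-based tokenization, the sketch below splits a sentence with the JDK's String.split. The class name DelimiterSplitSketch is hypothetical and this is not the DJL source. In particular, quoting the delimiter with Pattern.quote so that it is matched literally is an assumption, since this page does not say whether SimpleTokenizer treats the delimiter as a regular expression.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a delimiter-based tokenizer; not the DJL implementation. */
class DelimiterSplitSketch {

    private final String delimiter;

    public DelimiterSplitSketch(String delimiter) {
        this.delimiter = delimiter;
    }

    /** Splits the sentence on every literal occurrence of the delimiter. */
    public List<String> tokenize(String sentence) {
        // String.split takes a regex, so quote the delimiter to match it literally (assumption).
        String[] parts = sentence.split(Pattern.quote(delimiter));
        return new ArrayList<>(Arrays.asList(parts));
    }

    public static void main(String[] args) {
        DelimiterSplitSketch bySpace = new DelimiterSplitSketch(" ");
        System.out.println(bySpace.tokenize("Deep Java Library")); // [Deep, Java, Library]

        DelimiterSplitSketch byComma = new DelimiterSplitSketch(",");
        System.out.println(byComma.tokenize("a,b,c")); // [a, b, c]
    }
}
```

Note that String.split drops trailing empty strings, so in this sketch a sentence that ends with the delimiter does not produce an empty final token.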
buildSentence
public java.lang.String buildSentence(java.util.List<java.lang.String> tokens)
Combines a list of tokens to form a sentence.
- Specified by: buildSentence in interface Tokenizer
- Parameters: tokens - the List of tokens
- Returns: the sentence built from the given tokens
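One plausible way to rebuild a sentence from tokens is to join them with the same delimiter that was used for splitting. The sketch below does this with the JDK's String.join. SentenceJoinSketch is a hypothetical name, and the use of String.join is an assumption rather than something this page confirms about SimpleTokenizer.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of rebuilding a sentence from tokens; not the DJL implementation. */
class SentenceJoinSketch {

    private final String delimiter;

    public SentenceJoinSketch(String delimiter) {
        this.delimiter = delimiter;
    }

    /** Joins the tokens, placing the delimiter between each adjacent pair. */
    public String buildSentence(List<String> tokens) {
        return String.join(delimiter, tokens);
    }

    public static void main(String[] args) {
        SentenceJoinSketch joiner = new SentenceJoinSketch(" ");
        List<String> tokens = Arrays.asList("Deep", "Java", "Library");
        System.out.println(joiner.buildSentence(tokens)); // Deep Java Library
    }
}
```

String.join inserts the delimiter only between tokens, so the result has no leading or trailing delimiter.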