public class SimpleTokenizer extends java.lang.Object implements Tokenizer
SimpleTokenizer is an implementation of the Tokenizer interface that converts sentences into tokens by splitting them on a given delimiter.
Constructor and Description |
---|
SimpleTokenizer() Creates an instance of SimpleTokenizer with the default delimiter. |
SimpleTokenizer(java.lang.String delimiter) Creates an instance of SimpleTokenizer with the given delimiter. |
Modifier and Type | Method and Description |
---|---|
java.lang.String | buildSentence(java.util.List<java.lang.String> tokens) Combines a list of tokens to form a sentence. |
java.util.List<java.lang.String> | tokenize(java.lang.String sentence) Breaks down the given sentence into a list of tokens that can be represented by embeddings. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
preprocess
public SimpleTokenizer(java.lang.String delimiter)
Creates an instance of SimpleTokenizer with the given delimiter.
delimiter - the delimiter
public SimpleTokenizer()
Creates an instance of SimpleTokenizer with the default delimiter.
public java.util.List<java.lang.String> tokenize(java.lang.String sentence)
public java.lang.String buildSentence(java.util.List<java.lang.String> tokens)
buildSentence in interface Tokenizer
tokens - the List of tokens
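
The following is a minimal sketch of how tokenize and buildSentence might behave together. It does not use the library class: SimpleTokenizerSketch is a hypothetical stand-in that only mirrors the documented method names. It assumes the default delimiter is a single space and that the delimiter is matched literally; neither detail is stated on this page, so check the actual implementation before relying on either.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical stand-in mirroring the documented SimpleTokenizer methods; not the library's source. */
final class SimpleTokenizerSketch {

    private final String delimiter;

    /** Assumption: the default delimiter is a single space. */
    SimpleTokenizerSketch() {
        this(" ");
    }

    SimpleTokenizerSketch(String delimiter) {
        this.delimiter = delimiter;
    }

    /** Splits the sentence on the delimiter. */
    List<String> tokenize(String sentence) {
        // Assumption: the delimiter is treated as literal text, so it is quoted
        // before being handed to the regex-based String.split.
        return Arrays.asList(sentence.split(Pattern.quote(delimiter)));
    }

    /** Joins the tokens back together, placing the delimiter between them. */
    String buildSentence(List<String> tokens) {
        return String.join(delimiter, tokens);
    }

    public static void main(String[] args) {
        SimpleTokenizerSketch tokenizer = new SimpleTokenizerSketch();
        List<String> tokens = tokenizer.tokenize("Deep learning in Java");
        System.out.println(tokens);                          // [Deep, learning, in, Java]
        System.out.println(tokenizer.buildSentence(tokens)); // Deep learning in Java

        SimpleTokenizerSketch csv = new SimpleTokenizerSketch(",");
        System.out.println(csv.tokenize("a,b,c"));           // [a, b, c]
    }
}
```

In this sketch, buildSentence is the inverse of tokenize as long as no token contains the delimiter and the sentence has no trailing delimiters, since String.split drops trailing empty strings.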