public class SimpleTokenizer extends java.lang.Object implements Tokenizer
SimpleTokenizer is an implementation of the Tokenizer interface that converts
sentences into tokens by splitting them on a given delimiter.

| Constructor and Description |
|---|
| SimpleTokenizer() Creates an instance of SimpleTokenizer with the default delimiter. |
| SimpleTokenizer(java.lang.String delimiter) Creates an instance of SimpleTokenizer with the given delimiter. |

| Modifier and Type | Method and Description |
|---|---|
| java.lang.String | buildSentence(java.util.List<java.lang.String> tokens) Combines a list of tokens to form a sentence. |
| java.util.List<java.lang.String> | tokenize(java.lang.String sentence) Breaks down the given sentence into a list of tokens that can be represented by embeddings. |

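Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Other inherited methods: preprocess

public SimpleTokenizer(java.lang.String delimiter)
Creates an instance of SimpleTokenizer with the given delimiter.
Parameters: delimiter - the delimiter

public SimpleTokenizer()
Creates an instance of SimpleTokenizer with the default delimiter.

public java.util.List<java.lang.String> tokenize(java.lang.String sentence)
Breaks down the given sentence into a list of tokens that can be represented by embeddings.

public java.lang.String buildSentence(java.util.List<java.lang.String> tokens)
Combines a list of tokens to form a sentence.
Specified by: buildSentence in interface Tokenizer
Parameters: tokens - the List of tokens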
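
The delimiter-based behavior of tokenize and buildSentence can be illustrated with a minimal standalone sketch. It is not the library's source: the class name DelimiterTokenizerSketch is hypothetical, the single-space default delimiter is an assumption (this page does not state the default), and treating the delimiter as a literal string rather than a regular expression is a choice made here for predictability.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a delimiter-based tokenizer; not the library class.
final class DelimiterTokenizerSketch {
    private final String delimiter;

    // Assumption: the default delimiter is a single space.
    DelimiterTokenizerSketch() {
        this(" ");
    }

    DelimiterTokenizerSketch(String delimiter) {
        this.delimiter = delimiter;
    }

    // Splits the sentence on the delimiter, treated literally (not as a regex).
    List<String> tokenize(String sentence) {
        return Arrays.asList(sentence.split(Pattern.quote(delimiter)));
    }

    // Joins the tokens back together with the same delimiter.
    String buildSentence(List<String> tokens) {
        return String.join(delimiter, tokens);
    }

    public static void main(String[] args) {
        DelimiterTokenizerSketch tokenizer = new DelimiterTokenizerSketch();
        List<String> tokens = tokenizer.tokenize("Hello world again");
        System.out.println(tokens);                          // [Hello, world, again]
        System.out.println(tokenizer.buildSentence(tokens)); // Hello world again
    }
}
```

In this sketch, buildSentence restores the original sentence whenever the input has no trailing delimiters, because String.split drops trailing empty strings.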