public class BertTokenizer extends SimpleTokenizer
| Constructor and Description |
|---|
| BertTokenizer() |
| Modifier and Type | Method | Description |
|---|---|---|
| BertToken | encode(java.lang.String question, java.lang.String paragraph) | Encodes questions and paragraph sentences. |
| BertToken | encode(java.lang.String question, java.lang.String paragraph, int maxLength) | Encodes questions and paragraph sentences with max length. |
| <E> java.util.List<E> | pad(java.util.List<E> tokens, E padItem, int num) | Pads the tokens to the required length. |
| java.util.List<java.lang.String> | tokenize(java.lang.String input) | Breaks down the given sentence into a list of tokens that can be represented by embeddings. |
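The padding behaviour described for pad can be pictured with a small stand-alone sketch. It does not call the library: the class name PadSketch, the "[PAD]" string, and the choice to leave a list that is already long enough at its original length are all assumptions made for illustration, not documented behaviour of BertTokenizer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PadSketch {

    // Hypothetical stand-in for BertTokenizer.pad; not the library's code.
    public static <E> List<E> pad(List<E> tokens, E padItem, int num) {
        // Copy the input so the caller's list is left untouched.
        List<E> padded = new ArrayList<>(tokens);
        // Append padItem at the end until the list holds num elements.
        // Assumption: a list already at or beyond num keeps its length.
        while (padded.size() < num) {
            padded.add(padItem);
        }
        return padded;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("[CLS]", "hello", "[SEP]");
        // Pad three tokens out to a total length of five.
        System.out.println(pad(tokens, "[PAD]", 5));
    }
}
```

Because the method is generic in E, the same call shape works for token strings and for numeric ids alike.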
Methods inherited from class SimpleTokenizer: buildSentence

Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface Tokenizer: preprocess

public java.util.List<java.lang.String> tokenize(java.lang.String input)
Specified by: tokenize in interface Tokenizer
Overrides: tokenize in class SimpleTokenizer
Parameters: input - the sentence to tokenize
Returns: the list of tokens

public <E> java.util.List<E> pad(java.util.List<E> tokens, E padItem, int num)
Type Parameters: E - the element type of the list
Parameters: tokens - the input tokens
padItem - the item appended at the end as padding
num - the total length after padding

public BertToken encode(java.lang.String question, java.lang.String paragraph)
Parameters: question - the input question
paragraph - the input paragraph

public BertToken encode(java.lang.String question, java.lang.String paragraph, int maxLength)
Parameters: question - the input question
paragraph - the input paragraph
maxLength - the maximum length of the encoding
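
As a rough picture of what encoding a question and paragraph pair to a fixed length usually involves for BERT-style models, the sketch below lays out the conventional `[CLS] question [SEP] paragraph [SEP]` sequence with segment ids and an attention mask, then pads to maxLength. Everything in it is an assumption for illustration: EncodeSketch and EncodedPair are hypothetical names, whitespace splitting stands in for the real tokenizer, and the contents of the actual BertToken are not described on this page. Inputs longer than maxLength are not handled here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class EncodeSketch {

    /** Hypothetical result holder; not the library's BertToken. */
    public static final class EncodedPair {
        public final List<String> tokens = new ArrayList<>();
        public final List<Long> tokenTypes = new ArrayList<>();
        public final List<Long> attentionMask = new ArrayList<>();
    }

    // Whitespace splitting as a placeholder for a real tokenizer.
    private static List<String> splitWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // Adds one real (non-padding) token with its segment id.
    private static void append(EncodedPair out, String token, long type) {
        out.tokens.add(token);
        out.tokenTypes.add(type);
        out.attentionMask.add(1L);
    }

    public static EncodedPair encode(String question, String paragraph, int maxLength) {
        EncodedPair out = new EncodedPair();
        // Segment 0: [CLS], the question tokens, then [SEP].
        append(out, "[CLS]", 0L);
        for (String token : splitWords(question)) {
            append(out, token, 0L);
        }
        append(out, "[SEP]", 0L);
        // Segment 1: the paragraph tokens, then a closing [SEP].
        for (String token : splitWords(paragraph)) {
            append(out, token, 1L);
        }
        append(out, "[SEP]", 1L);
        // Pad at the end up to maxLength; padding gets mask value 0.
        while (out.tokens.size() < maxLength) {
            out.tokens.add("[PAD]");
            out.tokenTypes.add(0L);
            out.attentionMask.add(0L);
        }
        return out;
    }

    public static void main(String[] args) {
        EncodedPair pair = encode("Who wrote it", "Ada wrote it", 10);
        System.out.println(pair.tokens);
        System.out.println(pair.tokenTypes);
        System.out.println(pair.attentionMask);
    }
}
```

The attention mask lets a model ignore the padded positions, which is why padding and mask are built together in the sketch.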