Package ai.djl.modality.nlp.bert
Class BertTokenizer
java.lang.Object
ai.djl.modality.nlp.preprocess.SimpleTokenizer
ai.djl.modality.nlp.bert.BertTokenizer
- All Implemented Interfaces:
TextProcessor, Tokenizer
- Direct Known Subclasses:
BertFullTokenizer
BertTokenizer is a class that helps you encode question and paragraph sentences.
-
Constructor Summary
-
Method Summary
Modifier and Type    Method    Description
BertToken    encode(String question, String paragraph)    Encodes questions and paragraph sentences.
BertToken    encode(String question, String paragraph, int maxLength)    Encodes questions and paragraph sentences with max length.
<E> List<E>    pad(List<E> tokens, E padItem, int num)    Pads the tokens to the required length.
List<String>    tokenize(String input)    Breaks down the given sentence into a list of tokens that can be represented by embeddings.
String    tokenToString(List<String> tokens)    Returns a string representation of the tokens.
Methods inherited from class ai.djl.modality.nlp.preprocess.SimpleTokenizer
buildSentence
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ai.djl.modality.nlp.preprocess.Tokenizer
preprocess
-
Constructor Details
-
BertTokenizer
public BertTokenizer()
-
-
Method Details
-
tokenize
Breaks down the given sentence into a list of tokens that can be represented by embeddings.
- Specified by:
tokenize in interface Tokenizer
- Overrides:
tokenize in class SimpleTokenizer
- Parameters:
input - the sentence to tokenize
- Returns:
a List of tokens
-
tokenToString
Returns a string representation of the tokens.
- Parameters:
tokens - a list of tokens
- Returns:
a string representation of the tokens
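As a rough illustration of the tokenize and tokenToString pair documented above, the stand-alone sketch below splits a sentence on whitespace, moves a trailing punctuation mark into its own token, and joins tokens back with single spaces. It does not call DJL: the class name TokenizeSketch, the regular expression, and the space-joined output are all assumptions made for illustration, and the library's actual splitting rules may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TokenizeSketch {
    // Assumed pattern: a run of non-space characters, optionally followed by
    // one of . , ? ! and then whitespace or end of input.
    private static final Pattern WORD = Pattern.compile("(\\S+?)([.,?!])?(\\s+|$)");

    // Hypothetical stand-in for tokenize(input): returns the sentence as a list of tokens.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = WORD.matcher(input);
        while (m.find()) {
            tokens.add(m.group(1));
            if (m.group(2) != null) {
                tokens.add(m.group(2)); // trailing punctuation becomes its own token
            }
        }
        return tokens;
    }

    // Hypothetical stand-in for tokenToString(tokens): joins tokens with single spaces.
    static String tokenToString(List<String> tokens) {
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        List<String> tokens = tokenize("Hello, world!");
        System.out.println(tokens);
        System.out.println(tokenToString(tokens));
    }
}
```

Note that joining with spaces does not restore the original spacing around punctuation, so the round trip in this sketch is lossy.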
-
pad
Pads the tokens to the required length.
- Type Parameters:
E - the element type of the List
- Parameters:
tokens - the input tokens
padItem - the item to append as padding at the end
num - the total length after padding
- Returns:
a list of padded tokens
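The padding contract described above can be sketched without DJL as follows. PadSketch and its pad helper are hypothetical names that only mirror the documented parameters (tokens, padItem, num). Leaving a list unchanged when it is already num elements or longer is an assumption, since this page does not say what happens in that case.

```java
import java.util.ArrayList;
import java.util.List;

final class PadSketch {
    // Hypothetical helper mirroring the documented parameters; not the DJL method itself.
    static <E> List<E> pad(List<E> tokens, E padItem, int num) {
        List<E> padded = new ArrayList<>(tokens);
        // Append padItem until the list holds num elements.
        // Assumption: lists that are already long enough are returned unchanged.
        while (padded.size() < num) {
            padded.add(padItem);
        }
        return padded;
    }

    public static void main(String[] args) {
        System.out.println(pad(List.of("how", "are", "you"), "[PAD]", 5));
    }
}
```

Because the helper is generic, the same call shape works for token strings and for numeric lists such as masks or ids.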
-
encode
Encodes questions and paragraph sentences.
- Parameters:
question - the input question
paragraph - the input paragraph
- Returns:
BertToken
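For orientation, the sketch below shows the conventional BERT layout for a sentence pair: [CLS], the question tokens, [SEP], the paragraph tokens, [SEP], with token type 0 for the first segment and 1 for the second. It is a stand-alone illustration, not the DJL implementation. PairLayoutSketch and Encoded are hypothetical names, and the exact contents of the returned BertToken are assumed rather than taken from this page.

```java
import java.util.ArrayList;
import java.util.List;

final class PairLayoutSketch {
    /** Hypothetical holder for the three parallel lists; not the DJL BertToken class. */
    static final class Encoded {
        final List<String> tokens = new ArrayList<>();
        final List<Long> tokenTypes = new ArrayList<>();
        final List<Long> attentionMask = new ArrayList<>();
    }

    // Builds [CLS] question [SEP] paragraph [SEP] from already tokenized inputs.
    static Encoded layout(List<String> question, List<String> paragraph) {
        Encoded e = new Encoded();
        add(e, "[CLS]", 0L);
        for (String t : question) {
            add(e, t, 0L);   // first segment: token type 0
        }
        add(e, "[SEP]", 0L);
        for (String t : paragraph) {
            add(e, t, 1L);   // second segment: token type 1
        }
        add(e, "[SEP]", 1L);
        return e;
    }

    private static void add(Encoded e, String token, long type) {
        e.tokens.add(token);
        e.tokenTypes.add(type);
        e.attentionMask.add(1L); // every real (unpadded) position is attended to
    }

    public static void main(String[] args) {
        Encoded e = layout(List.of("who", "?"), List.of("me", "."));
        System.out.println(e.tokens);
        System.out.println(e.tokenTypes);
        System.out.println(e.attentionMask);
    }
}
```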
-
encode
Encodes questions and paragraph sentences with max length.
- Parameters:
question - the input question
paragraph - the input paragraph
maxLength - the maximum length
- Returns:
BertToken
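One plausible reading of the max-length variant is that the encoded sequence is padded out to maxLength, with the attention mask set to 0 on padded positions. The sketch below illustrates only that idea and does not call DJL. MaxLengthSketch, the "[PAD]" token, and the choice to leave longer inputs untruncated are assumptions, since this page does not specify how inputs longer than maxLength are handled.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class MaxLengthSketch {
    // Pads the token list with "[PAD]" (an assumed pad token) up to maxLength.
    static List<String> padTokens(List<String> tokens, int maxLength) {
        List<String> out = new ArrayList<>(tokens);
        int missing = Math.max(0, maxLength - tokens.size());
        out.addAll(Collections.nCopies(missing, "[PAD]"));
        return out;
    }

    // Builds a mask with 1 for each real token and 0 for each padded position.
    static List<Long> attentionMask(int realTokens, int maxLength) {
        List<Long> mask = new ArrayList<>(Collections.nCopies(realTokens, 1L));
        int missing = Math.max(0, maxLength - realTokens);
        mask.addAll(Collections.nCopies(missing, 0L));
        return mask;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("[CLS]", "hi", "[SEP]");
        System.out.println(padTokens(tokens, 5));
        System.out.println(attentionMask(tokens.size(), 5));
    }
}
```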
-