Package ai.djl.modality.nlp.bert
Class BertTokenizer
java.lang.Object
ai.djl.modality.nlp.preprocess.SimpleTokenizer
ai.djl.modality.nlp.bert.BertTokenizer
- All Implemented Interfaces:
TextProcessor, Tokenizer
- Direct Known Subclasses:
BertFullTokenizer
BertTokenizer is a class that helps you encode question and paragraph sentences.
-
Constructor Summary
Constructors
BertTokenizer()
Method Summary
Modifier and Type | Method | Description
BertToken | encode(String question, String paragraph) | Encodes questions and paragraph sentences.
BertToken | encode(String question, String paragraph, int maxLength) | Encodes questions and paragraph sentences with max length.
<E> List<E> | pad(List<E> tokens, E padItem, int num) | Pads the tokens to the required length.
List<String> | tokenize(String input) | Breaks down the given sentence into a list of tokens that can be represented by embeddings.
String | tokenToString(List<String> tokens) | Returns a string representation of the tokens.
Methods inherited from class ai.djl.modality.nlp.preprocess.SimpleTokenizer
buildSentence
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ai.djl.modality.nlp.preprocess.Tokenizer
preprocess
-
Constructor Details
-
BertTokenizer
public BertTokenizer()
-
-
Method Details
-
tokenize
Breaks down the given sentence into a list of tokens that can be represented by embeddings.
- Specified by:
tokenize in interface Tokenizer
- Overrides:
tokenize in class SimpleTokenizer
- Parameters:
input - the sentence to tokenize
- Returns:
a List of tokens
-
tokenToString
Returns a string representation of the tokens.
- Parameters:
tokens - a list of tokens
- Returns:
a string representation of the tokens
-
pad
Pads the tokens to the required length.
- Type Parameters:
E - the element type of the List
- Parameters:
tokens - the input tokens
padItem - the item appended at the end as padding
num - the total length after padding
- Returns:
a list of padded tokens
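As a rough illustration of the padding described above, the following standalone sketch appends a pad item until the list reaches the requested length. The class name PadSketch is hypothetical, and returning the input unchanged when it already holds num or more elements is an assumption; this is not the library's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone sketch of end-padding; not part of DJL. */
final class PadSketch {

    private PadSketch() {}

    /**
     * Appends padItem to the end of tokens until the result holds num elements.
     * Assumption: a list that already has num or more elements is returned as-is.
     */
    static <E> List<E> pad(List<E> tokens, E padItem, int num) {
        if (tokens.size() >= num) {
            return tokens;
        }
        // Copy first so the caller's list is never modified.
        List<E> padded = new ArrayList<>(num);
        padded.addAll(tokens);
        while (padded.size() < num) {
            padded.add(padItem);
        }
        return padded;
    }
}
```

Because the method is generic in E, the same routine can pad a list of string tokens with a marker such as "[PAD]" or a list of numeric ids with 0L.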
-
encode
Encodes questions and paragraph sentences.
- Parameters:
question - the input question
paragraph - the input paragraph
- Returns:
a BertToken
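To make the idea of encoding a question together with a paragraph more concrete, here is a minimal standalone sketch. It assumes the conventional BERT input layout ([CLS] question [SEP] paragraph [SEP], token type 0 for the question segment and 1 for the paragraph segment, attention mask of all ones) and uses whitespace splitting as a stand-in for tokenize. The EncodeSketch and Encoded names are hypothetical, and the layout is an assumption rather than a statement of what this class does internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical standalone sketch of a BERT-style question/paragraph encoding; not DJL code. */
final class EncodeSketch {

    /** Simple holder for the encoded sequences (a stand-in for the real BertToken). */
    static final class Encoded {
        final List<String> tokens;
        final List<Long> tokenTypes;
        final List<Long> attentionMask;

        Encoded(List<String> tokens, List<Long> tokenTypes, List<Long> attentionMask) {
            this.tokens = tokens;
            this.tokenTypes = tokenTypes;
            this.attentionMask = attentionMask;
        }
    }

    private EncodeSketch() {}

    static Encoded encode(String question, String paragraph) {
        // Assumption: whitespace splitting stands in for the real tokenizer.
        List<String> q = Arrays.asList(question.trim().split("\\s+"));
        List<String> p = Arrays.asList(paragraph.trim().split("\\s+"));

        // Assumed layout: [CLS] question [SEP] paragraph [SEP]
        List<String> tokens = new ArrayList<>();
        tokens.add("[CLS]");
        tokens.addAll(q);
        tokens.add("[SEP]");
        int paragraphStart = tokens.size();
        tokens.addAll(p);
        tokens.add("[SEP]");

        List<Long> tokenTypes = new ArrayList<>();
        List<Long> attentionMask = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            // 0 marks the question segment, 1 marks the paragraph segment.
            tokenTypes.add(i < paragraphStart ? 0L : 1L);
            // Every position holds a real token, so all are attended to.
            attentionMask.add(1L);
        }
        return new Encoded(tokens, tokenTypes, attentionMask);
    }
}
```

The three lists are kept the same length so that position i in each one describes the same token.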
-
encode
Encodes questions and paragraph sentences with max length.
- Parameters:
question - the input question
paragraph - the input paragraph
maxLength - the maximum length of the encoded output
- Returns:
a BertToken
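One plausible reading of the max-length variant is that the encoded sequences are padded out to maxLength so that inputs of different sizes share one shape. The sketch below illustrates that reading only: the MaxLengthSketch and Padded names, the "[PAD]" marker, the 0 fill values, and the choice to leave longer inputs untouched are all assumptions, not confirmed behaviour of this method.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical standalone sketch of padding an encoded input to a fixed length; not DJL code. */
final class MaxLengthSketch {

    /** Holder for the padded sequences (a stand-in for the real BertToken). */
    static final class Padded {
        final List<String> tokens;
        final List<Long> tokenTypes;
        final List<Long> attentionMask;

        Padded(List<String> tokens, List<Long> tokenTypes, List<Long> attentionMask) {
            this.tokens = tokens;
            this.tokenTypes = tokenTypes;
            this.attentionMask = attentionMask;
        }
    }

    private MaxLengthSketch() {}

    /** Pads the three parallel sequences so they stay aligned position by position. */
    static Padded padToMaxLength(
            List<String> tokens, List<Long> tokenTypes, List<Long> attentionMask, int maxLength) {
        return new Padded(
                fill(tokens, "[PAD]", maxLength),
                fill(tokenTypes, 0L, maxLength),
                // A 0 in the mask marks a padding position that should be ignored.
                fill(attentionMask, 0L, maxLength));
    }

    /** Copies items and appends padItem until the copy holds length elements. */
    private static <E> List<E> fill(List<E> items, E padItem, int length) {
        List<E> out = new ArrayList<>(items);
        while (out.size() < length) {
            out.add(padItem);
        }
        return out;
    }
}
```

Under these assumptions, the number of ones in the attention mask still equals the length of the unpadded input, which lets downstream code tell real tokens from padding.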
-