Package ai.djl.modality.nlp.bert
Class BertTokenizer
java.lang.Object
  ai.djl.modality.nlp.preprocess.SimpleTokenizer
    ai.djl.modality.nlp.bert.BertTokenizer
All Implemented Interfaces:
TextProcessor, Tokenizer
Direct Known Subclasses:
BertFullTokenizer
public class BertTokenizer extends SimpleTokenizer
BertTokenizer is a class that helps you encode a question and a paragraph.
Constructor Summary
Constructors
BertTokenizer()
Method Summary
Each entry lists the modifier and type, the method, and its description.
BertToken encode(java.lang.String question, java.lang.String paragraph)
    Encodes questions and paragraph sentences.
BertToken encode(java.lang.String question, java.lang.String paragraph, int maxLength)
    Encodes questions and paragraph sentences with max length.
<E> java.util.List<E> pad(java.util.List<E> tokens, E padItem, int num)
    Pads the tokens to the required length.
java.util.List<java.lang.String> tokenize(java.lang.String input)
    Breaks down the given sentence into a list of tokens that can be represented by embeddings.
java.lang.String tokenToString(java.util.List<java.lang.String> tokens)
    Returns a string representation of the tokens.
Methods inherited from class ai.djl.modality.nlp.preprocess.SimpleTokenizer
buildSentence
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ai.djl.modality.nlp.preprocess.Tokenizer
preprocess
Method Detail
tokenize
public java.util.List<java.lang.String> tokenize(java.lang.String input)
Breaks down the given sentence into a list of tokens that can be represented by embeddings.
Specified by:
tokenize in interface Tokenizer
Overrides:
tokenize in class SimpleTokenizer
Parameters:
input - the sentence to tokenize
Returns:
a List of tokens
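As a rough illustration of what "a list of tokens" can look like, the sketch below splits a sentence on whitespace and emits each punctuation character as its own token. The splitting rule and the class name TokenizeSketch are assumptions made for illustration; this page does not state how BertTokenizer.tokenize splits or normalizes its input.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the DJL implementation.
final class TokenizeSketch {

    // Splits on whitespace and emits each punctuation character as its own token.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
                continue;
            }
            // Any non-alphanumeric character ends the word being built.
            if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            // Keep punctuation as a token; drop whitespace.
            if (!Character.isWhitespace(c)) {
                tokens.add(String.valueOf(c));
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("When did BBC Japan start broadcasting?"));
    }
}
```

With the real class, the equivalent call would be tokenize(...) on a BertTokenizer instance; the resulting tokens may differ from this sketch.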
tokenToString
public java.lang.String tokenToString(java.util.List<java.lang.String> tokens)
Returns a string representation of the tokens.
Parameters:
tokens - a list of tokens
Returns:
a string representation of the tokens
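A minimal sketch of turning a token list back into one string, assuming the tokens are joined with single spaces. The separator and the class name TokenToStringSketch are assumptions; the exact output format of tokenToString is not specified on this page.

```java
import java.util.List;

// Hypothetical stand-in for illustration; not the DJL implementation.
final class TokenToStringSketch {

    // Joins tokens with a single space (assumed separator).
    static String tokenToString(List<String> tokens) {
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        System.out.println(tokenToString(List.of("[CLS]", "hello", "world", "[SEP]")));
    }
}
```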
pad
public <E> java.util.List<E> pad(java.util.List<E> tokens, E padItem, int num)
Pads the tokens to the required length.
Type Parameters:
E - the element type of the list
Parameters:
tokens - the input tokens
padItem - the item to append as padding at the end
num - the total length after padding
Returns:
a list of padded tokens
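The contract above (append padItem at the end until the list has num elements) can be sketched as follows. This is a standalone illustration with the hypothetical class name PadSketch, not the library source. How the real method treats input that is already longer than num is not stated here, so the sketch assumes such input is returned unchanged.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the DJL implementation.
final class PadSketch {

    static <E> List<E> pad(List<E> tokens, E padItem, int num) {
        // Assumption: input that already meets the target length is returned as-is (no truncation).
        if (tokens.size() >= num) {
            return tokens;
        }
        List<E> padded = new ArrayList<>(num);
        padded.addAll(tokens);
        // Append the pad item until the list reaches the requested total length.
        while (padded.size() < num) {
            padded.add(padItem);
        }
        return padded;
    }

    public static void main(String[] args) {
        System.out.println(pad(List.of("hello", "world"), "[PAD]", 4));
        System.out.println(pad(List.of(1L, 1L, 1L), 0L, 5));
    }
}
```

Because the method is generic in E, the same call shape works for string tokens and for numeric lists such as masks.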
encode
public BertToken encode(java.lang.String question, java.lang.String paragraph)
Encodes questions and paragraph sentences.
Parameters:
question - the input question
paragraph - the input paragraph
Returns:
BertToken
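This page does not describe the contents of BertToken or the layout that encode produces. The sketch below shows the input layout commonly used for BERT question answering, [CLS] question [SEP] paragraph [SEP], with segment ids and an attention mask. The special token strings, the Encoded container, and its field names are all assumptions for illustration, not the library's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the DJL implementation.
final class EncodeSketch {

    // Hypothetical container; the real BertToken class is not described on this page.
    static final class Encoded {
        final List<String> tokens = new ArrayList<>();
        final List<Long> tokenTypes = new ArrayList<>();
        final List<Long> attentionMask = new ArrayList<>();
    }

    // Lays out pre-tokenized input as [CLS] question [SEP] paragraph [SEP].
    static Encoded encode(List<String> question, List<String> paragraph) {
        Encoded out = new Encoded();
        add(out, "[CLS]", 0L);
        for (String t : question) {
            add(out, t, 0L); // segment 0: question
        }
        add(out, "[SEP]", 0L);
        for (String t : paragraph) {
            add(out, t, 1L); // segment 1: paragraph
        }
        add(out, "[SEP]", 1L);
        return out;
    }

    // Appends one token with its segment id; every real token gets mask value 1.
    private static void add(Encoded out, String token, long type) {
        out.tokens.add(token);
        out.tokenTypes.add(type);
        out.attentionMask.add(1L);
    }

    public static void main(String[] args) {
        Encoded e = encode(List.of("who", "?"), List.of("it", "was", "me"));
        System.out.println(e.tokens);
        System.out.println(e.tokenTypes);
        System.out.println(e.attentionMask);
    }
}
```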
encode
public BertToken encode(java.lang.String question, java.lang.String paragraph, int maxLength)
Encodes questions and paragraph sentences with max length.
Parameters:
question - the input question
paragraph - the input paragraph
maxLength - the maximum length
Returns:
BertToken
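One plausible reading of "with max length" is that the encoded lists are right-padded so they all have maxLength entries. The sketch below illustrates that idea on three parallel lists. The pad values ("[PAD]" and 0), the helper name padTo, and the choice to leave longer input untouched are assumptions; this page does not say whether the real method pads, truncates, or both.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration; not the DJL implementation.
final class EncodeMaxLengthSketch {

    // Right-pads a copy of the list to maxLength; longer input is left untouched (assumption).
    static <E> List<E> padTo(List<E> items, E padItem, int maxLength) {
        List<E> out = new ArrayList<>(items);
        while (out.size() < maxLength) {
            out.add(padItem);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("[CLS]", "who", "[SEP]", "me", "[SEP]");
        List<Long> tokenTypes = List.of(0L, 0L, 0L, 1L, 1L);
        List<Long> attentionMask = List.of(1L, 1L, 1L, 1L, 1L);
        int maxLength = 8;

        // All three lists are padded to the same length so they stay aligned.
        System.out.println(padTo(tokens, "[PAD]", maxLength));
        System.out.println(padTo(tokenTypes, 0L, maxLength));
        // Mask value 0 marks padding positions that a model should ignore.
        System.out.println(padTo(attentionMask, 0L, maxLength));
    }
}
```

Padding every list to one fixed length is what lets encoded inputs of different sizes be batched together.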