BertFullTokenizer (Deep Java Library 0.15.0 API specification)

java.lang.Object
- ai.djl.modality.nlp.preprocess.SimpleTokenizer
  - ai.djl.modality.nlp.bert.BertTokenizer
    - ai.djl.modality.nlp.bert.BertFullTokenizer

All Implemented Interfaces:

TextProcessor, Tokenizer
```
public class BertFullTokenizer
extends BertTokenizer
```
BertFullTokenizer runs end-to-end tokenization of input text.
It first runs basic preprocessors to clean the input text, then runs WordpieceTokenizer to split the cleaned text into word pieces.
Reference implementation: Google Research Bert Tokenizer
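
The two stages described above (basic cleanup, then word-piece splitting) can be illustrated with a small standalone sketch. This is not DJL's implementation: the class `WordPieceSketch`, its methods, the toy vocabulary, the `##` continuation prefix, and the `[UNK]` fallback are assumptions modelled on the common BERT convention, and the cleanup step is reduced to lowercasing and punctuation splitting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative only: a simplified clean-then-split pipeline, not the DJL classes. */
public class WordPieceSketch {

    /** Greedy longest-match split of a single word into vocabulary pieces. */
    public static List<String> wordPieces(String word, Set<String> vocab) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            // Shrink the candidate from the right until it is found in the vocabulary.
            while (end > start) {
                String candidate = word.substring(start, end);
                if (start > 0) {
                    candidate = "##" + candidate; // assumed continuation marker
                }
                if (vocab.contains(candidate)) {
                    match = candidate;
                    break;
                }
                end--;
            }
            if (match == null) {
                // No piece fits: the whole word maps to the assumed unknown token.
                return Collections.singletonList("[UNK]");
            }
            pieces.add(match);
            start = end;
        }
        return pieces;
    }

    /** Basic cleanup (optional lowercasing, punctuation splitting), then word pieces. */
    public static List<String> tokenize(String input, Set<String> vocab, boolean lowerCase) {
        String text = lowerCase ? input.toLowerCase(Locale.ROOT) : input;
        // Surround ASCII punctuation with spaces so each mark becomes its own word.
        text = text.replaceAll("(\\p{Punct})", " $1 ").trim();
        List<String> tokens = new ArrayList<>();
        if (text.isEmpty()) {
            return tokens;
        }
        for (String word : text.split("\\s+")) {
            tokens.addAll(wordPieces(word, vocab));
        }
        return tokens;
    }

    public static void main(String[] args) {
        Set<String> vocab =
                new HashSet<>(Arrays.asList("un", "##aff", "##able", "is", "it", "?"));
        // Prints: [un, ##aff, ##able, [UNK], is, it, ?]
        System.out.println(tokenize("Unaffable, is it?", vocab, true));
    }
}
```

The comma is absent from the toy vocabulary, so it falls back to `[UNK]`. A real BERT vocabulary would normally contain common punctuation.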

Constructor Summary

Constructors
Constructor and Description
`BertFullTokenizer(Vocabulary vocabulary, boolean lowerCase)` Creates an instance of `BertFullTokenizer`.

Method Summary

| Modifier and Type | Method and Description |
| --- | --- |
| `static java.util.List<TextProcessor>` | `getPreprocessors(boolean lowerCase)` Returns a list of `TextProcessor`s used to preprocess input text for BERT models. |
| `Vocabulary` | `getVocabulary()` Returns the `Vocabulary` used for tokenization. |
| `java.util.List<java.lang.String>` | `tokenize(java.lang.String input)` Breaks down the given sentence into a list of tokens that can be represented by embeddings. |
| `java.lang.String` | `tokenToString(java.util.List<java.lang.String> tokens)` Returns a string representation of the tokens. |

Methods inherited from class ai.djl.modality.nlp.bert.BertTokenizer
encode, encode, pad

Methods inherited from class ai.djl.modality.nlp.preprocess.SimpleTokenizer
buildSentence

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface ai.djl.modality.nlp.preprocess.Tokenizer
preprocess

- Constructor Detail
  - BertFullTokenizer
```
public BertFullTokenizer(Vocabulary vocabulary,
                         boolean lowerCase)
```
    Creates an instance of BertFullTokenizer.
    
    Parameters:
    
    vocabulary - the BERT vocabulary
    
    lowerCase - whether to convert tokens to lowercase
- Method Detail
  - getVocabulary
```
public Vocabulary getVocabulary()
```
    Returns the Vocabulary used for tokenization.
    
    Returns:
    
    the Vocabulary used for tokenization
  - tokenize
```
public java.util.List<java.lang.String> tokenize(java.lang.String input)
```
    Breaks down the given sentence into a list of tokens that can be represented by embeddings.
    
    Specified by:
    
    tokenize in interface Tokenizer
    
    Overrides:
    
    tokenize in class BertTokenizer
    
    Parameters:
    
    input - the sentence to tokenize
    
    Returns:
    
    a List of tokens
  - tokenToString
```
public java.lang.String tokenToString(java.util.List<java.lang.String> tokens)
```
    Returns a string representation of the tokens.
    
    Overrides:
    
    tokenToString in class BertTokenizer
    
    Parameters:
    
    tokens - a list of tokens
    
    Returns:
    
    a string representation of the tokens
  - getPreprocessors
```
public static java.util.List<TextProcessor> getPreprocessors(boolean lowerCase)
```
    Returns a list of TextProcessors used to preprocess input text for BERT models.
    
    Parameters:
    
    lowerCase - whether to convert input to lowercase
    
    Returns:
    
    List of TextProcessors
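
As a rough illustration of how an ordered list of text processors might be applied, the sketch below chains a few steps over a token list. The page does not enumerate the processors that `getPreprocessors` returns, so everything here is an assumption: `Step` is a hypothetical stand-in interface, and the three steps (whitespace splitting, punctuation separation, optional lowercasing) are only plausible examples of such a chain, not the DJL classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Illustrative only: a chain of token-list transformations applied in order. */
public class PreprocessorChainSketch {

    /** Hypothetical stand-in for a text processor: tokens in, transformed tokens out. */
    public interface Step {
        List<String> apply(List<String> tokens);
    }

    /** Splits every token on the given regex after applying an optional rewrite. */
    private static List<String> splitAll(List<String> tokens, boolean padPunctuation) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String text = padPunctuation
                    ? token.replaceAll("(\\p{Punct})", " $1 ")
                    : token;
            for (String part : text.trim().split("\\s+")) {
                if (!part.isEmpty()) {
                    out.add(part);
                }
            }
        }
        return out;
    }

    /** Builds the assumed chain; lowercasing is appended only when requested. */
    public static List<Step> buildSteps(boolean lowerCase) {
        List<Step> steps = new ArrayList<>();
        steps.add(tokens -> splitAll(tokens, false)); // split on runs of whitespace
        steps.add(tokens -> splitAll(tokens, true));  // make punctuation its own token
        if (lowerCase) {
            steps.add(tokens -> {
                List<String> out = new ArrayList<>();
                for (String token : tokens) {
                    out.add(token.toLowerCase(Locale.ROOT));
                }
                return out;
            });
        }
        return steps;
    }

    /** Feeds the raw input through each step in list order. */
    public static List<String> run(String input, List<Step> steps) {
        List<String> tokens = Collections.singletonList(input);
        for (Step step : steps) {
            tokens = step.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Prints: [hello, ,, world, !]
        System.out.println(run("  Hello,   World! ", buildSteps(true)));
    }
}
```

Keeping the steps in a list means the `lowerCase` flag only decides whether one extra step is appended, which mirrors the shape of a factory method that takes a single boolean.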
