public class BertWordPieceTokenizer extends Object implements Tokenizer
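The class name refers to BERT-style WordPiece tokenization, but this page does not describe the algorithm. The standalone sketch below shows greedy longest-match-first splitting of a single word, following the conventions of the original BERT release: non-initial pieces carry a "##" prefix, and a word that cannot be fully covered by the vocabulary becomes "[UNK]". `WordPieceSketch` and `splitWord` are hypothetical names and are not part of this API. DL4J's internals may differ.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical standalone sketch of BERT-style WordPiece splitting for one word.
// Not DL4J's implementation; "##" and "[UNK]" follow the original BERT conventions.
final class WordPieceSketch {
    static final String UNKNOWN = "[UNK]";
    static final String CONTINUATION = "##";

    static List<String> splitWord(String word, Set<String> vocab) {
        List<String> pieces = new ArrayList<>();
        int start = 0;
        while (start < word.length()) {
            int end = word.length();
            String match = null;
            // Shrink the window from the right until a vocabulary entry matches.
            while (start < end) {
                String sub = word.substring(start, end);
                if (start > 0) {
                    sub = CONTINUATION + sub; // non-initial pieces carry the "##" marker
                }
                if (vocab.contains(sub)) {
                    match = sub;
                    break;
                }
                end--;
            }
            if (match == null) {
                // No piece fits at this position: the whole word maps to the unknown token.
                return Collections.singletonList(UNKNOWN);
            }
            pieces.add(match);
            start = end; // continue right after the matched piece
        }
        return pieces;
    }
}
```

With a vocabulary of `un`, `##aff` and `##able`, the word `unaffable` splits into `un`, `##aff`, `##able`, and `xyz` becomes `[UNK]`.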
| Modifier and Type | Field and Description |
|---|---|
| static Pattern | splitPattern |
| Constructor and Description |
|---|
| BertWordPieceTokenizer(String tokens, NavigableMap<String,Integer> vocab, TokenPreProcess preTokenizePreProcessor, TokenPreProcess tokenPreProcess) |
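The constructor takes two `TokenPreProcess` instances, `preTokenizePreProcessor` and `tokenPreProcess`. This page does not say what each does. Judging only from the names, the first probably runs on the text before it is split and the second on individual tokens. The sketch below assumes a single-method shape, `String preProcess(String)`. It declares its own stand-in interface, `TokenPreProcessSketch`, so it compiles without DL4J. The lower-casing and accent stripping imitate what BERT "uncased" models apply and are only an example.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical local stand-in with the single-method shape assumed for TokenPreProcess.
interface TokenPreProcessSketch {
    String preProcess(String token);
}

// Example pre-processing step in the style of BERT "uncased" models:
// lower-case the text, decompose it (NFD), then drop the combining accent marks.
final class LowerCaseStripAccents implements TokenPreProcessSketch {
    @Override
    public String preProcess(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{Mn}", ""); // remove nonspacing marks such as U+0301
    }
}
```

For example, `new LowerCaseStripAccents().preProcess("Héllo")` yields `hello`.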
| Modifier and Type | Method and Description |
|---|---|
| protected void | checkIfEmpty(Map<String,Integer> m, String candidate) |
| int | countTokens(): Returns the number of tokens in the tokenizer. |
| protected String | findLongestSubstring(NavigableMap<String,Integer> vocab, String candidate) |
| List<String> | getTokens(): Returns a list of all the tokens. |
| boolean | hasMoreTokens(): Returns whether more tokens are left in the iterator. |
| String | nextToken(): Returns the next token (usually a word) in the string. |
| void | setTokenPreProcessor(TokenPreProcess tokenPreProcessor): Sets the token pre-processor. |
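The `hasMoreTokens()`, `nextToken()`, `countTokens()` and `getTokens()` methods summarized above follow a pull-style iteration pattern. So that it runs without DL4J, the sketch below uses a hypothetical list-backed class, `ListTokenizerSketch`, instead of `BertWordPieceTokenizer`. This page does not say whether `countTokens()` reports the total number of tokens or only the remaining ones. The sketch assumes the total.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical list-backed stand-in exposing the same four Tokenizer-style methods;
// it is not the DL4J class.
final class ListTokenizerSketch {
    private final List<String> tokens;
    private int position = 0;

    ListTokenizerSketch(List<String> tokens) {
        this.tokens = new ArrayList<>(tokens); // defensive copy
    }

    boolean hasMoreTokens() {
        return position < tokens.size();
    }

    // Throws IndexOutOfBoundsException if called after the tokens are exhausted.
    String nextToken() {
        return tokens.get(position++);
    }

    // Assumption: reports the total count, independent of iteration progress.
    int countTokens() {
        return tokens.size();
    }

    List<String> getTokens() {
        return new ArrayList<>(tokens);
    }

    // Typical consumption loop: pull tokens one by one until none remain.
    static List<String> drain(ListTokenizerSketch tokenizer) {
        List<String> out = new ArrayList<>();
        while (tokenizer.hasMoreTokens()) {
            out.add(tokenizer.nextToken());
        }
        return out;
    }
}
```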
public static final Pattern splitPattern
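This page does not give the value of `splitPattern`. BERT-style tokenizers commonly split text on whitespace and punctuation before applying WordPiece. The hypothetical `SplitPatternSketch` below uses an assumed regex that keeps runs of letters, combining marks and digits together and emits every other non-whitespace character as its own token. It is an assumption, not the real field's pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pre-split step; the regex is an assumed stand-in, not the real splitPattern.
final class SplitPatternSketch {
    // Alternative 1: a run of letters, marks or digits.
    // Alternative 2: any single character that is neither whitespace nor one of those.
    static final Pattern WORDS_AND_PUNCTUATION =
            Pattern.compile("[\\p{L}\\p{M}\\p{N}]+|[^\\s\\p{L}\\p{M}\\p{N}]");

    static List<String> split(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORDS_AND_PUNCTUATION.matcher(text);
        while (m.find()) { // whitespace never matches, so it is skipped
            words.add(m.group());
        }
        return words;
    }
}
```

Under this assumption, `"Hello, world!"` splits into `Hello`, `,`, `world`, `!`.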
public BertWordPieceTokenizer(String tokens, NavigableMap<String,Integer> vocab, TokenPreProcess preTokenizePreProcessor, TokenPreProcess tokenPreProcess)
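The constructor takes the vocabulary as a `NavigableMap<String,Integer>`. BERT vocabulary files list one token per line, and each token's id is its zero-based line number. The sketch below builds such a map from vocabulary lines using a hypothetical helper, `VocabSketch`, which is not a DL4J API. It assumes natural `String` ordering because this page does not document the ordering the class expects. The role of the `tokens` argument is also undocumented here, so the sketch does not cover it.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper: turn BERT-style vocabulary lines (one token per line,
// id = zero-based line index) into a NavigableMap<String,Integer>.
// Natural String ordering (TreeMap default) is an assumption.
final class VocabSketch {
    static NavigableMap<String, Integer> fromLines(List<String> lines) {
        NavigableMap<String, Integer> vocab = new TreeMap<>();
        for (int i = 0; i < lines.size(); i++) {
            String token = lines.get(i).trim();
            if (!token.isEmpty()) {
                vocab.put(token, i); // keep the original line index as the token id
            }
        }
        return vocab;
    }
}
```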
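public boolean hasMoreTokens()
Description copied from interface: Tokenizer
Returns whether more tokens are left in the iterator.
Specified by: hasMoreTokens in interface Tokenizer

public int countTokens()
Description copied from interface: Tokenizer
Returns the number of tokens in the tokenizer.
Specified by: countTokens in interface Tokenizer

public String nextToken()
Description copied from interface: Tokenizer
Returns the next token (usually a word) in the string.
Specified by: nextToken in interface Tokenizer

public List<String> getTokens()
Description copied from interface: Tokenizer
Returns a list of all the tokens.
Specified by: getTokens in interface Tokenizer

public void setTokenPreProcessor(TokenPreProcess tokenPreProcessor)
Description copied from interface: Tokenizer
Sets the token pre-processor.
Specified by: setTokenPreProcessor in interface Tokenizer
Parameters: tokenPreProcessor - the token pre-processor to set

protected String findLongestSubstring(NavigableMap<String,Integer> vocab, String candidate)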
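This page does not document `findLongestSubstring`. One plausible reason for taking a `NavigableMap` rather than a plain `Map` is that ordered lookups can answer longest-prefix queries. The hypothetical `LongestPrefixSketch` below shows one way to do this with `floorKey`, assuming natural `String` ordering. It is not the DL4J implementation, and returning `null` when nothing matches is a choice made for the sketch. Every vocabulary key that is a prefix of the probe sorts at or below the probe. So `floorKey` returns either the longest such prefix or a key that diverges from the probe. In the second case, the probe is cut back to the shared prefix and the lookup repeats.

```java
import java.util.NavigableMap;

// Hypothetical longest-prefix lookup over a NavigableMap with natural String ordering;
// not DL4J's findLongestSubstring.
final class LongestPrefixSketch {
    /** Returns the longest non-empty vocabulary key that is a prefix of candidate, or null. */
    static String longestPrefix(NavigableMap<String, Integer> vocab, String candidate) {
        String probe = candidate;
        while (!probe.isEmpty()) {
            // Greatest key <= probe; any vocabulary prefix of probe sorts at or below it.
            String floor = vocab.floorKey(probe);
            if (floor == null || floor.isEmpty()) {
                return null; // no non-empty key can be a prefix of probe
            }
            if (probe.startsWith(floor)) {
                return floor; // longest prefix, since longer prefixes would sort above floor
            }
            // floor diverges from probe; only prefixes shorter than the divergence
            // point can still match, so shrink the probe and retry.
            probe = probe.substring(0, commonPrefixLength(probe, floor));
        }
        return null;
    }

    private static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }
}
```

For example, with keys `un`, `una` and `unaffable`, the candidate `unbelievable` first lands on `unaffable`, shrinks to `un`, and returns `un`.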
Copyright © 2021. All rights reserved.