Package ai.djl.modality.nlp
Class DefaultVocabulary.Builder
java.lang.Object
ai.djl.modality.nlp.DefaultVocabulary.Builder
- Enclosing class:
- DefaultVocabulary
Builder class that is used to build the DefaultVocabulary.
Method Summary
Method - Description
add - Adds the given sentence to the DefaultVocabulary.
addAll - Adds the given list of sentences to the DefaultVocabulary.
addFromCustomizedFile - Adds a customized vocabulary to the DefaultVocabulary.
addFromTextFile(URL url) - Adds a text vocabulary to the DefaultVocabulary.
addFromTextFile(Path path) - Adds a text vocabulary to the DefaultVocabulary.
build() - Builds the DefaultVocabulary object with the set arguments.
optMaxTokens(int maxTokens) - Sets the optional limit on the size of the vocabulary.
optMinFrequency(int minFrequency) - Sets the optional parameter that specifies the minimum frequency to consider a token to be part of the DefaultVocabulary.
optReservedTokens(Collection<String> reservedTokens) - Sets the optional parameter that sets the list of reserved tokens.
optUnknownToken() - Sets the optional parameter that specifies the unknown token's string value with "<unk>".
optUnknownToken(String unknownToken) - Sets the optional parameter that specifies the unknown token's string value.
-
Method Details
-
optMinFrequency
Sets the optional parameter that specifies the minimum frequency to consider a token to be part of the DefaultVocabulary. Defaults to no minimum.
- Parameters:
minFrequency
- the minimum frequency to consider a token to be part of the DefaultVocabulary, or -1 for no minimum
- Returns:
- this DefaultVocabulary.Builder
-
optMaxTokens
Sets the optional limit on the size of the vocabulary. The size includes the reserved tokens. If the number of added tokens exceeds the maxTokens limit, the most frequent tokens are kept.
- Parameters:
maxTokens
- the maximum number of tokens, or -1 for no maximum
- Returns:
- this DefaultVocabulary.Builder
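How optMinFrequency and optMaxTokens interact can be illustrated with a small standalone sketch. It is not DJL's implementation: the class FrequencyFilterSketch and its select method are hypothetical, and the ordering (reserved tokens first, then counted tokens by descending frequency, ties in first-seen order) is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of DJL: illustrates frequency-based token selection.
final class FrequencyFilterSketch {

    // Returns reserved tokens followed by the most frequent counted tokens.
    // minFrequency = -1 disables the frequency filter; maxTokens = -1 disables the size limit.
    static List<String> select(
            List<List<String>> sentences, List<String> reserved, int minFrequency, int maxTokens) {
        // Count occurrences, remembering first-seen order for stable tie-breaking.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1, Integer::sum);
            }
        }

        // Reserved tokens always come first and count toward the size limit.
        List<String> vocab = new ArrayList<>(reserved);

        // Sort by descending frequency; List.sort is stable, so ties keep first-seen order.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        for (Map.Entry<String, Integer> entry : entries) {
            if (maxTokens >= 0 && vocab.size() >= maxTokens) {
                break; // size limit reached, remaining tokens are less frequent
            }
            if (entry.getValue() < minFrequency) {
                continue; // below the minimum frequency
            }
            if (!vocab.contains(entry.getKey())) {
                vocab.add(entry.getKey());
            }
        }
        return vocab;
    }

    public static void main(String[] args) {
        List<List<String>> sentences = List.of(List.of("a", "b", "a"), List.of("c", "a", "b"));
        System.out.println(select(sentences, List.of("<pad>"), 2, -1));
    }
}
```

In this sketch a limit of 2 with one reserved token leaves room for exactly one counted token, which mirrors the statement that the size includes the reserved tokens.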
-
optUnknownToken
Sets the optional parameter that specifies the unknown token's string value with "<unk>".
- Returns:
- this DefaultVocabulary.Builder
-
optUnknownToken
Sets the optional parameter that specifies the unknown token's string value.
- Parameters:
unknownToken
- the string value of the unknown token
- Returns:
- this DefaultVocabulary.Builder
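The role an unknown token typically plays in a vocabulary can be sketched as follows. This is an illustration under assumptions, not DJL's code: the class UnknownTokenSketch is hypothetical, and both the choice of index 0 for the unknown token and the fallback behavior for out-of-vocabulary lookups are assumptions that this page does not confirm.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class, NOT part of DJL: shows an unknown-token fallback on lookup.
final class UnknownTokenSketch {
    private final Map<String, Long> indices = new HashMap<>();
    private final String unknownToken;

    UnknownTokenSketch(List<String> tokens, String unknownToken) {
        this.unknownToken = unknownToken;
        // Assumption: the unknown token is registered first and gets index 0.
        indices.put(unknownToken, 0L);
        for (String token : tokens) {
            indices.putIfAbsent(token, (long) indices.size());
        }
    }

    // Returns the token's index, or the unknown token's index if the token is absent.
    long getIndex(String token) {
        return indices.getOrDefault(token, indices.get(unknownToken));
    }

    public static void main(String[] args) {
        UnknownTokenSketch vocab = new UnknownTokenSketch(List.of("hello", "world"), "<unk>");
        System.out.println(vocab.getIndex("hello"));
        System.out.println(vocab.getIndex("never-seen"));
    }
}
```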
-
optReservedTokens
Sets the optional parameter that sets the list of reserved tokens.
- Parameters:
reservedTokens
- the list of reserved tokens
- Returns:
- this DefaultVocabulary.Builder
-
add
Adds the given sentence to the DefaultVocabulary.
- Parameters:
sentence
- the sentence to be added
- Returns:
- this DefaultVocabulary.Builder
-
addAll
Adds the given list of sentences to the DefaultVocabulary.
- Parameters:
sentences
- the list of sentences to be added
- Returns:
- this DefaultVocabulary.Builder
-
addFromTextFile
Adds a text vocabulary to the DefaultVocabulary.
Example text file (vocab.txt):
token1
token2
token3
will be mapped to indices 0, 1, and 2.
- Parameters:
path
- the path to the text file
- Returns:
- this DefaultVocabulary.Builder
- Throws:
IOException
- if the vocabulary file cannot be read
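The token-to-index mapping described for vocab.txt can be sketched with the standard library alone. The class TextVocabSketch and its methods are hypothetical and are not DJL's implementation; trimming whitespace, skipping blank lines, and keeping the first index for a repeated token are assumptions made for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT part of DJL: maps one token per line to its line-order index.
final class TextVocabSketch {

    // Assigns indices 0, 1, 2, ... to tokens in the order they appear.
    static Map<String, Long> indexTokens(List<String> lines) {
        Map<String, Long> index = new LinkedHashMap<>();
        for (String line : lines) {
            String token = line.trim();
            if (token.isEmpty()) {
                continue; // assumption: blank lines carry no token
            }
            // Assumption: a repeated token keeps the index of its first occurrence.
            index.putIfAbsent(token, (long) index.size());
        }
        return index;
    }

    // Reads the file and indexes its lines; throws IOException if the file cannot be read.
    static Map<String, Long> load(Path path) throws IOException {
        return indexTokens(Files.readAllLines(path));
    }

    public static void main(String[] args) {
        System.out.println(indexTokens(List.of("token1", "token2", "token3")));
    }
}
```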
-
addFromTextFile
Adds a text vocabulary to the DefaultVocabulary.
- Parameters:
url
- the text file url
- Returns:
- this DefaultVocabulary.Builder
- Throws:
IOException
- if the vocabulary file cannot be read
-
addFromCustomizedFile
Adds a customized vocabulary to the DefaultVocabulary.
- Parameters:
url
- the text file url
lambda
- the function to parse the vocabulary file
- Returns:
- this DefaultVocabulary.Builder
-
build
Builds the DefaultVocabulary object with the set arguments.
- Returns:
- the DefaultVocabulary object built
-