public static final class DefaultVocabulary.Builder
extends java.lang.Object
Builder class that is used to build the DefaultVocabulary.

Modifier and Type | Method and Description |
---|---|
DefaultVocabulary.Builder | `add(java.util.List<java.lang.String> sentence)` Adds the given sentence to the DefaultVocabulary. |
DefaultVocabulary.Builder | `addAll(java.util.List<java.util.List<java.lang.String>> sentences)` Adds the given list of sentences to the DefaultVocabulary. |
DefaultVocabulary.Builder | `addFromCustomizedFile(java.net.URL url, java.util.function.Function<java.net.URL,java.util.List<java.lang.String>> lambda)` Adds a customized vocabulary to the DefaultVocabulary. |
DefaultVocabulary.Builder | `addFromTextFile(java.nio.file.Path path)` Adds a text vocabulary to the DefaultVocabulary. |
DefaultVocabulary.Builder | `addFromTextFile(java.net.URL url)` Adds a text vocabulary to the DefaultVocabulary. |
DefaultVocabulary | `build()` Builds the DefaultVocabulary object with the set arguments. |
DefaultVocabulary.Builder | `optMaxTokens(int maxTokens)` Sets the optional limit on the size of the vocabulary. |
DefaultVocabulary.Builder | `optMinFrequency(int minFrequency)` Sets the optional parameter that specifies the minimum frequency for a token to be part of the DefaultVocabulary. |
DefaultVocabulary.Builder | `optReservedTokens(java.util.Collection<java.lang.String> reservedTokens)` Sets the optional parameter that specifies the reserved tokens. |
DefaultVocabulary.Builder | `optUnknownToken()` Sets the unknown token's string value to the default, "<unk>". |
DefaultVocabulary.Builder | `optUnknownToken(java.lang.String unknownToken)` Sets the optional parameter that specifies the unknown token's string value. |
public DefaultVocabulary.Builder optMinFrequency(int minFrequency)
Sets the optional parameter that specifies the minimum frequency for a token to be part of the DefaultVocabulary. Defaults to no minimum.
Parameters:
minFrequency - the minimum frequency for a token to be part of the DefaultVocabulary, or -1 for no minimum
Returns:
this VocabularyBuilder
public DefaultVocabulary.Builder optMaxTokens(int maxTokens)
Sets the optional limit on the size of the vocabulary. The size includes the reserved tokens. If the number of added tokens exceeds the maxTokens limit, the most frequent tokens are kept.
Parameters:
maxTokens - the maximum number of tokens, or -1 for no maximum
Returns:
this DefaultVocabulary.Builder
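The interaction between the frequency and size limits can be sketched with the standard library alone. The class `FrequencyFilterSketch` and its `select` method below are hypothetical names, and the selection order (reserved tokens first, then the most frequent tokens, ties broken by first appearance) is an assumption made for illustration. This is not DJL's implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration of the documented limits; not DJL's implementation.
final class FrequencyFilterSketch {

    /**
     * Keeps the reserved tokens first, then counted tokens from most to least
     * frequent, stopping at maxTokens entries or once a token falls below
     * minFrequency. A value of -1 disables the corresponding limit.
     */
    static List<String> select(
            List<List<String>> sentences,
            List<String> reservedTokens,
            int minFrequency,
            int maxTokens) {
        // Count occurrences; LinkedHashMap remembers first-seen order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1, Integer::sum);
            }
        }

        // Most frequent first; the sort is stable, so ties keep first-seen order.
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(counts.entrySet());
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());

        // Reserved tokens count toward the size limit (assumed to be kept unconditionally).
        Set<String> kept = new LinkedHashSet<>(reservedTokens);
        for (Map.Entry<String, Integer> entry : ranked) {
            if (maxTokens != -1 && kept.size() >= maxTokens) {
                break; // size limit reached
            }
            if (minFrequency != -1 && entry.getValue() < minFrequency) {
                break; // entries are sorted, so all remaining ones are rarer
            }
            kept.add(entry.getKey());
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        List<List<String>> sentences = Arrays.asList(
                Arrays.asList("a", "b", "a"),
                Arrays.asList("c", "a", "b"));
        List<String> reserved = Arrays.asList("<pad>");
        System.out.println(select(sentences, reserved, 2, -1)); // prints [<pad>, a, b]
        System.out.println(select(sentences, reserved, -1, 2)); // prints [<pad>, a]
    }
}
```

In this sketch the reserved token occupies one of the `maxTokens` slots, which mirrors the statement that the size includes the reserved tokens.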
public DefaultVocabulary.Builder optUnknownToken()
Sets the unknown token's string value to the default, "<unk>".
Returns:
this VocabularyBuilder
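public DefaultVocabulary.Builder optUnknownToken(java.lang.String unknownToken)
Sets the optional parameter that specifies the unknown token's string value.
Parameters:
unknownToken - the string value of the unknown token
Returns:
this VocabularyBuilder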
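This page does not describe how lookups use the unknown token. A common convention in NLP vocabularies is to map any out-of-vocabulary token to the unknown token's index. The sketch below assumes that convention. The class `UnknownTokenSketch`, its `getIndex` method, and the choice of index 0 for the unknown token are all hypothetical and are not taken from DJL.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an unknown-token fallback; not DJL's implementation.
final class UnknownTokenSketch {
    private final Map<String, Long> indices = new HashMap<>();
    private final String unknownToken;

    UnknownTokenSketch(List<String> tokens, String unknownToken) {
        this.unknownToken = unknownToken;
        // Assumption: the unknown token takes index 0, other tokens follow in order.
        indices.put(unknownToken, 0L);
        for (String token : tokens) {
            indices.putIfAbsent(token, (long) indices.size());
        }
    }

    /** Returns the token's index, or the unknown token's index if it is absent. */
    long getIndex(String token) {
        return indices.getOrDefault(token, indices.get(unknownToken));
    }

    public static void main(String[] args) {
        UnknownTokenSketch vocab =
                new UnknownTokenSketch(Arrays.asList("hello", "world"), "<unk>");
        System.out.println(vocab.getIndex("world"));   // prints 2
        System.out.println(vocab.getIndex("missing")); // prints 0
    }
}
```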
public DefaultVocabulary.Builder optReservedTokens(java.util.Collection<java.lang.String> reservedTokens)
Sets the optional parameter that specifies the reserved tokens.
Parameters:
reservedTokens - the collection of reserved tokens
Returns:
this VocabularyBuilder
public DefaultVocabulary.Builder add(java.util.List<java.lang.String> sentence)
Adds the given sentence to the DefaultVocabulary.
Parameters:
sentence - the sentence to be added
Returns:
this VocabularyBuilder
public DefaultVocabulary.Builder addAll(java.util.List<java.util.List<java.lang.String>> sentences)
Adds the given list of sentences to the DefaultVocabulary.
Parameters:
sentences - the list of sentences to be added
Returns:
this VocabularyBuilder
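Both add and addAll take sentences that are already split into tokens. A minimal sketch of preparing that input from raw strings follows. It assumes lowercasing and whitespace splitting, which is a simplification chosen here for illustration. `SentenceInputSketch` and `tokenize` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper that produces the List<List<String>> shape addAll expects.
final class SentenceInputSketch {

    /** Lowercases each sentence and splits it on whitespace; blank sentences are skipped. */
    static List<List<String>> tokenize(List<String> rawSentences) {
        List<List<String>> sentences = new ArrayList<>();
        for (String raw : rawSentences) {
            String trimmed = raw.trim();
            if (trimmed.isEmpty()) {
                continue; // avoid adding a sentence with a single empty token
            }
            sentences.add(Arrays.asList(trimmed.toLowerCase(Locale.ROOT).split("\\s+")));
        }
        return sentences;
    }

    public static void main(String[] args) {
        System.out.println(tokenize(Arrays.asList("Hello world", "  ", "Tokens  are counted")));
        // prints [[hello, world], [tokens, are, counted]]
    }
}
```

Each inner list corresponds to one `sentence` argument of add, and the outer list to the `sentences` argument of addAll.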
public DefaultVocabulary.Builder addFromTextFile(java.nio.file.Path path) throws java.io.IOException
Adds a text vocabulary to the DefaultVocabulary.
Example: a text file (vocab.txt) containing the tokens token1, token2, token3 maps them to the indices 0, 1, 2, respectively.
Parameters:
path - the path to the text file
Returns:
this VocabularyBuilder
Throws:
java.io.IOException - if the vocabulary file cannot be read
public DefaultVocabulary.Builder addFromTextFile(java.net.URL url) throws java.io.IOException
Adds a text vocabulary to the DefaultVocabulary.
Parameters:
url - the URL of the text file
Returns:
this VocabularyBuilder
Throws:
java.io.IOException - if the vocabulary file cannot be read
public DefaultVocabulary.Builder addFromCustomizedFile(java.net.URL url, java.util.function.Function<java.net.URL,java.util.List<java.lang.String>> lambda)
Adds a customized vocabulary to the DefaultVocabulary.
Parameters:
url - the URL of the vocabulary file
lambda - the function to parse the vocabulary file
Returns:
this VocabularyBuilder
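As an illustration of what the `lambda` parameter might look like, the sketch below defines a `Function<URL, List<String>>` that parses a file with one `token<TAB>count` pair per line and returns the tokens in file order. The file layout, the class name `CustomVocabParserSketch`, and the constant `TSV_PARSER` are assumptions made for this example. Only the function type comes from the signature above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical parser matching the Function<URL, List<String>> parameter type.
final class CustomVocabParserSketch {

    /** Reads "token<TAB>count" lines from the URL and returns the tokens in file order. */
    static final Function<URL, List<String>> TSV_PARSER = url -> {
        List<String> tokens = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.trim().isEmpty()) {
                    continue; // skip blank lines
                }
                // Keep only the first column; the count column is ignored here.
                tokens.add(line.split("\t", 2)[0]);
            }
        } catch (IOException e) {
            // Function.apply cannot throw checked exceptions, so wrap the failure.
            throw new UncheckedIOException(e);
        }
        return tokens;
    };
}
```

A function of this shape could then be passed as the second argument, for example `addFromCustomizedFile(url, CustomVocabParserSketch.TSV_PARSER)`. Because the parser cannot declare a checked exception, read failures surface as `UncheckedIOException` in this sketch.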
public DefaultVocabulary build()
Builds the DefaultVocabulary object with the set arguments.
Returns:
the DefaultVocabulary object built