Class DefaultVocabulary.Builder

java.lang.Object
ai.djl.modality.nlp.DefaultVocabulary.Builder
Enclosing class:
DefaultVocabulary

public static final class DefaultVocabulary.Builder extends Object
Builder class that is used to build the DefaultVocabulary.
  • Method Details

    • optMinFrequency

      public DefaultVocabulary.Builder optMinFrequency(int minFrequency)
      Sets the minimum frequency a token must have to be included in the DefaultVocabulary. Defaults to no minimum.
      Parameters:
      minFrequency - the minimum frequency to consider a token to be part of the DefaultVocabulary or -1 for no minimum
      Returns:
      this DefaultVocabulary.Builder
    • optMaxTokens

      public DefaultVocabulary.Builder optMaxTokens(int maxTokens)
      Sets the optional limit on the size of the vocabulary.

      The size includes the reservedTokens. If the number of added tokens exceeds the maxTokens limit, only the most frequent tokens are kept.

      Parameters:
      maxTokens - the maximum number of tokens or -1 for no maximum
      Returns:
      this DefaultVocabulary.Builder
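
      The frequency and size limits described above can be illustrated with a small standalone sketch. It does not use or reproduce the DJL implementation: the class VocabularyLimitSketch and its methods are hypothetical, and it assumes an inclusive minimum-frequency threshold and first-seen order for ties, neither of which is confirmed by this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of frequency-based token selection; not the DJL implementation. */
public class VocabularyLimitSketch {

    /** Counts how often each token occurs across all sentences, in first-seen order. */
    public static Map<String, Integer> countTokens(List<List<String>> sentences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (List<String> sentence : sentences) {
            for (String token : sentence) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /**
     * Returns the reserved tokens followed by the counted tokens in descending
     * frequency. A value of -1 disables minFrequency or maxTokens.
     * Assumption: a token is kept when its count is >= minFrequency.
     */
    public static List<String> select(
            Map<String, Integer> counts, List<String> reserved, int minFrequency, int maxTokens) {
        List<String> result = new ArrayList<>(reserved);
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // List.sort is stable, so tokens with equal counts keep first-seen order.
        Comparator<Map.Entry<String, Integer>> byFrequency =
                Map.Entry.<String, Integer>comparingByValue().reversed();
        entries.sort(byFrequency);
        for (Map.Entry<String, Integer> entry : entries) {
            if (maxTokens >= 0 && result.size() >= maxTokens) {
                break; // size limit reached; reserved tokens count toward it
            }
            if (entry.getValue() < minFrequency) {
                continue; // too rare
            }
            if (!result.contains(entry.getKey())) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<String>> sentences =
                Arrays.asList(Arrays.asList("a", "b", "a"), Arrays.asList("c", "a", "b"));
        Map<String, Integer> counts = countTokens(sentences);
        System.out.println(select(counts, Arrays.asList("<pad>"), 2, -1));
        System.out.println(select(counts, Arrays.asList("<pad>"), -1, 2));
    }
}
```

      In this sketch the reserved tokens are always placed first and count toward maxTokens, which matches the statement that the size includes the reservedTokens.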
    • optUnknownToken

      public DefaultVocabulary.Builder optUnknownToken()
      Sets the unknown token's string value to the default, "<unk>".
      Returns:
      this DefaultVocabulary.Builder
    • optUnknownToken

      public DefaultVocabulary.Builder optUnknownToken(String unknownToken)
      Sets the optional parameter that specifies the unknown token's string value.
      Parameters:
      unknownToken - the string value of the unknown token
      Returns:
      this DefaultVocabulary.Builder
    • optReservedTokens

      public DefaultVocabulary.Builder optReservedTokens(Collection<String> reservedTokens)
      Sets the optional list of reserved tokens.
      Parameters:
      reservedTokens - the list of reserved tokens
      Returns:
      this DefaultVocabulary.Builder
    • add

      public DefaultVocabulary.Builder add(List<String> sentence)
      Adds the given sentence to the DefaultVocabulary.
      Parameters:
      sentence - the sentence to be added
      Returns:
      this DefaultVocabulary.Builder
    • addAll

      public DefaultVocabulary.Builder addAll(List<List<String>> sentences)
      Adds the given list of sentences to the DefaultVocabulary.
      Parameters:
      sentences - the list of sentences to be added
      Returns:
      this DefaultVocabulary.Builder
    • addFromTextFile

      public DefaultVocabulary.Builder addFromTextFile(Path path) throws IOException
      Adds a text vocabulary to the DefaultVocabulary.
         Example text file (vocab.txt):
         token1
         token2
         token3
         These tokens are mapped to the indices 0, 1 and 2, respectively.
       
      Parameters:
      path - the path to the text file
      Returns:
      this DefaultVocabulary.Builder
      Throws:
      IOException - if the vocabulary file cannot be read
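
      The line-to-index mapping in the vocab.txt example can be sketched with the standard library alone. VocabFileSketch is a hypothetical helper, not part of DJL. It assumes that blank lines are skipped and that a repeated token keeps its first index, and it does not model reserved or unknown tokens.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a one-token-per-line vocabulary file; not the DJL implementation. */
public class VocabFileSketch {

    /** Assigns each distinct, non-blank line the next free index, starting at 0. */
    public static Map<String, Integer> indexTokens(List<String> lines) {
        Map<String, Integer> index = new LinkedHashMap<>();
        for (String line : lines) {
            String token = line.trim();
            if (token.isEmpty()) {
                continue; // assumption: blank lines carry no token
            }
            index.putIfAbsent(token, index.size());
        }
        return index;
    }

    /** Reads a UTF-8 text file and indexes its lines. */
    public static Map<String, Integer> load(Path path) throws IOException {
        return indexTokens(Files.readAllLines(path, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(indexTokens(Arrays.asList("token1", "token2", "token3")));
    }
}
```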
    • addFromTextFile

      public DefaultVocabulary.Builder addFromTextFile(URL url) throws IOException
      Adds a text vocabulary to the DefaultVocabulary.
      Parameters:
      url - the URL of the text file
      Returns:
      this DefaultVocabulary.Builder
      Throws:
      IOException - if the vocabulary file cannot be read
    • addFromCustomizedFile

      public DefaultVocabulary.Builder addFromCustomizedFile(URL url, Function<URL,List<String>> lambda)
      Adds a customized vocabulary to the DefaultVocabulary.
      Parameters:
      url - the URL of the vocabulary file
      lambda - the function that parses the vocabulary file into a list of tokens
      Returns:
      this DefaultVocabulary.Builder
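
      As an illustration of what the lambda argument might look like, the sketch below defines a Function<URL,List<String>> for a hypothetical tab-separated format in which each line is "token<TAB>count". The class CustomVocabParserSketch, the file format, and the choice to wrap IOException in UncheckedIOException (since Function cannot throw checked exceptions) are all assumptions made for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical parser for a "token<TAB>count" vocabulary file. */
public class CustomVocabParserSketch {

    /** Returns the text before the first tab, or the whole line if there is no tab. */
    public static String parseLine(String line) {
        int tab = line.indexOf('\t');
        return (tab >= 0 ? line.substring(0, tab) : line).trim();
    }

    /** Reads the resource at the URL and returns one token per non-blank line. */
    public static final Function<URL, List<String>> TSV_PARSER = url -> {
        List<String> tokens = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String token = parseLine(line);
                if (!token.isEmpty()) {
                    tokens.add(token);
                }
            }
        } catch (IOException e) {
            // Function.apply cannot declare checked exceptions, so rethrow unchecked.
            throw new UncheckedIOException(e);
        }
        return tokens;
    };

    public static void main(String[] args) {
        System.out.println(parseLine("hello\t42"));
    }
}
```

      Such a function could then be passed as the second argument, for example addFromCustomizedFile(url, CustomVocabParserSketch.TSV_PARSER).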
    • build

      public DefaultVocabulary build()
      Builds the DefaultVocabulary object with the set arguments.
      Returns:
      the DefaultVocabulary object built