Class HuggingFaceTokenizer

java.lang.Object
dev.langchain4j.model.embedding.onnx.HuggingFaceTokenizer
All Implemented Interfaces:
dev.langchain4j.model.Tokenizer

public class HuggingFaceTokenizer extends Object implements dev.langchain4j.model.Tokenizer
A HuggingFace tokenizer.
Uses DJL's HuggingFaceTokenizer under the hood.
Requires a tokenizer.json file to instantiate; the no-argument constructor uses a built-in one.
  • Constructor Details

    • HuggingFaceTokenizer

      public HuggingFaceTokenizer()
      Creates an instance of a HuggingFaceTokenizer using a built-in tokenizer.json file.
    • HuggingFaceTokenizer

      public HuggingFaceTokenizer(Path pathToTokenizer)
      Creates an instance of a HuggingFaceTokenizer using a provided tokenizer.json file.
      Parameters:
      pathToTokenizer - The path to the tokenizer file (e.g., "/path/to/tokenizer.json")
    • HuggingFaceTokenizer

      public HuggingFaceTokenizer(Path pathToTokenizer, Map<String,String> options)
      Creates an instance of a HuggingFaceTokenizer using a provided tokenizer.json file and a map of DJL's tokenizer options.
      Parameters:
      pathToTokenizer - The path to the tokenizer file (e.g., "/path/to/tokenizer.json")
      options - DJL tokenizer options
    • HuggingFaceTokenizer

      public HuggingFaceTokenizer(String pathToTokenizer)
      Creates an instance of a HuggingFaceTokenizer using a provided tokenizer.json file.
      Parameters:
      pathToTokenizer - The path to the tokenizer file (e.g., "/path/to/tokenizer.json")
    • HuggingFaceTokenizer

      public HuggingFaceTokenizer(String pathToTokenizer, Map<String,String> options)
      Creates an instance of a HuggingFaceTokenizer using a provided tokenizer.json file and a map of DJL's tokenizer options.
      Parameters:
      pathToTokenizer - The path to the tokenizer file (e.g., "/path/to/tokenizer.json")
      options - DJL tokenizer options
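
The two constructors that accept options take a Map<String,String>, so every option value has to be passed as a string. The sketch below shows one way such a map might be assembled using only the JDK. The class name TokenizerOptions and its build method are hypothetical helpers, and the keys "padding", "truncation" and "maxLength" are assumptions based on option names used by DJL's tokenizer builder; check them against the DJL version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TokenizerOptions {

    /**
     * Builds a string-to-string options map for the tokenizer.
     * The key names are assumptions (taken from DJL's tokenizer builder),
     * not something this page documents.
     */
    public static Map<String, String> build(boolean padding, boolean truncation, int maxLength) {
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        // LinkedHashMap keeps insertion order, which makes the map easier to log and compare.
        Map<String, String> options = new LinkedHashMap<>();
        options.put("padding", String.valueOf(padding));
        options.put("truncation", String.valueOf(truncation));
        options.put("maxLength", String.valueOf(maxLength));
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> options = build(false, true, 512);
        System.out.println(options);
    }
}
```

With langchain4j on the classpath, the resulting map could then be passed to the constructor, for example new HuggingFaceTokenizer(Path.of("/path/to/tokenizer.json"), options).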
  • Method Details

    • estimateTokenCountInText

      public int estimateTokenCountInText(String text)
      Specified by:
      estimateTokenCountInText in interface dev.langchain4j.model.Tokenizer
    • estimateTokenCountInMessage

      public int estimateTokenCountInMessage(dev.langchain4j.data.message.ChatMessage message)
      Specified by:
      estimateTokenCountInMessage in interface dev.langchain4j.model.Tokenizer
    • estimateTokenCountInMessages

      public int estimateTokenCountInMessages(Iterable<dev.langchain4j.data.message.ChatMessage> messages)
      Specified by:
      estimateTokenCountInMessages in interface dev.langchain4j.model.Tokenizer
    • estimateTokenCountInToolSpecifications

      public int estimateTokenCountInToolSpecifications(Iterable<dev.langchain4j.agent.tool.ToolSpecification> toolSpecifications)
      Specified by:
      estimateTokenCountInToolSpecifications in interface dev.langchain4j.model.Tokenizer
    • estimateTokenCountInToolExecutionRequests

      public int estimateTokenCountInToolExecutionRequests(Iterable<dev.langchain4j.agent.tool.ToolExecutionRequest> toolExecutionRequests)
      Specified by:
      estimateTokenCountInToolExecutionRequests in interface dev.langchain4j.model.Tokenizer
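
A common use of estimateTokenCountInText is to keep text within a token budget, for instance before embedding it. The following is a minimal sketch of that idea, not part of the library: TokenBudgetChunker is a hypothetical helper that greedily packs sentences into chunks, and it accepts any ToIntFunction<String> as the token counter so that it runs with the JDK alone. The word-counting lambda in main is only a stand-in for a real tokenizer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

public class TokenBudgetChunker {

    /**
     * Greedily packs sentences into chunks whose estimated token count stays
     * within maxTokens. A single sentence that exceeds the budget on its own
     * is still emitted as its own (oversized) chunk.
     */
    public static List<String> chunk(List<String> sentences, int maxTokens,
                                     ToIntFunction<String> tokenCounter) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String sentence : sentences) {
            String candidate = current.length() == 0 ? sentence : current + " " + sentence;
            if (current.length() > 0 && tokenCounter.applyAsInt(candidate) > maxTokens) {
                // Adding this sentence would exceed the budget: close the current chunk.
                chunks.add(current.toString());
                current = new StringBuilder(sentence);
            } else {
                current = new StringBuilder(candidate);
            }
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    public static void main(String[] args) {
        // Stand-in counter for demonstration only: counts whitespace-separated words.
        ToIntFunction<String> wordCounter =
                text -> text.trim().isEmpty() ? 0 : text.trim().split("\\s+").length;
        List<String> sentences = List.of("one two three", "four five", "six seven eight nine");
        System.out.println(chunk(sentences, 5, wordCounter));
    }
}
```

With langchain4j on the classpath, the stand-in counter could be replaced by a method reference such as tokenizer::estimateTokenCountInText, where tokenizer is a HuggingFaceTokenizer instance. Counts would then reflect the loaded tokenizer.json rather than whitespace-separated words.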