Interface Tokenizer.Configuration

Enclosing interface:
Tokenizer

public static interface Tokenizer.Configuration
A nested interface representing the configuration options for a tokenizer. Through this interface, callers can set the maximum number of tokens, the maximum overlap between tokens, and the type of tokenization to perform.
  • Method Summary

    Modifier and Type
    Method
    Description
    void
    setMaxOverlap(int maxOverlap)
    Sets the maximum overlap between tokens, where overlap is the number of characters shared by two adjacent segments.
    void
    setMaxTokens(int maxTokens)
    Sets the maximum number of tokens to be produced by the tokenizer.
    void
    setType(String type)
    Sets the type of tokenization being performed by this tokenizer.
  • Method Details

    • setMaxTokens

      void setMaxTokens(int maxTokens)
      Sets the maximum number of tokens to be produced by the tokenizer.
      Parameters:
      maxTokens - the new maximum number of tokens
    • setMaxOverlap

      void setMaxOverlap(int maxOverlap)
      Sets the maximum overlap between tokens, where overlap is the number of characters shared by two adjacent segments.
      Parameters:
      maxOverlap - the new maximum overlap
    • setType

      void setType(String type)
      Sets the type of tokenization being performed by this tokenizer. The accepted values are typically implementation-specific.
      Parameters:
      type - the tokenization type
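
The three setters above can be illustrated with a minimal sketch. Only the method signatures come from this page. The stand-in `Tokenizer` declaration, the `SimpleConfiguration` class, its getters, its default values, the validation rules, the `"character"` type string, and the `segment` helper are all hypothetical and exist only to make the example self-contained. The helper shows one plausible reading of "overlap": the number of characters that two adjacent fixed-length segments share.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in declaration so the sketch compiles on its own; it mirrors the
// three documented setters and nothing else.
interface Tokenizer {
    interface Configuration {
        void setMaxTokens(int maxTokens);
        void setMaxOverlap(int maxOverlap);
        void setType(String type);
    }
}

// Hypothetical implementation: stores each value and rejects invalid input.
// The validation and defaults are assumptions, not documented behavior.
final class SimpleConfiguration implements Tokenizer.Configuration {
    private int maxTokens = Integer.MAX_VALUE;
    private int maxOverlap = 0;
    private String type = "default"; // placeholder, not a documented value

    @Override
    public void setMaxTokens(int maxTokens) {
        if (maxTokens < 0) {
            throw new IllegalArgumentException("maxTokens must be >= 0");
        }
        this.maxTokens = maxTokens;
    }

    @Override
    public void setMaxOverlap(int maxOverlap) {
        if (maxOverlap < 0) {
            throw new IllegalArgumentException("maxOverlap must be >= 0");
        }
        this.maxOverlap = maxOverlap;
    }

    @Override
    public void setType(String type) {
        if (type == null) {
            throw new IllegalArgumentException("type must not be null");
        }
        this.type = type;
    }

    public int getMaxTokens() { return maxTokens; }
    public int getMaxOverlap() { return maxOverlap; }
    public String getType() { return type; }
}

final class ConfigurationDemo {
    // Splits text into segments of segmentLength characters in which each
    // pair of adjacent segments shares exactly `overlap` characters
    // (the final segment may be shorter).
    static List<String> segment(String text, int segmentLength, int overlap) {
        if (segmentLength <= 0 || overlap < 0 || overlap >= segmentLength) {
            throw new IllegalArgumentException("require 0 <= overlap < segmentLength");
        }
        List<String> segments = new ArrayList<>();
        int step = segmentLength - overlap;
        for (int start = 0; start < text.length(); start += step) {
            int end = Math.min(start + segmentLength, text.length());
            segments.add(text.substring(start, end));
            if (end == text.length()) {
                break; // the text is fully covered
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        SimpleConfiguration config = new SimpleConfiguration();
        config.setMaxTokens(3);
        config.setMaxOverlap(2);
        config.setType("character"); // hypothetical type name

        // Adjacent segments share 2 characters: "cd", then "ef".
        System.out.println(segment("abcdefgh", 4, config.getMaxOverlap()));
        // prints [abcd, cdef, efgh]
    }
}
```

A real implementation may interpret `maxTokens`, `maxOverlap`, and `type` differently; in particular, the set of valid `type` strings depends on the tokenizer in use.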