Interface ChatModelConfig
public interface ChatModelConfig
Field Summary
Fields:
- static final String DEFAULT_INFERENCE_ENDPOINT
Method Summary
- doSample(): Whether or not to use sampling; use greedy decoding otherwise.
- inferenceEndpointUrl(): The URL of the inference endpoint for the chat model.
- logRequests(): Whether chat model requests should be logged.
- logResponses(): Whether chat model responses should be logged.
- maxNewTokens(): Int (0-250).
- repetitionPenalty(): The parameter for repetition penalty. 1.0 means no penalty.
- returnFullText(): If set to false, the returned results will not contain the original query, making it easier for prompting.
- temperature(): Float (0.0-100.0). The temperature of the sampling operation.
- topK(): The number of highest-probability vocabulary tokens to keep for top-k filtering.
- topP(): If set to less than 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation.
- waitForModel(): If the model is not ready, wait for it instead of receiving a 503.
Field Details
DEFAULT_INFERENCE_ENDPOINT
static final String DEFAULT_INFERENCE_ENDPOINT
See Also:
- Constant Field Values
Method Details
inferenceEndpointUrl
@WithDefault("https://api-inference.huggingface.co/models/tiiuae/falcon-7b-instruct")
URL inferenceEndpointUrl()
The URL of the inference endpoint for the chat model.
When using Hugging Face with the inference API, the URL is https://api-inference.huggingface.co/models/<model-id>, for example https://api-inference.huggingface.co/models/google/flan-t5-small. When using a deployed inference endpoint, the URL is the URL of the endpoint. When using a local Hugging Face model, the URL is the URL of the local model.
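As a sketch of how this value might be read, assuming the interface is a SmallRye @ConfigMapping (which the @WithDefault annotation suggests) and that the mapping is registered with the running config; in Quarkus the mapping would more commonly be injected:

    import io.smallrye.config.SmallRyeConfig;
    import org.eclipse.microprofile.config.ConfigProvider;

    public class ChatModelConfigLookup {
        public static void main(String[] args) {
            // Programmatic lookup of the mapped configuration; falls back to the
            // @WithDefault endpoint when no property overrides it.
            SmallRyeConfig config = ConfigProvider.getConfig().unwrap(SmallRyeConfig.class);
            ChatModelConfig chatModel = config.getConfigMapping(ChatModelConfig.class);
            System.out.println(chatModel.inferenceEndpointUrl());
        }
    }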
temperature
Float (0.0-100.0). The temperature of the sampling operation. 1 means regular sampling, 0 means always take the highest score, and 100.0 gets close to a uniform probability distribution.
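A minimal sketch of what temperature means under the standard definition (not necessarily this library's internals): logits are divided by the temperature before the softmax, so low values sharpen the distribution and high values flatten it.

    // Temperature-scaled softmax: as temperature approaches 0 this approaches
    // argmax (always the highest score); large temperatures approach uniform.
    static double[] softmaxWithTemperature(double[] logits, double temperature) {
        double max = Double.NEGATIVE_INFINITY;
        for (double logit : logits) max = Math.max(max, logit);
        double[] probs = new double[logits.length];
        double sum = 0.0;
        for (int i = 0; i < logits.length; i++) {
            probs[i] = Math.exp((logits[i] - max) / temperature);
            sum += probs[i];
        }
        for (int i = 0; i < probs.length; i++) probs[i] /= sum;
        return probs;
    }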
maxNewTokens
Int (0-250).
returnFullText
If set tofalse, the return results will not contain the original query making it easier for prompting -
waitForModel
If the model is not ready, wait for it instead of receiving a 503. This limits the number of requests required to get your inference done. It is advised to only set this flag to true after receiving a 503 error, as it limits hanging in your application to known places.
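The pattern that advice implies can be sketched with java.net.http. The 503-while-loading behavior and the options.wait_for_model request field are Hugging Face Inference API features; the endpoint, apiToken, and request body here are placeholder assumptions:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    static String generate(HttpClient client, String endpoint, String apiToken) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(endpoint))
                .header("Authorization", "Bearer " + apiToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString("{\"inputs\": \"Hello\"}"))
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() == 503) {
            // Model is still loading: retry once, now asking the API to block
            // until the model is ready, so any hang happens at a known place.
            HttpRequest retry = HttpRequest.newBuilder(URI.create(endpoint))
                    .header("Authorization", "Bearer " + apiToken)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(
                            "{\"inputs\": \"Hello\", \"options\": {\"wait_for_model\": true}}"))
                    .build();
            response = client.send(retry, HttpResponse.BodyHandlers.ofString());
        }
        return response.body();
    }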
doSample
Whether or not to use sampling; use greedy decoding otherwise.
topK
OptionalInt topK()
The number of highest-probability vocabulary tokens to keep for top-k filtering.
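A minimal sketch of top-k filtering as commonly defined (not this library's internals): keep the k highest-probability tokens, drop the rest, and renormalize before sampling.

    // Keeps the k most probable tokens and renormalizes; assumes 1 <= k.
    // Ties at the threshold probability may keep slightly more than k tokens.
    static double[] topKFilter(double[] probs, int k) {
        double[] sorted = probs.clone();
        java.util.Arrays.sort(sorted);
        double threshold = sorted[Math.max(sorted.length - k, 0)];
        double[] kept = new double[probs.length];
        double sum = 0.0;
        for (int i = 0; i < probs.length; i++) {
            if (probs[i] >= threshold) {
                kept[i] = probs[i];
                sum += probs[i];
            }
        }
        for (int i = 0; i < kept.length; i++) kept[i] /= sum;
        return kept;
    }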
topP
OptionalDouble topP()
If set to less than 1, only the most probable tokens with probabilities that add up to top_p or higher are kept for generation.
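The corresponding nucleus (top-p) filtering, again sketched from the standard definition rather than this library's internals: tokens are taken in descending probability order until their cumulative mass reaches top_p, then the kept set is renormalized.

    // Keeps the smallest set of most-probable tokens whose probabilities add
    // up to topP or higher; assumes 0 < topP <= 1.
    static double[] topPFilter(double[] probs, double topP) {
        Integer[] order = new Integer[probs.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        java.util.Arrays.sort(order, (a, b) -> Double.compare(probs[b], probs[a]));
        double[] kept = new double[probs.length];
        double cumulative = 0.0;
        double sum = 0.0;
        for (int index : order) {
            if (cumulative >= topP) break; // enough probability mass kept
            kept[index] = probs[index];
            cumulative += probs[index];
            sum += probs[index];
        }
        for (int i = 0; i < kept.length; i++) kept[i] /= sum;
        return kept;
    }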
repetitionPenalty
OptionalDouble repetitionPenalty()
The parameter for repetition penalty. 1.0 means no penalty. See this paper for more details.
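The usual formulation of repetition penalty (from the CTRL paper, Keskar et al. 2019, which is likely the paper referenced above) rescales the logits of tokens that have already been generated, as in this sketch:

    // Penalty > 1.0 pushes already-generated tokens toward lower probability;
    // 1.0 leaves the distribution unchanged.
    static void applyRepetitionPenalty(double[] logits, java.util.Set<Integer> generatedTokenIds, double penalty) {
        for (int tokenId : generatedTokenIds) {
            logits[tokenId] = logits[tokenId] > 0
                    ? logits[tokenId] / penalty
                    : logits[tokenId] * penalty;
        }
    }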
logRequests
Whether chat model requests should be logged.
logResponses
Whether chat model responses should be logged.