public interface ChatModelConfig

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| `Double` | `frequencyPenalty()` | Number between -2.0 and 2.0. |
| `Optional<Boolean>` | `logRequests()` | Whether chat model requests should be logged |
| `Optional<Boolean>` | `logResponses()` | Whether chat model responses should be logged |
| `Optional<Integer>` | `maxCompletionTokens()` | An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens. |
| `Optional<Integer>` | `maxTokens()` | Deprecated. For newer OpenAI models, use maxCompletionTokens instead |
| `String` | `modelName()` | Model name to use |
| `Double` | `presencePenalty()` | Number between -2.0 and 2.0. |
| `Optional<String>` | `reasoningEffort()` | Constrains effort on reasoning for reasoning models. |
| `Optional<String>` | `responseFormat()` | The response format the model should use. |
| `Optional<String>` | `serviceTier()` | Specifies the processing type used for serving the request. |
| `Optional<List<String>>` | `stop()` | The list of stop words to use. |
| `Optional<Boolean>` | `strictJsonSchema()` | Whether responses follow JSON Schema for Structured Outputs |
| `Double` | `temperature()` | What sampling temperature to use, with values between 0 and 2. |
| `Double` | `topP()` | An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with topP probability mass. 0.1 means only the tokens comprising the top 10% probability mass are considered. |

Method Details
- modelName
  
  @WithDefault("gpt-4o-mini") String modelName()
  
  Model name to use
- temperature
  
  @WithDefault("${quarkus.langchain4j.temperature:1.0}") Double temperature()
  
  What sampling temperature to use, with values between 0 and 2. Higher values mean the model will take more risks. A value of 0.9 is good for more creative applications, while 0 (argmax sampling) is good for ones with a well-defined answer. It is recommended to alter this or topP, but not both.
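  
  The effect of temperature can be illustrated with a minimal sketch. It assumes the conventional softmax-with-temperature formulation; the sampler the provider actually uses is not specified on this page, and the class and method names below are hypothetical.
  
  ```java
  import java.util.Arrays;
  
  // Hypothetical illustration only; not part of ChatModelConfig or the provider API.
  final class TemperatureSketch {
  
      /** Turns raw logits into probabilities; a temperature of 0 is treated as argmax. */
      static double[] probabilities(double[] logits, double temperature) {
          if (logits.length == 0) {
              return new double[0];
          }
          int best = 0;
          for (int i = 1; i < logits.length; i++) {
              if (logits[i] > logits[best]) {
                  best = i;
              }
          }
          double[] probs = new double[logits.length];
          if (temperature == 0.0) {
              probs[best] = 1.0; // all probability mass on the most likely token
              return probs;
          }
          double sum = 0.0;
          for (int i = 0; i < logits.length; i++) {
              // Subtract the largest logit before exponentiating for numerical stability.
              probs[i] = Math.exp((logits[i] - logits[best]) / temperature);
              sum += probs[i];
          }
          for (int i = 0; i < probs.length; i++) {
              probs[i] /= sum;
          }
          return probs;
      }
  
      public static void main(String[] args) {
          double[] logits = {2.0, 1.0, 0.0};
          System.out.println(Arrays.toString(probabilities(logits, 0.0)));
          System.out.println(Arrays.toString(probabilities(logits, 0.9)));
          System.out.println(Arrays.toString(probabilities(logits, 2.0)));
      }
  }
  ```
  
  Under this assumption, a higher temperature flattens the distribution, so less likely tokens are drawn more often.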
- topP
  
  @WithDefault("1.0") Double topP()
  
  An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with topP probability mass. 0.1 means only the tokens comprising the top 10% probability mass are considered. It is recommended to alter this or temperature, but not both.
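  
  The "top 10% probability mass" wording can be made concrete with a small sketch. It assumes the usual description of nucleus sampling (keep the most likely tokens until their cumulative probability reaches topP); the class and method names are hypothetical, and the provider's implementation may differ in detail.
  
  ```java
  import java.util.ArrayList;
  import java.util.Comparator;
  import java.util.List;
  
  // Hypothetical illustration only; not part of ChatModelConfig or the provider API.
  final class NucleusSketch {
  
      /** Returns the indices of the tokens kept for sampling, most likely first. */
      static List<Integer> nucleus(double[] probs, double topP) {
          List<Integer> order = new ArrayList<>();
          for (int i = 0; i < probs.length; i++) {
              order.add(i);
          }
          // Sort token indices by descending probability.
          order.sort(Comparator.comparingDouble((Integer i) -> -probs[i]));
  
          List<Integer> kept = new ArrayList<>();
          double cumulative = 0.0;
          for (int index : order) {
              kept.add(index);
              cumulative += probs[index];
              if (cumulative >= topP) {
                  break; // the kept tokens now cover the requested probability mass
              }
          }
          return kept;
      }
  
      public static void main(String[] args) {
          double[] probs = {0.125, 0.5, 0.25, 0.125};
          System.out.println(nucleus(probs, 0.1));
          System.out.println(nucleus(probs, 0.75));
          System.out.println(nucleus(probs, 1.0));
      }
  }
  ```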
- maxTokens
  
  @Deprecated Optional<Integer> maxTokens()
  
  Deprecated.
  For newer OpenAI models, use maxCompletionTokens instead
  
  The maximum number of tokens to generate in the completion. The token count of your prompt plus max_tokens can't exceed the model's context length. Most models have a context length of 2048 tokens (except for the newest models, which support 4096).
- maxCompletionTokens
  
  Optional<Integer> maxCompletionTokens()
  
  An upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens.
- presencePenalty
  
  @WithDefault("0") Double presencePenalty()
  
  Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
- frequencyPenalty
  
  @WithDefault("0") Double frequencyPenalty()
  
  Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
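  
  The difference between presencePenalty and frequencyPenalty can be sketched as a logit adjustment. This assumes the formula given in OpenAI's public documentation, where the frequency penalty scales with how often a token has already appeared and the presence penalty is applied once if it has appeared at all. The class and method names are hypothetical.
  
  ```java
  // Hypothetical illustration only; not part of ChatModelConfig or the provider API.
  final class PenaltySketch {
  
      /**
       * Adjusts one token's logit.
       *
       * @param logit            the raw logit of the candidate token
       * @param count            how many times the token already appears in the text so far
       * @param presencePenalty  flat penalty applied when count is greater than 0
       * @param frequencyPenalty penalty applied once per previous occurrence
       */
      static double adjust(double logit, int count, double presencePenalty, double frequencyPenalty) {
          double presenceTerm = count > 0 ? presencePenalty : 0.0;
          return logit - count * frequencyPenalty - presenceTerm;
      }
  
      public static void main(String[] args) {
          // A token seen twice is penalized by 2 * 0.25 (frequency) + 0.5 (presence).
          System.out.println(adjust(1.0, 2, 0.5, 0.25));
          // A token not seen yet is left untouched.
          System.out.println(adjust(1.0, 0, 0.5, 0.25));
      }
  }
  ```
  
  With the default of 0 for both settings, the logits are unchanged.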
- logRequests
  
  @ConfigDocDefault("false") Optional<Boolean> logRequests()
  
  Whether chat model requests should be logged
- logResponses
  
  @ConfigDocDefault("false") Optional<Boolean> logResponses()
  
  Whether chat model responses should be logged
- responseFormat
  
  Optional<String> responseFormat()
  
  The response format the model should use. Some models are not compatible with some response formats, so make sure to review the OpenAI documentation.
- strictJsonSchema
  
  Optional<Boolean> strictJsonSchema()
  
  Whether responses follow JSON Schema for Structured Outputs
- stop
  
  Optional<List<String>> stop()
  
  The list of stop words to use.
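  
  The effect of stop words on the returned text can be mimicked locally. This is a sketch under the assumption that generation halts at the first stop sequence and that the sequence itself is not included in the result. The real cut-off happens on the provider side, and the class and method names are hypothetical.
  
  ```java
  import java.util.List;
  
  // Hypothetical illustration only; not part of ChatModelConfig or the provider API.
  final class StopSketch {
  
      /** Cuts the text at the earliest occurrence of any stop sequence. */
      static String truncateAtStop(String text, List<String> stops) {
          int cut = text.length();
          for (String stop : stops) {
              if (stop.isEmpty()) {
                  continue; // an empty stop sequence would match everywhere
              }
              int index = text.indexOf(stop);
              if (index >= 0 && index < cut) {
                  cut = index;
              }
          }
          return text.substring(0, cut);
      }
  
      public static void main(String[] args) {
          String raw = "Answer: 42\nQuestion: what comes next?";
          System.out.println(truncateAtStop(raw, List.of("\nQuestion:", "END")));
      }
  }
  ```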
  
- reasoningEffort
  
  Optional<String> reasoningEffort()
  
  Constrains effort on reasoning for reasoning models. Currently supported values are minimal, low, medium, and high. Reducing reasoning effort can result in faster responses and fewer tokens used on reasoning in a response.
  Note: The gpt-5-pro model defaults to (and only supports) high reasoning effort.
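  
  Because reasoningEffort is exposed as a plain Optional<String>, an application may want to check the value before relying on it. The sketch below is hypothetical application-side validation against the four values listed above; this page does not say whether the value is validated before it is sent, and the enum and method names are assumptions.
  
  ```java
  import java.util.Locale;
  import java.util.Optional;
  
  // Hypothetical illustration only; not part of ChatModelConfig.
  final class ReasoningEffortSketch {
  
      /** The values this page lists as currently supported. */
      enum Effort { MINIMAL, LOW, MEDIUM, HIGH }
  
      /**
       * Maps the configured string to an Effort, leaving an unset value empty.
       * Throws IllegalArgumentException for a value that is not in the list.
       */
      static Optional<Effort> parse(Optional<String> configured) {
          return configured.map(value -> Effort.valueOf(value.trim().toUpperCase(Locale.ROOT)));
      }
  
      public static void main(String[] args) {
          System.out.println(parse(Optional.of("low")));
          System.out.println(parse(Optional.empty()));
      }
  }
  ```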
- serviceTier
  
  @ConfigDocDefault("default") Optional<String> serviceTier()
  
  Specifies the processing type used for serving the request.
  If set to auto, then the request will be processed with the service tier configured in the Project settings. If set to default, then the request will be processed with the standard pricing and performance for the selected model. If set to flex or priority, then the request will be processed with the corresponding service tier. When not set, the default behavior is auto.
  When the service tier parameter is set, the response body will include the service_tier value based on the processing mode actually used to serve the request. This response value may be different from the value set in the parameter.
