Class LlamaServiceSettings
java.lang.Object
co.elastic.clients.elasticsearch.inference.LlamaServiceSettings
All Implemented Interfaces:
JsonpSerializable
Nested Class Summary

static class LlamaServiceSettings.Builder
    Builder for LlamaServiceSettings.
Field Summary

static final JsonpDeserializer<LlamaServiceSettings> _DESERIALIZER
    Json deserializer for LlamaServiceSettings
Method Summary

final Integer maxInputTokens()
    For a text_embedding task, the maximum number of tokens per input before chunking occurs.
final String modelId()
    Required - The name of the model to use for the inference task.
static LlamaServiceSettings of(Function<LlamaServiceSettings.Builder, ObjectBuilder<LlamaServiceSettings>> fn)
final RateLimitSetting rateLimit()
    This setting helps to minimize the number of rate limit errors returned from the Llama API.
void serialize(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
    Serialize this object to JSON.
protected void serializeInternal(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
protected static void setupLlamaServiceSettingsDeserializer(ObjectDeserializer<LlamaServiceSettings.Builder> op)
final LlamaSimilarityType similarity()
    For a text_embedding task, the similarity measure.
String toString()
final String url()
    Required - The URL endpoint of the Llama stack endpoint.
Field Details

_DESERIALIZER
public static final JsonpDeserializer<LlamaServiceSettings> _DESERIALIZER
    Json deserializer for LlamaServiceSettings
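For illustration, a minimal sketch of parsing service-settings JSON with this deserializer. The JSON payload, the host, and the choice of JacksonJsonpMapper are assumptions made for the example, not part of this class:

import co.elastic.clients.elasticsearch.inference.LlamaServiceSettings;
import co.elastic.clients.json.JsonpMapper;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import jakarta.json.stream.JsonParser;
import java.io.StringReader;

public class LlamaSettingsParseExample {
    public static void main(String[] args) {
        // Hypothetical payload; field names follow the API names documented on this page.
        String json = "{\"url\":\"http://localhost:8321/v1/inference/embeddings\","
                + "\"model_id\":\"all-MiniLM-L6-v2\"}";

        JsonpMapper mapper = new JacksonJsonpMapper();
        try (JsonParser parser = mapper.jsonProvider().createParser(new StringReader(json))) {
            LlamaServiceSettings settings =
                    LlamaServiceSettings._DESERIALIZER.deserialize(parser, mapper);
            System.out.println(settings.modelId()); // all-MiniLM-L6-v2
        }
    }
}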
Method Details

of
public static LlamaServiceSettings of(Function<LlamaServiceSettings.Builder, ObjectBuilder<LlamaServiceSettings>> fn)
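As a usage sketch (the builder setter names are assumed to mirror the getters documented below; the URL and model values are placeholders):

// Lambda form: the function receives a fresh Builder and returns it.
LlamaServiceSettings settings = LlamaServiceSettings.of(b -> b
    .url("http://localhost:8321/v1/openai/v1/chat/completions") // placeholder host
    .modelId("llama3.2:3b")
);

// Equivalent explicit-builder form.
LlamaServiceSettings same = new LlamaServiceSettings.Builder()
    .url("http://localhost:8321/v1/openai/v1/chat/completions")
    .modelId("llama3.2:3b")
    .build();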
url
public final String url()
Required - The URL endpoint of the Llama stack endpoint. The URL must contain:
- For the text_embedding task: /v1/inference/embeddings
- For the completion and chat_completion tasks: /v1/openai/v1/chat/completions
(Both pairings of endpoint and model are shown in the sketch after the modelId entry below.)

API name: url
modelId
public final String modelId()
Required - The name of the model to use for the inference task. Refer to the Llama downloading models documentation for the different ways of listing available models and downloading them. The service has been tested and confirmed to work with the following models:
- For the text_embedding task: all-MiniLM-L6-v2
- For the completion and chat_completion tasks: llama3.2:3b

API name: model_id
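A sketch of the two documented endpoint/model pairings; the host and port are placeholders, and the builder setters are assumed to mirror the getters on this page:

// text_embedding: embeddings endpoint paired with the tested embedding model.
LlamaServiceSettings embedding = LlamaServiceSettings.of(b -> b
    .url("http://localhost:8321/v1/inference/embeddings")
    .modelId("all-MiniLM-L6-v2")
);

// completion / chat_completion: chat completions endpoint paired with the tested chat model.
LlamaServiceSettings chat = LlamaServiceSettings.of(b -> b
    .url("http://localhost:8321/v1/openai/v1/chat/completions")
    .modelId("llama3.2:3b")
);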
maxInputTokens
public final Integer maxInputTokens()
For a text_embedding task, the maximum number of tokens per input before chunking occurs.

API name: max_input_tokens
similarity
public final LlamaSimilarityType similarity()
For a text_embedding task, the similarity measure. One of cosine, dot_product, l2_norm.

API name: similarity
rateLimit
public final RateLimitSetting rateLimit()
This setting helps to minimize the number of rate limit errors returned from the Llama API. By default, the llama service sets the number of requests allowed per minute to 3000.

API name: rate_limit
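A sketch of lowering the default of 3000 requests per minute; it assumes RateLimitSetting exposes a requestsPerMinute setter and that the builder accepts the usual lambda overload, neither of which is documented on this page:

LlamaServiceSettings throttled = LlamaServiceSettings.of(b -> b
    .url("http://localhost:8321/v1/inference/embeddings") // placeholder host
    .modelId("all-MiniLM-L6-v2")
    .rateLimit(r -> r.requestsPerMinute(500))              // assumed setter name
);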
serialize
public void serialize(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
Serialize this object to JSON.
Specified by:
serialize in interface JsonpSerializable
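A minimal sketch of writing the settings out as JSON; the JacksonJsonpMapper and the StringWriter target are assumptions made for illustration:

import co.elastic.clients.json.JsonpMapper;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import jakarta.json.stream.JsonGenerator;
import java.io.StringWriter;

StringWriter out = new StringWriter();
JsonpMapper mapper = new JacksonJsonpMapper();
try (JsonGenerator generator = mapper.jsonProvider().createGenerator(out)) {
    settings.serialize(generator, mapper); // settings is an existing LlamaServiceSettings instance
}
String json = out.toString();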
serializeInternal
protected void serializeInternal(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
toString
public String toString()
setupLlamaServiceSettingsDeserializer
protected static void setupLlamaServiceSettingsDeserializer(ObjectDeserializer<LlamaServiceSettings.Builder> op)