Class LlamaServiceSettings
java.lang.Object
co.elastic.clients.elasticsearch.inference.LlamaServiceSettings
- All Implemented Interfaces:
JsonpSerializable
Nested Class Summary
Nested Classes
static class LlamaServiceSettings.Builder
Field Summary
Fields
static final JsonpDeserializer&lt;LlamaServiceSettings&gt; _DESERIALIZER
Json deserializer for LlamaServiceSettings
Method Summary
final Integer maxInputTokens()
For a text_embedding task, the maximum number of tokens per input before chunking occurs.
final String modelId()
Required - The name of the model to use for the inference task.
static LlamaServiceSettings of(Function&lt;LlamaServiceSettings.Builder, ObjectBuilder&lt;LlamaServiceSettings&gt;&gt; fn)
final RateLimitSetting rateLimit()
This setting helps to minimize the number of rate limit errors returned from the Llama API.
void serialize(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
Serialize this object to JSON.
protected void serializeInternal(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
protected static void setupLlamaServiceSettingsDeserializer(ObjectDeserializer&lt;LlamaServiceSettings.Builder&gt; op)
final LlamaSimilarityType similarity()
For a text_embedding task, the similarity measure.
String toString()
final String url()
Required - The URL of the Llama stack endpoint.
Field Details
_DESERIALIZER
Json deserializer for LlamaServiceSettings
Method Details
of
public static LlamaServiceSettings of(Function&lt;LlamaServiceSettings.Builder, ObjectBuilder&lt;LlamaServiceSettings&gt;&gt; fn)
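A minimal usage sketch, assuming the builder exposes setters named after the getters documented below (url, modelId); the host and port are placeholders:

import co.elastic.clients.elasticsearch.inference.LlamaServiceSettings;

// Settings for a text_embedding endpoint; the URL must end with the
// embeddings path described under url() below.
LlamaServiceSettings settings = LlamaServiceSettings.of(b -> b
    .url("http://localhost:8321/v1/inference/embeddings")
    .modelId("all-MiniLM-L6-v2")
);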
url
Required - The URL of the Llama stack endpoint. The URL must contain:
- For the text_embedding task: /v1/inference/embeddings
- For the completion and chat_completion tasks: /v1/openai/v1/chat/completions
API name: url
modelId
Required - The name of the model to use for the inference task. Refer to the Llama documentation on downloading models for the different ways to list available models and download them. The service has been tested and confirmed to work with the following models:
- For the text_embedding task: all-MiniLM-L6-v2
- For the completion and chat_completion tasks: llama3.2:3b
API name: model_id
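For comparison, a sketch of settings for the completion and chat_completion tasks, using the tested model listed above; the host and port are again placeholders and the builder setter names are assumed:

// Settings for completion / chat_completion tasks.
LlamaServiceSettings chatSettings = LlamaServiceSettings.of(b -> b
    .url("http://localhost:8321/v1/openai/v1/chat/completions")
    .modelId("llama3.2:3b")
);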
maxInputTokens
For a text_embedding task, the maximum number of tokens per input before chunking occurs.
API name: max_input_tokens
similarity
For a text_embedding task, the similarity measure. One of cosine, dot_product, l2_norm.
API name: similarity
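A sketch combining the optional text_embedding settings; the setter names and the LlamaSimilarityType constant (Cosine) are assumptions, not taken from this page:

import co.elastic.clients.elasticsearch.inference.LlamaSimilarityType;

// Optional embedding settings: chunk long inputs and pick a similarity measure.
LlamaServiceSettings embeddingSettings = LlamaServiceSettings.of(b -> b
    .url("http://localhost:8321/v1/inference/embeddings")
    .modelId("all-MiniLM-L6-v2")
    .maxInputTokens(512)                     // chunk inputs longer than 512 tokens
    .similarity(LlamaSimilarityType.Cosine)  // assumed constant name
);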
rateLimit
This setting helps to minimize the number of rate limit errors returned from the Llama API. By default, the llama service sets the number of requests allowed per minute to 3000.
API name: rate_limit
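A sketch overriding the default limit; it assumes the builder accepts a lambda for rate_limit and that RateLimitSetting exposes a requestsPerMinute setter:

// Lower the allowed request rate to 1000 requests per minute.
LlamaServiceSettings limited = LlamaServiceSettings.of(b -> b
    .url("http://localhost:8321/v1/inference/embeddings")
    .modelId("all-MiniLM-L6-v2")
    .rateLimit(r -> r.requestsPerMinute(1000))  // assumed setter name
);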
serialize
Serialize this object to JSON.
Specified by:
serialize in interface JsonpSerializable
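A sketch of writing the settings out as a JSON string; it assumes JacksonJsonpMapper is on the classpath and that the generator is created through the mapper's JSON provider:

import java.io.StringWriter;
import co.elastic.clients.json.JsonpMapper;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import jakarta.json.stream.JsonGenerator;

JsonpMapper mapper = new JacksonJsonpMapper();
StringWriter writer = new StringWriter();
JsonGenerator generator = mapper.jsonProvider().createGenerator(writer);
settings.serialize(generator, mapper);   // settings built as in the of() example above
generator.close();
String json = writer.toString();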
serializeInternal
protected void serializeInternal(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
toString
public String toString()
setupLlamaServiceSettingsDeserializer
protected static void setupLlamaServiceSettingsDeserializer(ObjectDeserializer<LlamaServiceSettings.Builder> op)