Class LlamaServiceSettings.Builder

java.lang.Object
  co.elastic.clients.util.ObjectBuilderBase
    co.elastic.clients.util.WithJsonObjectBuilderBase<LlamaServiceSettings.Builder>
      co.elastic.clients.elasticsearch.inference.LlamaServiceSettings.Builder
- All Implemented Interfaces:
  WithJson<LlamaServiceSettings.Builder>, ObjectBuilder<LlamaServiceSettings>
- Enclosing class:
  LlamaServiceSettings
public static class LlamaServiceSettings.Builder
extends WithJsonObjectBuilderBase<LlamaServiceSettings.Builder>
implements ObjectBuilder<LlamaServiceSettings>
Builder for LlamaServiceSettings.
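A minimal usage sketch, not taken from this page: it constructs a LlamaServiceSettings for a chat_completion task using only the constructor and builder methods documented below. The endpoint URL and model name follow the values given in the url and modelId descriptions; the host and port are placeholders.

import co.elastic.clients.elasticsearch.inference.LlamaServiceSettings;

// Hypothetical example: service settings for a chat_completion task.
LlamaServiceSettings settings = new LlamaServiceSettings.Builder()
        .url("http://localhost:8321/v1/openai/v1/chat/completions") // host and port are placeholders
        .modelId("llama3.2:3b")                                     // model listed below as tested for chat_completion
        .build();                                                   // required fields (url, model_id) must be set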
Constructor Summary

Constructors
Builder()

Method Summary

- build() - Builds a LlamaServiceSettings.
- maxInputTokens(Integer value) - For a text_embedding task, the maximum number of tokens per input before chunking occurs.
- modelId(String value) - Required - The name of the model to use for the inference task.
- rateLimit(RateLimitSetting value) - This setting helps to minimize the number of rate limit errors returned from the Llama API.
- rateLimit(Function<RateLimitSetting.Builder, ObjectBuilder<RateLimitSetting>> fn) - This setting helps to minimize the number of rate limit errors returned from the Llama API.
- protected LlamaServiceSettings.Builder self()
- similarity(LlamaSimilarityType value) - For a text_embedding task, the similarity measure.
- url(String value) - Required - The URL of the Llama stack endpoint.

Methods inherited from class co.elastic.clients.util.WithJsonObjectBuilderBase
withJson

Methods inherited from class co.elastic.clients.util.ObjectBuilderBase
_checkSingleUse, _listAdd, _listAddAll, _mapPut, _mapPutAll
Constructor Details

- Builder
public Builder()
Method Details
- url
Required - The URL of the Llama stack endpoint. The URL must contain:
  - For the text_embedding task: /v1/inference/embeddings.
  - For the completion and chat_completion tasks: /v1/openai/v1/chat/completions.
API name: url
- modelId
Required - The name of the model to use for the inference task. Refer to the Llama documentation on downloading models for the different ways of getting a list of available models and downloading them. The service has been tested and confirmed to work with the following models:
  - For the text_embedding task: all-MiniLM-L6-v2.
  - For the completion and chat_completion tasks: llama3.2:3b.
API name: model_id
- maxInputTokens
For a text_embedding task, the maximum number of tokens per input before chunking occurs.
API name: max_input_tokens
- similarity
For a text_embedding task, the similarity measure. One of cosine, dot_product, or l2_norm.
API name: similarity
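A hedged sketch of a text_embedding configuration that combines the optional settings above with the required url and modelId. The maxInputTokens value is illustrative, and the Cosine constant name on LlamaSimilarityType is an assumption not confirmed by this page.

import co.elastic.clients.elasticsearch.inference.LlamaServiceSettings;
import co.elastic.clients.elasticsearch.inference.LlamaSimilarityType;

// Hypothetical example: text_embedding settings with optional chunking and similarity options.
LlamaServiceSettings embeddingSettings = new LlamaServiceSettings.Builder()
        .url("http://localhost:8321/v1/inference/embeddings")   // embeddings path required for text_embedding
        .modelId("all-MiniLM-L6-v2")                             // model listed above as tested for text_embedding
        .maxInputTokens(512)                                     // illustrative limit before chunking occurs
        .similarity(LlamaSimilarityType.Cosine)                  // assumed enum constant for the cosine measure
        .build();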
- rateLimit
This setting helps to minimize the number of rate limit errors returned from the Llama API. By default, the llama service sets the number of requests allowed per minute to 3000.
API name: rate_limit
- rateLimit
public final LlamaServiceSettings.Builder rateLimit(Function<RateLimitSetting.Builder, ObjectBuilder<RateLimitSetting>> fn)
This setting helps to minimize the number of rate limit errors returned from the Llama API. By default, the llama service sets the number of requests allowed per minute to 3000.
API name: rate_limit
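A sketch of the function-style overload, which takes a lambda that configures a nested RateLimitSetting.Builder. The requestsPerMinute setter is assumed from the RateLimitSetting type and is not documented on this page; the value shown is illustrative.

// Hypothetical example: lower the rate limit from the default of 3000 requests per minute.
LlamaServiceSettings limitedSettings = new LlamaServiceSettings.Builder()
        .url("http://localhost:8321/v1/openai/v1/chat/completions")
        .modelId("llama3.2:3b")
        .rateLimit(r -> r.requestsPerMinute(500))                // assumed RateLimitSetting.Builder setter
        .build();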
- self
Specified by: self in class WithJsonObjectBuilderBase<LlamaServiceSettings.Builder>
- build
Builds a LlamaServiceSettings.
Specified by: build in interface ObjectBuilder<LlamaServiceSettings>
Throws: NullPointerException - if some of the required fields are null.