Class HuggingFaceServiceSettings
java.lang.Object
co.elastic.clients.elasticsearch.inference.HuggingFaceServiceSettings
- All Implemented Interfaces:
JsonpSerializable
@JsonpDeserializable
public class HuggingFaceServiceSettings
extends Object
implements JsonpSerializable
Nested Class Summary
Nested Classes
static class HuggingFaceServiceSettings.Builder
Builder for HuggingFaceServiceSettings.
-
Field Summary
Fields
static final JsonpDeserializer<HuggingFaceServiceSettings> _DESERIALIZER
Json deserializer for HuggingFaceServiceSettings
-
Method Summary
final String apiKey()
Required - A valid access token for your HuggingFace account.
final String modelId()
The name of the HuggingFace model to use for the inference task.
static HuggingFaceServiceSettings of(Function<HuggingFaceServiceSettings.Builder, ObjectBuilder<HuggingFaceServiceSettings>> fn)
final RateLimitSetting rateLimit()
This setting helps to minimize the number of rate limit errors returned from Hugging Face.
void serialize(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
Serialize this object to JSON.
protected void serializeInternal(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
protected static void setupHuggingFaceServiceSettingsDeserializer(ObjectDeserializer<HuggingFaceServiceSettings.Builder> op)
String toString()
final String url()
Required - The URL endpoint to use for the requests.
-
Field Details
-
_DESERIALIZER
public static final JsonpDeserializer<HuggingFaceServiceSettings> _DESERIALIZER
Json deserializer for HuggingFaceServiceSettings
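A sketch of invoking the deserializer directly with a JacksonJsonpMapper; in normal use the client calls it for you when parsing responses. The JSON values below are placeholders:

import co.elastic.clients.json.JsonpMapper;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import jakarta.json.stream.JsonParser;
import java.io.StringReader;

String json = "{\"api_key\":\"hf_xxx\",\"url\":\"https://my-endpoint.example.com\"}";
JsonpMapper mapper = new JacksonJsonpMapper();
try (JsonParser parser = mapper.jsonProvider().createParser(new StringReader(json))) {
    // Rebuild the settings object from its JSON representation.
    HuggingFaceServiceSettings settings =
        HuggingFaceServiceSettings._DESERIALIZER.deserialize(parser, mapper);
}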
-
Method Details
-
of
public static HuggingFaceServiceSettings of(Function<HuggingFaceServiceSettings.Builder, ObjectBuilder<HuggingFaceServiceSettings>> fn)
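A minimal construction sketch using the lambda-based of factory; the token and endpoint values are placeholders, not working credentials:

import co.elastic.clients.elasticsearch.inference.HuggingFaceServiceSettings;

// api_key and url are required; model_id and rate_limit are optional.
HuggingFaceServiceSettings settings = HuggingFaceServiceSettings.of(b -> b
    .apiKey("hf_xxx")                        // placeholder access token
    .url("https://my-endpoint.example.com")  // placeholder endpoint URL
);
-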
apiKey
public final String apiKey()
Required - A valid access token for your HuggingFace account. You can create or find your access tokens on the HuggingFace settings page.
IMPORTANT: You need to provide the API key only once, during the inference model creation. The get inference endpoint API does not retrieve your API key. After creating the inference model, you cannot change the associated API key. If you want to use a different API key, delete the inference model and recreate it with the same name and the updated API key.
API name: api_key
-
rateLimit
public final RateLimitSetting rateLimit()
This setting helps to minimize the number of rate limit errors returned from Hugging Face. By default, the hugging_face service sets the number of requests allowed per minute to 3000 for all supported tasks. Hugging Face does not publish a universal rate limit; actual limits may vary. It is recommended to adjust this value based on the capacity and limits of your specific deployment environment.
API name: rate_limit
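A hedged sketch of overriding the default, assuming RateLimitSetting's builder exposes a requestsPerMinute setter mirroring the requests_per_minute JSON field:

HuggingFaceServiceSettings settings = HuggingFaceServiceSettings.of(b -> b
    .apiKey("hf_xxx")                          // placeholder token
    .url("https://my-endpoint.example.com")    // placeholder endpoint
    .rateLimit(r -> r.requestsPerMinute(300))  // lower than the default of 3000
);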
-
url
public final String url()
Required - The URL endpoint to use for the requests. For completion and chat_completion tasks, the deployed model must be compatible with the Hugging Face Chat Completion interface (see the linked external documentation for details). The endpoint URL for the request must include /v1/chat/completions. If the model supports the OpenAI Chat Completion schema, a toggle should appear in the interface. Enabling this toggle doesn't change any model behavior; it reveals the full endpoint URL needed (which should include /v1/chat/completions) when configuring the inference endpoint in Elasticsearch. If the model doesn't support this schema, the toggle may not be shown.
API name: url
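For chat-style tasks the path suffix matters; a sketch with a hypothetical deployment URL:

// For completion and chat_completion tasks, the URL must include /v1/chat/completions.
HuggingFaceServiceSettings chatSettings = HuggingFaceServiceSettings.of(b -> b
    .apiKey("hf_xxx")  // placeholder token
    .url("https://my-deployment.endpoints.huggingface.cloud/v1/chat/completions")
);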
-
modelId
public final String modelId()
The name of the HuggingFace model to use for the inference task. For completion and chat_completion tasks, this field is optional but may be required for certain models, particularly when using serverless inference endpoints. For the text_embedding task, this field should not be included; otherwise, the request will fail.
API name: model_id
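A sketch contrasting the two cases described above; the endpoint URL and model name are hypothetical:

// completion / chat_completion: model_id may be needed, e.g. on serverless endpoints.
HuggingFaceServiceSettings completionSettings = HuggingFaceServiceSettings.of(b -> b
    .apiKey("hf_xxx")                                 // placeholder token
    .url("https://my-endpoint.example.com/v1/chat/completions")
    .modelId("my-org/my-chat-model")                  // hypothetical model name
);
// text_embedding: omit modelId entirely, or the request will fail.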
-
serialize
public void serialize(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
Serialize this object to JSON.
Specified by: serialize in interface JsonpSerializable
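A round-trip sketch showing manual serialization with a JacksonJsonpMapper; normally the client performs this step for you when sending requests:

import co.elastic.clients.json.JsonpMapper;
import co.elastic.clients.json.jackson.JacksonJsonpMapper;
import jakarta.json.stream.JsonGenerator;
import java.io.StringWriter;

JsonpMapper mapper = new JacksonJsonpMapper();
StringWriter writer = new StringWriter();
try (JsonGenerator generator = mapper.jsonProvider().createGenerator(writer)) {
    settings.serialize(generator, mapper);  // writes e.g. {"api_key":"...","url":"..."}
}
String json = writer.toString();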
-
serializeInternal
protected void serializeInternal(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
-
toString
public String toString()
Overrides: toString in class Object
-
setupHuggingFaceServiceSettingsDeserializer
protected static void setupHuggingFaceServiceSettingsDeserializer(ObjectDeserializer<HuggingFaceServiceSettings.Builder> op)