Class PutHuggingFaceRequest

java.lang.Object
co.elastic.clients.elasticsearch._types.RequestBase
co.elastic.clients.elasticsearch.inference.PutHuggingFaceRequest
All Implemented Interfaces:
JsonpSerializable

@JsonpDeserializable public class PutHuggingFaceRequest extends RequestBase implements JsonpSerializable
Create a Hugging Face inference endpoint.

Create an inference endpoint to perform an inference task with the hugging_face service. Supported tasks include text_embedding, rerank, completion, and chat_completion.

To configure the endpoint, first visit the Hugging Face Inference Endpoints page and create a new endpoint. Select a model that supports the task you intend to use.

For Elastic's text_embedding task: The selected model must support the Sentence Embeddings task. On the new endpoint creation page, select the Sentence Embeddings task under the Advanced Configuration section. After the endpoint has initialized, copy the generated endpoint URL. Recommended models for the text_embedding task:

  • all-MiniLM-L6-v2
  • all-MiniLM-L12-v2
  • all-mpnet-base-v2
  • e5-base-v2
  • e5-small-v2
  • multilingual-e5-base
  • multilingual-e5-small

For Elastic's chat_completion and completion tasks: The selected model must support the Text Generation task and expose the OpenAI API. Hugging Face supports both serverless and dedicated endpoints for Text Generation. When creating a dedicated endpoint, select the Text Generation task. After the endpoint is initialized (dedicated) or ready (serverless), ensure that it supports the OpenAI API and that its URL includes the /v1/chat/completions path. Then copy the full endpoint URL. Recommended models for the chat_completion and completion tasks:

  • Mistral-7B-Instruct-v0.2
  • QwQ-32B
  • Phi-3-mini-128k-instruct
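
The URL requirement above can be checked before the endpoint URL is passed to the request. The following is a minimal sketch using only the JDK; the class name ChatCompletionsUrlCheck, the method name, and the example.invalid host are hypothetical and are not part of the Elasticsearch Java client.

```java
import java.net.URI;

// Hypothetical helper, not part of the Elasticsearch Java client.
final class ChatCompletionsUrlCheck {
    // Path the endpoint URL must include for chat_completion and completion.
    static final String REQUIRED_PATH = "/v1/chat/completions";

    // Returns true when the URL's path component contains the required path.
    static boolean includesChatCompletionsPath(String endpointUrl) {
        String path = URI.create(endpointUrl).getPath();
        // getPath() is null for opaque URIs such as "mailto:..."
        return path != null && path.contains(REQUIRED_PATH);
    }

    public static void main(String[] args) {
        // Placeholder host; substitute the URL copied from Hugging Face.
        System.out.println(includesChatCompletionsPath(
            "https://example.invalid/v1/chat/completions"));
        System.out.println(includesChatCompletionsPath("https://example.invalid/"));
    }
}
```

This only inspects the URL string. It does not confirm that the endpoint actually serves the OpenAI API.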

For Elastic's rerank task: The selected model must support the sentence-ranking task and expose the OpenAI API. Hugging Face currently supports only dedicated (not serverless) endpoints for rerank. After the endpoint is initialized, copy the full endpoint URL. Tested models for the rerank task:

  • bge-reranker-base
  • jina-reranker-v1-turbo-en-GGUF

  • Method Details

    • of

    • chunkingSettings

      @Nullable public final InferenceChunkingSettings chunkingSettings()
      The chunking configuration object. Applies only to the text_embedding task type. Not applicable to the rerank, completion, or chat_completion task types.

      API name: chunking_settings
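
      The applicability rule for chunking_settings can be expressed as a small guard. This is an illustrative sketch only: the enum TaskTypeName and its methods are hypothetical stand-ins and are not the client's HuggingFaceTaskType.

```java
// Hypothetical stand-in for the task types named in this page.
enum TaskTypeName {
    TEXT_EMBEDDING("text_embedding"),
    RERANK("rerank"),
    COMPLETION("completion"),
    CHAT_COMPLETION("chat_completion");

    private final String apiName;

    TaskTypeName(String apiName) {
        this.apiName = apiName;
    }

    // Name of the task type as it appears in the API.
    String apiName() {
        return apiName;
    }

    // Chunking settings apply only to the text_embedding task type.
    boolean acceptsChunkingSettings() {
        return this == TEXT_EMBEDDING;
    }
}
```

      A caller could consult acceptsChunkingSettings() before setting chunkingSettings on the builder, so that the setting is only sent for text_embedding.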

    • huggingfaceInferenceId

      public final String huggingfaceInferenceId()
      Required - The unique identifier of the inference endpoint.

      API name: huggingface_inference_id

    • service

      public final HuggingFaceServiceType service()
      Required - The type of service supported for the specified task type. In this case, hugging_face.

      API name: service

    • serviceSettings

      public final HuggingFaceServiceSettings serviceSettings()
      Required - Settings used to install the inference model. These settings are specific to the hugging_face service.

      API name: service_settings

    • taskSettings

      @Nullable public final HuggingFaceTaskSettings taskSettings()
      Settings to configure the inference task. These settings are specific to the task type you specified.

      API name: task_settings

    • taskType

      public final HuggingFaceTaskType taskType()
      Required - The type of the inference task that the model will perform.

      API name: task_type

    • timeout

      @Nullable public final Time timeout()
      Specifies the amount of time to wait for the inference endpoint to be created.

      API name: timeout
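
      To show how the API names above fit together, here is a rough sketch of the HTTP request this class is understood to represent. It uses plain string building rather than the client's serializer. The path layout /_inference/{task_type}/{huggingface_inference_id} and the url and api_key fields inside service_settings are assumptions drawn from the Elasticsearch REST API and do not appear on this page. The class name and all values are hypothetical placeholders.

```java
// Hypothetical sketch; the real client builds this via serialize().
final class HuggingFacePutSketch {
    // Assumed layout: task_type and huggingface_inference_id are path parts.
    static String path(String taskType, String inferenceId) {
        return "/_inference/" + taskType + "/" + inferenceId;
    }

    // Minimal JSON string quoting: escapes backslashes and double quotes only.
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Body with the two required properties: service and service_settings.
    // The url and api_key field names are assumptions (see lead-in).
    static String body(String url, String apiKey) {
        return "{\"service\":\"hugging_face\","
            + "\"service_settings\":{\"url\":" + quote(url)
            + ",\"api_key\":" + quote(apiKey) + "}}";
    }

    public static void main(String[] args) {
        System.out.println("PUT " + path("text_embedding", "my-hf-embeddings"));
        System.out.println(body("https://example.invalid/embed", "hf_placeholder"));
    }
}
```

      The optional task_settings and chunking_settings properties would be further members of the same body object. The sketch omits timeout.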

    • serialize

      public void serialize(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
      Serialize this object to JSON.
      Specified by:
      serialize in interface JsonpSerializable
    • serializeInternal

      protected void serializeInternal(jakarta.json.stream.JsonGenerator generator, JsonpMapper mapper)
    • setupPutHuggingFaceRequestDeserializer

      protected static void setupPutHuggingFaceRequestDeserializer(ObjectDeserializer<PutHuggingFaceRequest.Builder> op)