Class ElasticsearchInferenceAsyncClient
- All Implemented Interfaces:
Closeable, AutoCloseable
-
Field Summary
Fields inherited from class co.elastic.clients.ApiClient
transport, transportOptions
-
Constructor Summary
Constructors
ElasticsearchInferenceAsyncClient(ElasticsearchTransport transport, TransportOptions transportOptions)
Method Summary
All methods return a CompletableFuture and, unless noted otherwise, come in two overloads: one taking a request object and a final one taking a builder lambda (Function<Request.Builder, ObjectBuilder<Request>> fn).
chatCompletionUnified (CompletableFuture<BinaryResponse>): Perform chat completion inference
completion (CompletableFuture<CompletionResponse>): Perform completion inference on the service
delete (CompletableFuture<DeleteInferenceResponse>): Delete an inference endpoint
get (CompletableFuture<GetInferenceResponse>; also has a no-argument overload): Get an inference endpoint
inference (CompletableFuture<InferenceResponse>): Perform inference on the service.
put (CompletableFuture<PutResponse>): Create an inference endpoint.
putAlibabacloud (CompletableFuture<PutAlibabacloudResponse>): Create an AlibabaCloud AI Search inference endpoint.
putAmazonbedrock (CompletableFuture<PutAmazonbedrockResponse>): Create an Amazon Bedrock inference endpoint.
putAmazonsagemaker (CompletableFuture<PutAmazonsagemakerResponse>): Create an Amazon SageMaker inference endpoint.
putAnthropic (CompletableFuture<PutAnthropicResponse>): Create an Anthropic inference endpoint.
putAzureaistudio (CompletableFuture<PutAzureaistudioResponse>): Create an Azure AI Studio inference endpoint.
putAzureopenai (CompletableFuture<PutAzureopenaiResponse>): Create an Azure OpenAI inference endpoint.
putCohere (CompletableFuture<PutCohereResponse>): Create a Cohere inference endpoint.
putCustom (CompletableFuture<PutCustomResponse>): Create a custom inference endpoint.
putDeepseek (CompletableFuture<PutDeepseekResponse>): Create a DeepSeek inference endpoint.
putElasticsearch (CompletableFuture<PutElasticsearchResponse>): Create an Elasticsearch inference endpoint.
putElser (CompletableFuture<PutElserResponse>): Create an ELSER inference endpoint.
putGoogleaistudio (CompletableFuture<PutGoogleaistudioResponse>): Create a Google AI Studio inference endpoint.
putGooglevertexai (CompletableFuture<PutGooglevertexaiResponse>): Create a Google Vertex AI inference endpoint.
putHuggingFace (CompletableFuture<PutHuggingFaceResponse>): Create a Hugging Face inference endpoint.
putJinaai (CompletableFuture<PutJinaaiResponse>): Create a JinaAI inference endpoint.
putMistral (CompletableFuture<PutMistralResponse>): Create a Mistral inference endpoint.
putOpenai (CompletableFuture<PutOpenaiResponse>): Create an OpenAI inference endpoint.
putVoyageai (CompletableFuture<PutVoyageaiResponse>): Create a VoyageAI inference endpoint.
putWatsonx (CompletableFuture<PutWatsonxResponse>): Create a Watsonx inference endpoint.
rerank (CompletableFuture<RerankResponse>): Perform reranking inference on the service
sparseEmbedding (CompletableFuture<SparseEmbeddingResponse>): Perform sparse embedding inference on the service
streamCompletion (CompletableFuture<BinaryResponse>): Perform streaming inference.
textEmbedding (CompletableFuture<TextEmbeddingResponse>): Perform text embedding inference on the service
update (CompletableFuture<UpdateInferenceResponse>): Update an inference endpoint.
withTransportOptions (ElasticsearchInferenceAsyncClient; single overload): Creates a new client with some request options
Methods inherited from class co.elastic.clients.ApiClient
_jsonpMapper, _transport, _transportOptions, close, getDeserializer, withTransportOptions
-
Constructor Details
-
ElasticsearchInferenceAsyncClient
-
ElasticsearchInferenceAsyncClient
public ElasticsearchInferenceAsyncClient(ElasticsearchTransport transport, @Nullable TransportOptions transportOptions)
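In practice this client is usually obtained from an ElasticsearchAsyncClient rather than constructed directly. A minimal sketch, assuming a local cluster at localhost:9200 (the host is an assumption for illustration):

    import co.elastic.clients.elasticsearch.ElasticsearchAsyncClient;
    import co.elastic.clients.elasticsearch.inference.ElasticsearchInferenceAsyncClient;
    import co.elastic.clients.json.jackson.JacksonJsonpMapper;
    import co.elastic.clients.transport.ElasticsearchTransport;
    import co.elastic.clients.transport.rest_client.RestClientTransport;
    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;

    // Build the low-level REST client and wrap it in a transport
    RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200)).build();
    ElasticsearchTransport transport = new RestClientTransport(restClient, new JacksonJsonpMapper());

    // The inference namespace client used in the sketches below
    ElasticsearchInferenceAsyncClient inference = new ElasticsearchAsyncClient(transport).inference();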
-
-
Method Details
-
withTransportOptions
public ElasticsearchInferenceAsyncClient withTransportOptions(@Nullable TransportOptions transportOptions)
Description copied from class: ApiClient
Creates a new client with some request options
- Specified by: withTransportOptions in class ApiClient<ElasticsearchTransport, ElasticsearchInferenceAsyncClient>
-
chatCompletionUnified
public CompletableFuture<BinaryResponse> chatCompletionUnified(ChatCompletionUnifiedRequest request)
Perform chat completion inference
The chat completion inference API enables real-time responses for chat completion tasks by delivering answers incrementally, reducing response times during computation. It only works with the chat_completion task type for the openai and elastic inference services.
NOTE: The chat_completion task type is only available within the _stream API and only supports streaming. The Chat completion inference API and the Stream inference API differ in their response structure and capabilities. The Chat completion inference API provides more comprehensive customization options through more fields and function calling support. If you use the openai, hugging_face, or elastic service, use the Chat completion inference API.
- See Also:
-
chatCompletionUnified
public final CompletableFuture<BinaryResponse> chatCompletionUnified(Function<ChatCompletionUnifiedRequest.Builder, ObjectBuilder<ChatCompletionUnifiedRequest>> fn)
Perform chat completion inference
The chat completion inference API enables real-time responses for chat completion tasks by delivering answers incrementally, reducing response times during computation. It only works with the chat_completion task type for the openai and elastic inference services.
NOTE: The chat_completion task type is only available within the _stream API and only supports streaming. The Chat completion inference API and the Stream inference API differ in their response structure and capabilities. The Chat completion inference API provides more comprehensive customization options through more fields and function calling support. If you use the openai, hugging_face, or elastic service, use the Chat completion inference API.
- Parameters: fn - a function that initializes a builder to create the ChatCompletionUnifiedRequest
- See Also:
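A minimal sketch of the builder overload, given the inference client from the constructor example. The endpoint id is hypothetical, and the message builder names (role, content with a string variant) are assumptions modeled on the request classes; verify them against ChatCompletionUnifiedRequest in your client version:

    var future = inference.chatCompletionUnified(r -> r
        .inferenceId("my-chat-endpoint")                 // hypothetical chat_completion endpoint id
        .messages(m -> m
            .role("user")
            .content(c -> c.string("Say hello in one short sentence."))));
    future.whenComplete((response, exception) -> {
        if (exception != null) {
            exception.printStackTrace();                 // the request failed
        } else {
            // response is a raw BinaryResponse carrying the streamed events
        }
    });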
-
completion
public CompletableFuture<CompletionResponse> completion(CompletionRequest request)
Perform completion inference on the service
- See Also:
-
completion
public final CompletableFuture<CompletionResponse> completion(Function<CompletionRequest.Builder, ObjectBuilder<CompletionRequest>> fn)
Perform completion inference on the service
- Parameters: fn - a function that initializes a builder to create the CompletionRequest
- See Also:
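As a usage sketch with the builder overload (the endpoint id is hypothetical; inferenceId and input mirror the REST API fields):

    inference.completion(c -> c
            .inferenceId("my-completion-endpoint")  // hypothetical endpoint id
            .input("Summarize the plot of Hamlet in one sentence."))
        .thenAccept(response -> {
            // read the generated text from the completion results
        });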
-
delete
public CompletableFuture<DeleteInferenceResponse> delete(DeleteInferenceRequest request)
Delete an inference endpoint
- See Also:
-
delete
public final CompletableFuture<DeleteInferenceResponse> delete(Function<DeleteInferenceRequest.Builder, ObjectBuilder<DeleteInferenceRequest>> fn)
Delete an inference endpoint
- Parameters: fn - a function that initializes a builder to create the DeleteInferenceRequest
- See Also:
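A sketch of deleting an endpoint by id (the id is hypothetical):

    inference.delete(d -> d.inferenceId("my-completion-endpoint"))
        .whenComplete((response, exception) -> {
            if (exception != null) {
                // the endpoint may not exist, or it may still be in use
            }
        });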
-
get
public CompletableFuture<GetInferenceResponse> get()
Get an inference endpoint
- See Also:
-
get
public final CompletableFuture<GetInferenceResponse> get(Function<GetInferenceRequest.Builder, ObjectBuilder<GetInferenceRequest>> fn)
Get an inference endpoint
- Parameters: fn - a function that initializes a builder to create the GetInferenceRequest
- See Also:
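For example, fetching a single endpoint by a hypothetical id:

    inference.get(g -> g.inferenceId("my-completion-endpoint"))
        .thenAccept(response -> {
            // the response carries the matching endpoint configurations
        });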
-
get
public CompletableFuture<GetInferenceResponse> get(GetInferenceRequest request)
Get an inference endpoint
- See Also:
-
inference
public CompletableFuture<InferenceResponse> inference(InferenceRequest request)
Perform inference on the service.
This API enables you to use machine learning models to perform specific tasks on data that you provide as an input. It returns a response with the results of the tasks. The inference endpoint you use can perform one specific task that has been defined when the endpoint was created with the create inference API.
For details about using this API with a service, such as Amazon Bedrock, Anthropic, or HuggingFace, refer to the service-specific documentation.
info The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
- See Also:
-
inference
public final CompletableFuture<InferenceResponse> inference(Function<InferenceRequest.Builder, ObjectBuilder<InferenceRequest>> fn)
Perform inference on the service.
This API enables you to use machine learning models to perform specific tasks on data that you provide as an input. It returns a response with the results of the tasks. The inference endpoint you use can perform one specific task that has been defined when the endpoint was created with the create inference API.
For details about using this API with a service, such as Amazon Bedrock, Anthropic, or HuggingFace, refer to the service-specific documentation.
info The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
- Parameters: fn - a function that initializes a builder to create the InferenceRequest
- See Also:
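A sketch of the generic inference call; the endpoint id is hypothetical, and the task type is omitted because it is optional in the REST API:

    inference.inference(i -> i
            .inferenceId("my-endpoint")             // hypothetical endpoint id
            .input("The quick brown fox jumps over the lazy dog."))
        .thenAccept(response -> {
            // task-specific results (embeddings, completion text, ...) are on the response
        });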
-
put
public CompletableFuture<PutResponse> put(PutRequest request)
Create an inference endpoint.
IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Mistral, Azure OpenAI, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
The following integrations are available through the inference API. You can find the available task types next to the integration name:
- AlibabaCloud AI Search (completion, rerank, sparse_embedding, text_embedding)
- Amazon Bedrock (completion, text_embedding)
- Amazon SageMaker (chat_completion, completion, rerank, sparse_embedding, text_embedding)
- Anthropic (completion)
- Azure AI Studio (completion, text_embedding)
- Azure OpenAI (completion, text_embedding)
- Cohere (completion, rerank, text_embedding)
- DeepSeek (chat_completion, completion)
- Elasticsearch (rerank, sparse_embedding, text_embedding; this service is for built-in models and models uploaded through Eland)
- ELSER (sparse_embedding)
- Google AI Studio (completion, text_embedding)
- Google Vertex AI (chat_completion, completion, rerank, text_embedding)
- Hugging Face (chat_completion, completion, rerank, text_embedding)
- JinaAI (rerank, text_embedding)
- Llama (chat_completion, completion, text_embedding)
- Mistral (chat_completion, completion, text_embedding)
- OpenAI (chat_completion, completion, text_embedding)
- VoyageAI (rerank, text_embedding)
- Watsonx inference integration (text_embedding)
- See Also:
-
put
public final CompletableFuture<PutResponse> put(Function<PutRequest.Builder, ObjectBuilder<PutRequest>> fn)
Create an inference endpoint.
IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Mistral, Azure OpenAI, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
The following integrations are available through the inference API. You can find the available task types next to the integration name:
- AlibabaCloud AI Search (completion, rerank, sparse_embedding, text_embedding)
- Amazon Bedrock (completion, text_embedding)
- Amazon SageMaker (chat_completion, completion, rerank, sparse_embedding, text_embedding)
- Anthropic (completion)
- Azure AI Studio (completion, text_embedding)
- Azure OpenAI (completion, text_embedding)
- Cohere (completion, rerank, text_embedding)
- DeepSeek (chat_completion, completion)
- Elasticsearch (rerank, sparse_embedding, text_embedding; this service is for built-in models and models uploaded through Eland)
- ELSER (sparse_embedding)
- Google AI Studio (completion, text_embedding)
- Google Vertex AI (chat_completion, completion, rerank, text_embedding)
- Hugging Face (chat_completion, completion, rerank, text_embedding)
- JinaAI (rerank, text_embedding)
- Llama (chat_completion, completion, text_embedding)
- Mistral (chat_completion, completion, text_embedding)
- OpenAI (chat_completion, completion, text_embedding)
- VoyageAI (rerank, text_embedding)
- Watsonx inference integration (text_embedding)
- Parameters: fn - a function that initializes a builder to create the PutRequest
- See Also:
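As an illustration, the sketch below registers an OpenAI text_embedding endpoint through the generic put. The service settings are passed as untyped JSON (co.elastic.clients.json.JsonData), and the builder property names (taskType, inferenceConfig, service, serviceSettings) plus the enum constant are assumptions to verify against PutRequest:

    inference.put(p -> p
        .inferenceId("my-embedding-endpoint")               // hypothetical endpoint id
        .taskType(TaskType.TextEmbedding)                   // assumed enum constant
        .inferenceConfig(cfg -> cfg                         // assumed property name for the endpoint body
            .service("openai")
            .serviceSettings(JsonData.of(Map.of(
                "api_key", "<your api key>",
                "model_id", "text-embedding-3-small")))));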
-
putAlibabacloud
public CompletableFuture<PutAlibabacloudResponse> putAlibabacloud(PutAlibabacloudRequest request)
Create an AlibabaCloud AI Search inference endpoint.
Create an inference endpoint to perform an inference task with the alibabacloud-ai-search service.
- See Also:
-
putAlibabacloud
public final CompletableFuture<PutAlibabacloudResponse> putAlibabacloud(Function<PutAlibabacloudRequest.Builder, ObjectBuilder<PutAlibabacloudRequest>> fn)
Create an AlibabaCloud AI Search inference endpoint.
Create an inference endpoint to perform an inference task with the alibabacloud-ai-search service.
- Parameters: fn - a function that initializes a builder to create the PutAlibabacloudRequest
- See Also:
-
putAmazonbedrock
public CompletableFuture<PutAmazonbedrockResponse> putAmazonbedrock(PutAmazonbedrockRequest request)
Create an Amazon Bedrock inference endpoint.
Create an inference endpoint to perform an inference task with the amazonbedrock service.
info You need to provide the access and secret keys only once, during the inference model creation. The get inference API does not retrieve your access or secret keys. After creating the inference model, you cannot change the associated key pairs. If you want to use a different access and secret key pair, delete the inference model and recreate it with the same name and the updated keys.
- See Also:
-
putAmazonbedrock
public final CompletableFuture<PutAmazonbedrockResponse> putAmazonbedrock(Function<PutAmazonbedrockRequest.Builder, ObjectBuilder<PutAmazonbedrockRequest>> fn)
Create an Amazon Bedrock inference endpoint.
Create an inference endpoint to perform an inference task with the amazonbedrock service.
info You need to provide the access and secret keys only once, during the inference model creation. The get inference API does not retrieve your access or secret keys. After creating the inference model, you cannot change the associated key pairs. If you want to use a different access and secret key pair, delete the inference model and recreate it with the same name and the updated keys.
- Parameters: fn - a function that initializes a builder to create the PutAmazonbedrockRequest
- See Also:
-
putAmazonsagemaker
public CompletableFuture<PutAmazonsagemakerResponse> putAmazonsagemaker(PutAmazonsagemakerRequest request)
Create an Amazon SageMaker inference endpoint.
Create an inference endpoint to perform an inference task with the amazon_sagemaker service.
- See Also:
-
putAmazonsagemaker
public final CompletableFuture<PutAmazonsagemakerResponse> putAmazonsagemaker(Function<PutAmazonsagemakerRequest.Builder, ObjectBuilder<PutAmazonsagemakerRequest>> fn)
Create an Amazon SageMaker inference endpoint.
Create an inference endpoint to perform an inference task with the amazon_sagemaker service.
- Parameters: fn - a function that initializes a builder to create the PutAmazonsagemakerRequest
- See Also:
-
putAnthropic
public CompletableFuture<PutAnthropicResponse> putAnthropic(PutAnthropicRequest request)
Create an Anthropic inference endpoint.
Create an inference endpoint to perform an inference task with the anthropic service.
- See Also:
-
putAnthropic
public final CompletableFuture<PutAnthropicResponse> putAnthropic(Function<PutAnthropicRequest.Builder, ObjectBuilder<PutAnthropicRequest>> fn)
Create an Anthropic inference endpoint.
Create an inference endpoint to perform an inference task with the anthropic service.
- Parameters: fn - a function that initializes a builder to create the PutAnthropicRequest
- See Also:
-
putAzureaistudio
public CompletableFuture<PutAzureaistudioResponse> putAzureaistudio(PutAzureaistudioRequest request)
Create an Azure AI Studio inference endpoint.
Create an inference endpoint to perform an inference task with the azureaistudio service.
- See Also:
-
putAzureaistudio
public final CompletableFuture<PutAzureaistudioResponse> putAzureaistudio(Function<PutAzureaistudioRequest.Builder, ObjectBuilder<PutAzureaistudioRequest>> fn)
Create an Azure AI Studio inference endpoint.
Create an inference endpoint to perform an inference task with the azureaistudio service.
- Parameters: fn - a function that initializes a builder to create the PutAzureaistudioRequest
- See Also:
-
putAzureopenai
public CompletableFuture<PutAzureopenaiResponse> putAzureopenai(PutAzureopenaiRequest request)
Create an Azure OpenAI inference endpoint.
Create an inference endpoint to perform an inference task with the azureopenai service.
The list of chat completion models that you can choose from in your Azure OpenAI deployment includes:
The list of embeddings models that you can choose from in your deployment can be found in the Azure models documentation.
- See Also:
-
putAzureopenai
public final CompletableFuture<PutAzureopenaiResponse> putAzureopenai(Function<PutAzureopenaiRequest.Builder, ObjectBuilder<PutAzureopenaiRequest>> fn)
Create an Azure OpenAI inference endpoint.
Create an inference endpoint to perform an inference task with the azureopenai service.
The list of chat completion models that you can choose from in your Azure OpenAI deployment includes:
The list of embeddings models that you can choose from in your deployment can be found in the Azure models documentation.
- Parameters: fn - a function that initializes a builder to create the PutAzureopenaiRequest
- See Also:
-
putCohere
public CompletableFuture<PutCohereResponse> putCohere(PutCohereRequest request)
Create a Cohere inference endpoint.
Create an inference endpoint to perform an inference task with the cohere service.
- See Also:
-
putCohere
public final CompletableFuture<PutCohereResponse> putCohere(Function<PutCohereRequest.Builder, ObjectBuilder<PutCohereRequest>> fn)
Create a Cohere inference endpoint.
Create an inference endpoint to perform an inference task with the cohere service.
- Parameters: fn - a function that initializes a builder to create the PutCohereRequest
- See Also:
-
putCustom
public CompletableFuture<PutCustomResponse> putCustom(PutCustomRequest request)
Create a custom inference endpoint.
The custom service gives more control over how to interact with external inference services that aren't explicitly supported through dedicated integrations. It gives you the ability to define the headers, url, query parameters, request body, and secrets. The custom service supports template replacement, which enables you to define a template that is replaced with the value associated with its key. Templates are portions of a string that start with ${ and end with }. The parameters secret_parameters and task_settings are checked for keys for template replacement. Template replacement is supported in the request, headers, url, and query_parameters. If the definition (key) is not found for a template, an error message is returned. In case of an endpoint definition like the following:
PUT _inference/text_embedding/test-text-embedding
{
  "service": "custom",
  "service_settings": {
    "secret_parameters": {
      "api_key": "<some api key>"
    },
    "url": "...endpoints.huggingface.cloud/v1/embeddings",
    "headers": {
      "Authorization": "Bearer ${api_key}",
      "Content-Type": "application/json"
    },
    "request": "{\"input\": ${input}}",
    "response": {
      "json_parser": {
        "text_embeddings": "$.data[*].embedding[*]"
      }
    }
  }
}
To replace ${api_key}, the secret_parameters and task_settings are checked for a key named api_key.
info Templates should not be surrounded by quotes.
Pre-defined templates:
- ${input} refers to the array of input strings that comes from the input field of the subsequent inference requests.
- ${input_type} refers to the input type translation values.
- ${query} refers to the query field used specifically for reranking tasks.
- ${top_n} refers to the top_n field available when performing rerank requests.
- ${return_documents} refers to the return_documents field available when performing rerank requests.
- See Also:
-
putCustom
public final CompletableFuture<PutCustomResponse> putCustom(Function<PutCustomRequest.Builder, ObjectBuilder<PutCustomRequest>> fn)
Create a custom inference endpoint.
The custom service gives more control over how to interact with external inference services that aren't explicitly supported through dedicated integrations. It gives you the ability to define the headers, url, query parameters, request body, and secrets. The custom service supports template replacement, which enables you to define a template that is replaced with the value associated with its key. Templates are portions of a string that start with ${ and end with }. The parameters secret_parameters and task_settings are checked for keys for template replacement. Template replacement is supported in the request, headers, url, and query_parameters. If the definition (key) is not found for a template, an error message is returned. In case of an endpoint definition like the following:
PUT _inference/text_embedding/test-text-embedding
{
  "service": "custom",
  "service_settings": {
    "secret_parameters": {
      "api_key": "<some api key>"
    },
    "url": "...endpoints.huggingface.cloud/v1/embeddings",
    "headers": {
      "Authorization": "Bearer ${api_key}",
      "Content-Type": "application/json"
    },
    "request": "{\"input\": ${input}}",
    "response": {
      "json_parser": {
        "text_embeddings": "$.data[*].embedding[*]"
      }
    }
  }
}
To replace ${api_key}, the secret_parameters and task_settings are checked for a key named api_key.
info Templates should not be surrounded by quotes.
Pre-defined templates:
- ${input} refers to the array of input strings that comes from the input field of the subsequent inference requests.
- ${input_type} refers to the input type translation values.
- ${query} refers to the query field used specifically for reranking tasks.
- ${top_n} refers to the top_n field available when performing rerank requests.
- ${return_documents} refers to the return_documents field available when performing rerank requests.
- Parameters: fn - a function that initializes a builder to create the PutCustomRequest
- See Also:
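To make the replacement rule concrete, here is a self-contained sketch (not client code) of how ${...} tokens resolve against a parameter map, mirroring the lookup and the missing-key error described above:

    import java.util.Map;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    final class TemplateResolver {
        private static final Pattern TOKEN = Pattern.compile("\\$\\{(\\w+)}");

        // Resolves ${key} tokens against the merged secret_parameters/task_settings keys
        static String resolve(String template, Map<String, String> params) {
            Matcher m = TOKEN.matcher(template);
            StringBuilder out = new StringBuilder();
            while (m.find()) {
                String value = params.get(m.group(1));
                if (value == null) {
                    // mirrors the documented behavior: an unknown key is an error
                    throw new IllegalArgumentException("No value for template key: " + m.group(1));
                }
                m.appendReplacement(out, Matcher.quoteReplacement(value));
            }
            m.appendTail(out);
            return out.toString();
        }
    }

    // TemplateResolver.resolve("Bearer ${api_key}", Map.of("api_key", "abc123")) returns "Bearer abc123"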
-
putDeepseek
public CompletableFuture<PutDeepseekResponse> putDeepseek(PutDeepseekRequest request)
Create a DeepSeek inference endpoint.
Create an inference endpoint to perform an inference task with the deepseek service.
- See Also:
-
putDeepseek
public final CompletableFuture<PutDeepseekResponse> putDeepseek(Function<PutDeepseekRequest.Builder, ObjectBuilder<PutDeepseekRequest>> fn)
Create a DeepSeek inference endpoint.
Create an inference endpoint to perform an inference task with the deepseek service.
- Parameters: fn - a function that initializes a builder to create the PutDeepseekRequest
- See Also:
-
putElasticsearch
public CompletableFuture<PutElasticsearchResponse> putElasticsearch(PutElasticsearchRequest request)
Create an Elasticsearch inference endpoint.
Create an inference endpoint to perform an inference task with the elasticsearch service.
info Your Elasticsearch deployment contains preconfigured ELSER and E5 inference endpoints; you only need to create endpoints using the API if you want to customize the settings.
If you use the ELSER or the E5 model through the elasticsearch service, the API request will automatically download and deploy the model if it isn't downloaded yet.
info You might see a 502 bad gateway error in the response when using the Kibana Console. This error usually just reflects a timeout, while the model downloads in the background. You can check the download progress in the Machine Learning UI. If using the Python client, you can set the timeout parameter to a higher value.
After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for "state": "fully_allocated" in the response and ensure that the "allocation_count" matches the "target_allocation_count". Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
- See Also:
-
putElasticsearch
public final CompletableFuture<PutElasticsearchResponse> putElasticsearch(Function<PutElasticsearchRequest.Builder, ObjectBuilder<PutElasticsearchRequest>> fn)
Create an Elasticsearch inference endpoint.
Create an inference endpoint to perform an inference task with the elasticsearch service.
info Your Elasticsearch deployment contains preconfigured ELSER and E5 inference endpoints; you only need to create endpoints using the API if you want to customize the settings.
If you use the ELSER or the E5 model through the elasticsearch service, the API request will automatically download and deploy the model if it isn't downloaded yet.
info You might see a 502 bad gateway error in the response when using the Kibana Console. This error usually just reflects a timeout, while the model downloads in the background. You can check the download progress in the Machine Learning UI. If using the Python client, you can set the timeout parameter to a higher value.
After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for "state": "fully_allocated" in the response and ensure that the "allocation_count" matches the "target_allocation_count". Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
- Parameters: fn - a function that initializes a builder to create the PutElasticsearchRequest
- See Also:
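As a sketch, deploying the built-in E5 model through the elasticsearch service might look like the following. The builder property names (elasticsearchInferenceId, service, serviceSettings, modelId, numAllocations, numThreads) and the enum constant are assumptions modeled on the REST API fields; check the generated request classes:

    inference.putElasticsearch(p -> p
        .elasticsearchInferenceId("my-e5-endpoint")     // hypothetical endpoint id; property name assumed
        .taskType(ElasticsearchTaskType.TextEmbedding)  // assumed enum constant
        .service("elasticsearch")
        .serviceSettings(s -> s
            .modelId(".multilingual-e5-small")          // built-in E5 model
            .numAllocations(1)
            .numThreads(1)));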
-
putElser
public CompletableFuture<PutElserResponse> putElser(PutElserRequest request)
Create an ELSER inference endpoint.
Create an inference endpoint to perform an inference task with the elser service. You can also deploy ELSER by using the Elasticsearch inference integration.
info Your Elasticsearch deployment contains a preconfigured ELSER inference endpoint; you only need to create the endpoint using the API if you want to customize the settings.
The API request will automatically download and deploy the ELSER model if it isn't already downloaded.
info You might see a 502 bad gateway error in the response when using the Kibana Console. This error usually just reflects a timeout, while the model downloads in the background. You can check the download progress in the Machine Learning UI. If using the Python client, you can set the timeout parameter to a higher value.
After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for "state": "fully_allocated" in the response and ensure that the "allocation_count" matches the "target_allocation_count". Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
- See Also:
-
putElser
public final CompletableFuture<PutElserResponse> putElser(Function<PutElserRequest.Builder, ObjectBuilder<PutElserRequest>> fn)
Create an ELSER inference endpoint.
Create an inference endpoint to perform an inference task with the elser service. You can also deploy ELSER by using the Elasticsearch inference integration.
info Your Elasticsearch deployment contains a preconfigured ELSER inference endpoint; you only need to create the endpoint using the API if you want to customize the settings.
The API request will automatically download and deploy the ELSER model if it isn't already downloaded.
info You might see a 502 bad gateway error in the response when using the Kibana Console. This error usually just reflects a timeout, while the model downloads in the background. You can check the download progress in the Machine Learning UI. If using the Python client, you can set the timeout parameter to a higher value.
After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for "state": "fully_allocated" in the response and ensure that the "allocation_count" matches the "target_allocation_count". Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
- Parameters: fn - a function that initializes a builder to create the PutElserRequest
- See Also:
-
putGoogleaistudio
public CompletableFuture<PutGoogleaistudioResponse> putGoogleaistudio(PutGoogleaistudioRequest request)
Create a Google AI Studio inference endpoint.
Create an inference endpoint to perform an inference task with the googleaistudio service.
- See Also:
-
putGoogleaistudio
public final CompletableFuture<PutGoogleaistudioResponse> putGoogleaistudio(Function<PutGoogleaistudioRequest.Builder, ObjectBuilder<PutGoogleaistudioRequest>> fn)
Create a Google AI Studio inference endpoint.
Create an inference endpoint to perform an inference task with the googleaistudio service.
- Parameters: fn - a function that initializes a builder to create the PutGoogleaistudioRequest
- See Also:
-
putGooglevertexai
public CompletableFuture<PutGooglevertexaiResponse> putGooglevertexai(PutGooglevertexaiRequest request)
Create a Google Vertex AI inference endpoint.
Create an inference endpoint to perform an inference task with the googlevertexai service.
- See Also:
-
putGooglevertexai
public final CompletableFuture<PutGooglevertexaiResponse> putGooglevertexai(Function<PutGooglevertexaiRequest.Builder, ObjectBuilder<PutGooglevertexaiRequest>> fn)
Create a Google Vertex AI inference endpoint.
Create an inference endpoint to perform an inference task with the googlevertexai service.
- Parameters: fn - a function that initializes a builder to create the PutGooglevertexaiRequest
- See Also:
-
putHuggingFace
public CompletableFuture<PutHuggingFaceResponse> putHuggingFace(PutHuggingFaceRequest request)
Create a Hugging Face inference endpoint.
Create an inference endpoint to perform an inference task with the hugging_face service. Supported tasks include: text_embedding, completion, and chat_completion.
To configure the endpoint, first visit the Hugging Face Inference Endpoints page and create a new endpoint. Select a model that supports the task you intend to use.
For Elastic's text_embedding task: The selected model must support the Sentence Embeddings task. On the new endpoint creation page, select the Sentence Embeddings task under the Advanced Configuration section. After the endpoint has initialized, copy the generated endpoint URL. Recommended models for the text_embedding task:
- all-MiniLM-L6-v2
- all-MiniLM-L12-v2
- all-mpnet-base-v2
- e5-base-v2
- e5-small-v2
- multilingual-e5-base
- multilingual-e5-small
For Elastic's chat_completion and completion tasks: The selected model must support the Text Generation task and expose the OpenAI API. Hugging Face supports both serverless and dedicated endpoints for Text Generation. When creating a dedicated endpoint, select the Text Generation task. After the endpoint is initialized (for dedicated) or ready (for serverless), ensure it supports the OpenAI API and that the URL includes the /v1/chat/completions path. Then, copy the full endpoint URL for use. Recommended models for the chat_completion and completion tasks:
- Mistral-7B-Instruct-v0.2
- QwQ-32B
- Phi-3-mini-128k-instruct
For Elastic's rerank task: The selected model must support the sentence-ranking task and expose the OpenAI API. Hugging Face supports only dedicated (not serverless) endpoints for rerank so far. After the endpoint is initialized, copy the full endpoint URL for use. Tested models for the rerank task:
- bge-reranker-base
- jina-reranker-v1-turbo-en-GGUF
- See Also:
-
putHuggingFace
public final CompletableFuture<PutHuggingFaceResponse> putHuggingFace(Function<PutHuggingFaceRequest.Builder, ObjectBuilder<PutHuggingFaceRequest>> fn)
Create a Hugging Face inference endpoint.
Create an inference endpoint to perform an inference task with the hugging_face service. Supported tasks include: text_embedding, completion, and chat_completion.
To configure the endpoint, first visit the Hugging Face Inference Endpoints page and create a new endpoint. Select a model that supports the task you intend to use.
For Elastic's text_embedding task: The selected model must support the Sentence Embeddings task. On the new endpoint creation page, select the Sentence Embeddings task under the Advanced Configuration section. After the endpoint has initialized, copy the generated endpoint URL. Recommended models for the text_embedding task:
- all-MiniLM-L6-v2
- all-MiniLM-L12-v2
- all-mpnet-base-v2
- e5-base-v2
- e5-small-v2
- multilingual-e5-base
- multilingual-e5-small
For Elastic's chat_completion and completion tasks: The selected model must support the Text Generation task and expose the OpenAI API. Hugging Face supports both serverless and dedicated endpoints for Text Generation. When creating a dedicated endpoint, select the Text Generation task. After the endpoint is initialized (for dedicated) or ready (for serverless), ensure it supports the OpenAI API and that the URL includes the /v1/chat/completions path. Then, copy the full endpoint URL for use. Recommended models for the chat_completion and completion tasks:
- Mistral-7B-Instruct-v0.2
- QwQ-32B
- Phi-3-mini-128k-instruct
For Elastic's rerank task: The selected model must support the sentence-ranking task and expose the OpenAI API. Hugging Face supports only dedicated (not serverless) endpoints for rerank so far. After the endpoint is initialized, copy the full endpoint URL for use. Tested models for the rerank task:
- bge-reranker-base
- jina-reranker-v1-turbo-en-GGUF
- Parameters: fn - a function that initializes a builder to create the PutHuggingFaceRequest
- See Also:
-
putJinaai
public CompletableFuture<PutJinaaiResponse> putJinaai(PutJinaaiRequest request)
Create a JinaAI inference endpoint.
Create an inference endpoint to perform an inference task with the jinaai service.
To review the available rerank models, refer to https://jina.ai/reranker. To review the available text_embedding models, refer to https://jina.ai/embeddings/.
- See Also:
-
putJinaai
public final CompletableFuture<PutJinaaiResponse> putJinaai(Function<PutJinaaiRequest.Builder, ObjectBuilder<PutJinaaiRequest>> fn)
Create a JinaAI inference endpoint.
Create an inference endpoint to perform an inference task with the jinaai service.
To review the available rerank models, refer to https://jina.ai/reranker. To review the available text_embedding models, refer to https://jina.ai/embeddings/.
- Parameters: fn - a function that initializes a builder to create the PutJinaaiRequest
- See Also:
-
putMistral
public CompletableFuture<PutMistralResponse> putMistral(PutMistralRequest request)
Create a Mistral inference endpoint.
Create an inference endpoint to perform an inference task with the mistral service.
- See Also:
-
putMistral
public final CompletableFuture<PutMistralResponse> putMistral(Function<PutMistralRequest.Builder, ObjectBuilder<PutMistralRequest>> fn)
Create a Mistral inference endpoint.
Create an inference endpoint to perform an inference task with the mistral service.
- Parameters: fn - a function that initializes a builder to create the PutMistralRequest
- See Also:
-
putOpenai
public CompletableFuture<PutOpenaiResponse> putOpenai(PutOpenaiRequest request)
Create an OpenAI inference endpoint.
Create an inference endpoint to perform an inference task with the openai service or openai-compatible APIs.
- See Also:
-
putOpenai
public final CompletableFuture<PutOpenaiResponse> putOpenai(Function<PutOpenaiRequest.Builder, ObjectBuilder<PutOpenaiRequest>> fn)
Create an OpenAI inference endpoint.
Create an inference endpoint to perform an inference task with the openai service or openai-compatible APIs.
- Parameters: fn - a function that initializes a builder to create the PutOpenaiRequest
- See Also:
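A sketch of creating an OpenAI completion endpoint. The property names (openaiInferenceId, apiKey, modelId) and the enum constant are assumptions to verify against PutOpenaiRequest, and the model name is just an example:

    inference.putOpenai(p -> p
        .openaiInferenceId("my-openai-completion")      // hypothetical endpoint id; property name assumed
        .taskType(OpenaiTaskType.Completion)            // assumed enum constant
        .service("openai")
        .serviceSettings(s -> s
            .apiKey("<your api key>")
            .modelId("gpt-4o-mini")));                  // example model name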
-
putVoyageai
public CompletableFuture<PutVoyageaiResponse> putVoyageai(PutVoyageaiRequest request)
Create a VoyageAI inference endpoint.
Create an inference endpoint to perform an inference task with the voyageai service.
Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
- See Also:
-
putVoyageai
public final CompletableFuture<PutVoyageaiResponse> putVoyageai(Function<PutVoyageaiRequest.Builder, ObjectBuilder<PutVoyageaiRequest>> fn)
Create a VoyageAI inference endpoint.
Create an inference endpoint to perform an inference task with the voyageai service.
Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
- Parameters: fn - a function that initializes a builder to create the PutVoyageaiRequest
- See Also:
-
putWatsonx
public CompletableFuture<PutWatsonxResponse> putWatsonx(PutWatsonxRequest request)
Create a Watsonx inference endpoint.
Create an inference endpoint to perform an inference task with the watsonxai service. You need an IBM Cloud Databases for Elasticsearch deployment to use the watsonxai inference service. You can provision one through the IBM catalog, the Cloud Databases CLI plug-in, the Cloud Databases API, or Terraform.
- See Also:
-
putWatsonx
public final CompletableFuture<PutWatsonxResponse> putWatsonx(Function<PutWatsonxRequest.Builder, ObjectBuilder<PutWatsonxRequest>> fn)
Create a Watsonx inference endpoint.
Create an inference endpoint to perform an inference task with the watsonxai service. You need an IBM Cloud Databases for Elasticsearch deployment to use the watsonxai inference service. You can provision one through the IBM catalog, the Cloud Databases CLI plug-in, the Cloud Databases API, or Terraform.
- Parameters: fn - a function that initializes a builder to create the PutWatsonxRequest
- See Also:
-
rerank
public CompletableFuture<RerankResponse> rerank(RerankRequest request)
Perform reranking inference on the service
- See Also:
-
rerank
public final CompletableFuture<RerankResponse> rerank(Function<RerankRequest.Builder, ObjectBuilder<RerankRequest>> fn)
Perform reranking inference on the service
- Parameters: fn - a function that initializes a builder to create the RerankRequest
- See Also:
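For example, reranking two documents against a query (the endpoint id is hypothetical; query and input mirror the REST API fields):

    inference.rerank(r -> r
            .inferenceId("my-rerank-endpoint")          // hypothetical endpoint id
            .query("What is Elasticsearch?")
            .input("Elasticsearch is a distributed search engine.",
                   "The weather is sunny today."))
        .thenAccept(response -> {
            // results carry relevance scores, most relevant first
        });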
-
sparseEmbedding
public CompletableFuture<SparseEmbeddingResponse> sparseEmbedding(SparseEmbeddingRequest request)
Perform sparse embedding inference on the service
- See Also:
-
sparseEmbedding
public final CompletableFuture<SparseEmbeddingResponse> sparseEmbedding(Function<SparseEmbeddingRequest.Builder, ObjectBuilder<SparseEmbeddingRequest>> fn)
Perform sparse embedding inference on the service
- Parameters: fn - a function that initializes a builder to create the SparseEmbeddingRequest
- See Also:
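A sketch against a hypothetical ELSER endpoint:

    inference.sparseEmbedding(s -> s
            .inferenceId("my-elser-endpoint")           // hypothetical ELSER endpoint id
            .input("These are not the droids you are looking for."))
        .thenAccept(response -> {
            // each result maps expanded tokens to sparse weights
        });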
-
streamCompletion
public CompletableFuture<BinaryResponse> streamCompletion(StreamCompletionRequest request)
Perform streaming inference. Get real-time responses for completion tasks by delivering answers incrementally, reducing response times during computation. This API works only with the completion task type.
IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
This API requires the monitor_inference cluster privilege (the built-in inference_admin and inference_user roles grant this privilege). You must use a client that supports streaming.
- See Also:
-
streamCompletion
public final CompletableFuture<BinaryResponse> streamCompletion(Function<StreamCompletionRequest.Builder, ObjectBuilder<StreamCompletionRequest>> fn)
Perform streaming inference. Get real-time responses for completion tasks by delivering answers incrementally, reducing response times during computation. This API works only with the completion task type.
IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
This API requires the monitor_inference cluster privilege (the built-in inference_admin and inference_user roles grant this privilege). You must use a client that supports streaming.
- Parameters: fn - a function that initializes a builder to create the StreamCompletionRequest
- See Also:
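Since the result is a raw BinaryResponse, a sketch of consuming the stream might look like this; the endpoint id is hypothetical, and content() is the InputStream accessor on BinaryResponse:

    inference.streamCompletion(s -> s
            .inferenceId("my-completion-endpoint")      // hypothetical endpoint id
            .input("Tell me a one-line joke."))
        .thenAccept(binary -> {
            try (java.io.InputStream stream = binary.content()) {
                // consume the server-sent events incrementally
            } catch (java.io.IOException e) {
                // handle the read failure
            }
        });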
-
textEmbedding
public CompletableFuture<TextEmbeddingResponse> textEmbedding(TextEmbeddingRequest request)
Perform text embedding inference on the service
- See Also:
-
textEmbedding
public final CompletableFuture<TextEmbeddingResponse> textEmbedding(Function<TextEmbeddingRequest.Builder, ObjectBuilder<TextEmbeddingRequest>> fn)
Perform text embedding inference on the service
- Parameters: fn - a function that initializes a builder to create the TextEmbeddingRequest
- See Also:
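For example, embedding a single string against a hypothetical E5 endpoint:

    inference.textEmbedding(t -> t
            .inferenceId("my-e5-endpoint")              // hypothetical endpoint id
            .input("How do dense vector embeddings work?"))
        .thenAccept(response -> {
            // one dense vector per input string
        });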
-
update
public CompletableFuture<UpdateInferenceResponse> update(UpdateInferenceRequest request)
Update an inference endpoint.
Modify task_settings, secrets (within service_settings), or num_allocations for an inference endpoint, depending on the specific endpoint service and task_type.
IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
- See Also:
-
update
public final CompletableFuture<UpdateInferenceResponse> update(Function<UpdateInferenceRequest.Builder, ObjectBuilder<UpdateInferenceRequest>> fn)
Update an inference endpoint.
Modify task_settings, secrets (within service_settings), or num_allocations for an inference endpoint, depending on the specific endpoint service and task_type.
IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
- Parameters: fn - a function that initializes a builder to create the UpdateInferenceRequest
- See Also:
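A sketch of raising num_allocations on an existing endpoint. The partial config is passed as untyped JSON here because the exact typed builder shape for updates is an assumption (inferenceConfig is an assumed property name; verify against UpdateInferenceRequest):

    inference.update(u -> u
        .inferenceId("my-e5-endpoint")                  // hypothetical endpoint id
        .inferenceConfig(cfg -> cfg                     // assumed property name for the partial config
            .service("elasticsearch")
            .serviceSettings(JsonData.of(Map.of("num_allocations", 2)))));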
-