Class ElasticsearchInferenceAsyncClient
- All Implemented Interfaces:
- Closeable, AutoCloseable
- 
Field Summary

Fields inherited from class co.elastic.clients.ApiClient: transport, transportOptions
- 
Constructor Summary

ElasticsearchInferenceAsyncClient(ElasticsearchTransport transport, TransportOptions transportOptions)
- 
Method Summary

Every operation returns a CompletableFuture and comes in two overloads: one that takes a prebuilt request object, and a final overload that takes a builder lambda (Function<Request.Builder, ObjectBuilder<Request>>). The list below shows the request-object form once per operation.

CompletableFuture<BinaryResponse> chatCompletionUnified(ChatCompletionUnifiedRequest request)
    Perform chat completion inference.
CompletableFuture<CompletionResponse> completion(CompletionRequest request)
    Perform completion inference on the service.
CompletableFuture<DeleteInferenceResponse> delete(DeleteInferenceRequest request)
    Delete an inference endpoint.
CompletableFuture<GetInferenceResponse> get()
CompletableFuture<GetInferenceResponse> get(GetInferenceRequest request)
    Get an inference endpoint.
CompletableFuture<InferenceResponse> inference(InferenceRequest request)
    Perform inference on the service.
CompletableFuture<PutResponse> put(PutRequest request)
    Create an inference endpoint.
CompletableFuture<PutAi21Response> putAi21(PutAi21Request request)
    Create an AI21 inference endpoint.
CompletableFuture<PutAlibabacloudResponse> putAlibabacloud(PutAlibabacloudRequest request)
    Create an AlibabaCloud AI Search inference endpoint.
CompletableFuture<PutAmazonbedrockResponse> putAmazonbedrock(PutAmazonbedrockRequest request)
    Create an Amazon Bedrock inference endpoint.
CompletableFuture<PutAmazonsagemakerResponse> putAmazonsagemaker(PutAmazonsagemakerRequest request)
    Create an Amazon SageMaker inference endpoint.
CompletableFuture<PutAnthropicResponse> putAnthropic(PutAnthropicRequest request)
    Create an Anthropic inference endpoint.
CompletableFuture<PutAzureaistudioResponse> putAzureaistudio(PutAzureaistudioRequest request)
    Create an Azure AI Studio inference endpoint.
CompletableFuture<PutAzureopenaiResponse> putAzureopenai(PutAzureopenaiRequest request)
    Create an Azure OpenAI inference endpoint.
CompletableFuture<PutCohereResponse> putCohere(PutCohereRequest request)
    Create a Cohere inference endpoint.
CompletableFuture<PutContextualaiResponse> putContextualai(PutContextualaiRequest request)
    Create a Contextual AI inference endpoint.
CompletableFuture<PutCustomResponse> putCustom(PutCustomRequest request)
    Create a custom inference endpoint.
CompletableFuture<PutDeepseekResponse> putDeepseek(PutDeepseekRequest request)
    Create a DeepSeek inference endpoint.
CompletableFuture<PutElasticsearchResponse> putElasticsearch(PutElasticsearchRequest request)
    Create an Elasticsearch inference endpoint.
CompletableFuture<PutElserResponse> putElser(PutElserRequest request)
    Create an ELSER inference endpoint.
CompletableFuture<PutGoogleaistudioResponse> putGoogleaistudio(PutGoogleaistudioRequest request)
    Create a Google AI Studio inference endpoint.
CompletableFuture<PutGooglevertexaiResponse> putGooglevertexai(PutGooglevertexaiRequest request)
    Create a Google Vertex AI inference endpoint.
CompletableFuture<PutHuggingFaceResponse> putHuggingFace(PutHuggingFaceRequest request)
    Create a Hugging Face inference endpoint.
CompletableFuture<PutJinaaiResponse> putJinaai(PutJinaaiRequest request)
    Create a JinaAI inference endpoint.
CompletableFuture<PutLlamaResponse> putLlama(PutLlamaRequest request)
    Create a Llama inference endpoint.
CompletableFuture<PutMistralResponse> putMistral(PutMistralRequest request)
    Create a Mistral inference endpoint.
CompletableFuture<PutOpenaiResponse> putOpenai(PutOpenaiRequest request)
    Create an OpenAI inference endpoint.
CompletableFuture<PutVoyageaiResponse> putVoyageai(PutVoyageaiRequest request)
    Create a VoyageAI inference endpoint.
CompletableFuture<PutWatsonxResponse> putWatsonx(PutWatsonxRequest request)
    Create a Watsonx inference endpoint.
CompletableFuture<RerankResponse> rerank(RerankRequest request)
    Perform reranking inference on the service.
CompletableFuture<SparseEmbeddingResponse> sparseEmbedding(SparseEmbeddingRequest request)
    Perform sparse embedding inference on the service.
CompletableFuture<BinaryResponse> streamCompletion(StreamCompletionRequest request)
    Perform streaming inference.
CompletableFuture<TextEmbeddingResponse> textEmbedding(TextEmbeddingRequest request)
    Perform text embedding inference on the service.
CompletableFuture<UpdateInferenceResponse> update(UpdateInferenceRequest request)
    Update an inference endpoint.
ElasticsearchInferenceAsyncClient withTransportOptions(TransportOptions transportOptions)
    Creates a new client with some request options.

Methods inherited from class co.elastic.clients.ApiClient: _jsonpMapper, _transport, _transportOptions, close, getDeserializer, withTransportOptions
- 
Constructor Details
ElasticsearchInferenceAsyncClient

public ElasticsearchInferenceAsyncClient(ElasticsearchTransport transport, @Nullable TransportOptions transportOptions)
 
- 
- 
Method Details
withTransportOptions

public ElasticsearchInferenceAsyncClient withTransportOptions(@Nullable TransportOptions transportOptions)

Description copied from class ApiClient: Creates a new client with some request options.
- Specified by: withTransportOptions in class ApiClient<ElasticsearchTransport, ElasticsearchInferenceAsyncClient>
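A minimal sketch of obtaining the client and deriving a per-request variant. The transport setup is omitted; the options().toBuilder()/addHeader() calls on the transport options and the header name are assumptions made for illustration:

    import co.elastic.clients.elasticsearch.ElasticsearchAsyncClient;
    import co.elastic.clients.elasticsearch.inference.ElasticsearchInferenceAsyncClient;
    import co.elastic.clients.transport.ElasticsearchTransport;

    public class InferenceClientSetup {
        // Derive the inference namespace client from an already-configured transport,
        // then a sibling client whose requests carry an extra (illustrative) header.
        static ElasticsearchInferenceAsyncClient withHeader(ElasticsearchTransport transport) {
            ElasticsearchInferenceAsyncClient inference =
                    new ElasticsearchAsyncClient(transport).inference();
            // withTransportOptions returns a NEW client; 'inference' is unchanged.
            return inference.withTransportOptions(
                    transport.options().toBuilder()
                            .addHeader("X-Application", "inference-demo")
                            .build());
        }
    }

The later snippets on this page assume an ElasticsearchInferenceAsyncClient named client obtained this way.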
 
- 
chatCompletionUnified

public CompletableFuture<BinaryResponse> chatCompletionUnified(ChatCompletionUnifiedRequest request)

Perform chat completion inference.

The chat completion inference API enables real-time responses for chat completion tasks by delivering answers incrementally, reducing response times during computation. It only works with the chat_completion task type for the openai and elastic inference services.

NOTE: The chat_completion task type is only available within the _stream API and only supports streaming. The chat completion inference API and the stream inference API differ in their response structure and capabilities. The chat completion inference API provides more comprehensive customization options through more fields and function calling support. If you use the openai, hugging_face, or elastic service, use the chat completion inference API.
- See Also:
 
- 
chatCompletionUnified

public final CompletableFuture<BinaryResponse> chatCompletionUnified(Function<ChatCompletionUnifiedRequest.Builder, ObjectBuilder<ChatCompletionUnifiedRequest>> fn)

Perform chat completion inference. See chatCompletionUnified(ChatCompletionUnifiedRequest) above for details.
- Parameters:
- fn - a function that initializes a builder to create the ChatCompletionUnifiedRequest
- See Also:
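For orientation, a hedged sketch (client as above; the endpoint id and the builder property names — inferenceId, messages, role, content, and the content union's string variant — mirror the REST API and are assumptions about the generated builders):

    // Unified chat completion against an existing chat_completion endpoint.
    client.chatCompletionUnified(r -> r
            .inferenceId("my-chat-endpoint")                 // hypothetical endpoint id
            .messages(m -> m
                    .role("user")
                    .content(c -> c.string("Say hello in one sentence."))))
        .thenAccept(resp -> {
            // resp is a BinaryResponse: a raw event stream to read and close.
            try (java.io.InputStream events = resp.content()) {
                System.out.println(new String(events.readAllBytes(),
                        java.nio.charset.StandardCharsets.UTF_8));
            } catch (java.io.IOException e) {
                throw new java.io.UncheckedIOException(e);
            }
        });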
 
- 
completion

public CompletableFuture<CompletionResponse> completion(CompletionRequest request)

Perform completion inference on the service.
- See Also:
 
- 
completion

public final CompletableFuture<CompletionResponse> completion(Function<CompletionRequest.Builder, ObjectBuilder<CompletionRequest>> fn)

Perform completion inference on the service.
- Parameters:
- fn - a function that initializes a builder to create the CompletionRequest
- See Also:
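A hedged sketch (client as above; the inferenceId/input property names follow the REST API and are assumptions, and the response accessors may differ):

    // Non-streaming completion on an existing completion endpoint.
    client.completion(c -> c
            .inferenceId("my-completion-endpoint")   // hypothetical endpoint id
            .input("Write a one-line summary of Elasticsearch."))
        .thenAccept(resp -> System.out.println(resp)); // prints the completion result(s)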
 
- 
delete

public CompletableFuture<DeleteInferenceResponse> delete(DeleteInferenceRequest request)

Delete an inference endpoint.
- See Also:
 
- 
delete

public final CompletableFuture<DeleteInferenceResponse> delete(Function<DeleteInferenceRequest.Builder, ObjectBuilder<DeleteInferenceRequest>> fn)

Delete an inference endpoint.
- Parameters:
- fn - a function that initializes a builder to create the DeleteInferenceRequest
- See Also:
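A sketch (client as above; the inferenceId property and acknowledged() accessor are assumptions based on the REST response shape):

    // Remove an endpoint by id; the response acknowledges the deletion.
    client.delete(d -> d.inferenceId("my-old-endpoint"))
        .thenAccept(resp -> System.out.println("acknowledged: " + resp.acknowledged()));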
 
- 
get

public CompletableFuture<GetInferenceResponse> get(GetInferenceRequest request)

Get an inference endpoint.
- See Also:
 
- 
get

public final CompletableFuture<GetInferenceResponse> get(Function<GetInferenceRequest.Builder, ObjectBuilder<GetInferenceRequest>> fn)

Get an inference endpoint.
- Parameters:
- fn - a function that initializes a builder to create the GetInferenceRequest
- See Also:
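A sketch (client as above; the endpoints() accessor follows the REST response shape and is an assumption):

    // Fetch a single endpoint's configuration; get() with no arguments lists all.
    client.get(g -> g.inferenceId("my-endpoint"))
        .thenAccept(resp -> resp.endpoints()
            .forEach(e -> System.out.println(e.inferenceId() + " (" + e.taskType() + ")")));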
 
- 
get

public CompletableFuture<GetInferenceResponse> get()

Get an inference endpoint.
- See Also:
 
- 
inference

public CompletableFuture<InferenceResponse> inference(InferenceRequest request)

Perform inference on the service.

This API enables you to use machine learning models to perform specific tasks on data that you provide as an input. It returns a response with the results of the tasks. The inference endpoint you use can perform one specific task that was defined when the endpoint was created with the create inference API. For details about using this API with a service, such as Amazon Bedrock, Anthropic, or HuggingFace, refer to the service-specific documentation.

INFO: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
- See Also:
 
- 
inference

public final CompletableFuture<InferenceResponse> inference(Function<InferenceRequest.Builder, ObjectBuilder<InferenceRequest>> fn)

Perform inference on the service. See inference(InferenceRequest) above for details.
- Parameters:
- fn - a function that initializes a builder to create the InferenceRequest
- See Also:
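A sketch (client as above; property names follow the REST API and are assumptions):

    // Generic inference call; the endpoint's task type determines the result shape.
    client.inference(i -> i
            .inferenceId("my-endpoint")          // hypothetical endpoint id
            .input("The quick brown fox"))
        .thenAccept(resp -> System.out.println(resp));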
 
- 
put

public CompletableFuture<PutResponse> put(PutRequest request)

Create an inference endpoint.

IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Mistral, Azure OpenAI, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.

The following integrations are available through the inference API. You can find the available task types next to the integration name:
- AI21 (chat_completion, completion)
- AlibabaCloud AI Search (completion, rerank, sparse_embedding, text_embedding)
- Amazon Bedrock (completion, text_embedding)
- Amazon SageMaker (chat_completion, completion, rerank, sparse_embedding, text_embedding)
- Anthropic (completion)
- Azure AI Studio (completion, rerank, text_embedding)
- Azure OpenAI (completion, text_embedding)
- Cohere (completion, rerank, text_embedding)
- DeepSeek (chat_completion, completion)
- Elasticsearch (rerank, sparse_embedding, text_embedding - this service is for built-in models and models uploaded through Eland)
- ELSER (sparse_embedding)
- Google AI Studio (completion, text_embedding)
- Google Vertex AI (chat_completion, completion, rerank, text_embedding)
- Hugging Face (chat_completion, completion, rerank, text_embedding)
- JinaAI (rerank, text_embedding)
- Llama (chat_completion, completion, text_embedding)
- Mistral (chat_completion, completion, text_embedding)
- OpenAI (chat_completion, completion, text_embedding)
- VoyageAI (rerank, text_embedding)
- Watsonx inference integration (text_embedding)
 - See Also:
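A hedged sketch of creating an endpoint through the generic API (client as above; the inferenceConfig/service/serviceSettings builder names follow the request spec and are assumptions, and the model settings are illustrative):

    // assumes: import co.elastic.clients.elasticsearch.inference.TaskType;
    //          import co.elastic.clients.json.JsonData;

    // Create a text_embedding endpoint backed by the elasticsearch service.
    client.put(p -> p
            .inferenceId("my-e5-endpoint")
            .taskType(TaskType.TextEmbedding)
            .inferenceConfig(c -> c
                    .service("elasticsearch")
                    .serviceSettings(JsonData.fromJson(
                            "{\"model_id\": \".multilingual-e5-small\","
                            + " \"num_allocations\": 1, \"num_threads\": 1}"))))
        .thenAccept(r -> System.out.println("endpoint created"));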
 
- 
put

public final CompletableFuture<PutResponse> put(Function<PutRequest.Builder, ObjectBuilder<PutRequest>> fn)

Create an inference endpoint. The same integrations and task types listed for put(PutRequest) above are available through this overload.
- Parameters:
- fn - a function that initializes a builder to create the PutRequest
- See Also:
 
- 
putAi21

public CompletableFuture<PutAi21Response> putAi21(PutAi21Request request)

Create an AI21 inference endpoint.

Create an inference endpoint to perform an inference task with the ai21 service.
- See Also:
 
- 
putAi21

public final CompletableFuture<PutAi21Response> putAi21(Function<PutAi21Request.Builder, ObjectBuilder<PutAi21Request>> fn)

Create an AI21 inference endpoint. Create an inference endpoint to perform an inference task with the ai21 service.
- Parameters:
- fn - a function that initializes a builder to create the PutAi21Request
- See Also:
 
- 
putAlibabacloud

public CompletableFuture<PutAlibabacloudResponse> putAlibabacloud(PutAlibabacloudRequest request)

Create an AlibabaCloud AI Search inference endpoint.

Create an inference endpoint to perform an inference task with the alibabacloud-ai-search service.
- See Also:
 
- 
putAlibabacloud

public final CompletableFuture<PutAlibabacloudResponse> putAlibabacloud(Function<PutAlibabacloudRequest.Builder, ObjectBuilder<PutAlibabacloudRequest>> fn)

Create an AlibabaCloud AI Search inference endpoint. Create an inference endpoint to perform an inference task with the alibabacloud-ai-search service.
- Parameters:
- fn - a function that initializes a builder to create the PutAlibabacloudRequest
- See Also:
 
- 
putAmazonbedrock

public CompletableFuture<PutAmazonbedrockResponse> putAmazonbedrock(PutAmazonbedrockRequest request)

Create an Amazon Bedrock inference endpoint.

Create an inference endpoint to perform an inference task with the amazonbedrock service.

INFO: You need to provide the access and secret keys only once, during the inference model creation. The get inference API does not retrieve your access or secret keys. After creating the inference model, you cannot change the associated key pairs. If you want to use a different access and secret key pair, delete the inference model and recreate it with the same name and the updated keys.
- See Also:
 
- 
putAmazonbedrock

public final CompletableFuture<PutAmazonbedrockResponse> putAmazonbedrock(Function<PutAmazonbedrockRequest.Builder, ObjectBuilder<PutAmazonbedrockRequest>> fn)

Create an Amazon Bedrock inference endpoint. See putAmazonbedrock(PutAmazonbedrockRequest) above for details on key handling.
- Parameters:
- fn - a function that initializes a builder to create the PutAmazonbedrockRequest
- See Also:
 
- 
putAmazonsagemaker

public CompletableFuture<PutAmazonsagemakerResponse> putAmazonsagemaker(PutAmazonsagemakerRequest request)

Create an Amazon SageMaker inference endpoint.

Create an inference endpoint to perform an inference task with the amazon_sagemaker service.
- See Also:
 
- 
putAmazonsagemaker

public final CompletableFuture<PutAmazonsagemakerResponse> putAmazonsagemaker(Function<PutAmazonsagemakerRequest.Builder, ObjectBuilder<PutAmazonsagemakerRequest>> fn)

Create an Amazon SageMaker inference endpoint. Create an inference endpoint to perform an inference task with the amazon_sagemaker service.
- Parameters:
- fn - a function that initializes a builder to create the PutAmazonsagemakerRequest
- See Also:
 
- 
putAnthropic

public CompletableFuture<PutAnthropicResponse> putAnthropic(PutAnthropicRequest request)

Create an Anthropic inference endpoint.

Create an inference endpoint to perform an inference task with the anthropic service.
- See Also:
 
- 
putAnthropic

public final CompletableFuture<PutAnthropicResponse> putAnthropic(Function<PutAnthropicRequest.Builder, ObjectBuilder<PutAnthropicRequest>> fn)

Create an Anthropic inference endpoint. Create an inference endpoint to perform an inference task with the anthropic service.
- Parameters:
- fn - a function that initializes a builder to create the PutAnthropicRequest
- See Also:
 
- 
putAzureaistudio

public CompletableFuture<PutAzureaistudioResponse> putAzureaistudio(PutAzureaistudioRequest request)

Create an Azure AI Studio inference endpoint.

Create an inference endpoint to perform an inference task with the azureaistudio service.
- See Also:
 
- 
putAzureaistudio

public final CompletableFuture<PutAzureaistudioResponse> putAzureaistudio(Function<PutAzureaistudioRequest.Builder, ObjectBuilder<PutAzureaistudioRequest>> fn)

Create an Azure AI Studio inference endpoint. Create an inference endpoint to perform an inference task with the azureaistudio service.
- Parameters:
- fn - a function that initializes a builder to create the PutAzureaistudioRequest
- See Also:
 
- 
putAzureopenai

public CompletableFuture<PutAzureopenaiResponse> putAzureopenai(PutAzureopenaiRequest request)

Create an Azure OpenAI inference endpoint.

Create an inference endpoint to perform an inference task with the azureopenai service.

The list of chat completion models that you can choose from in your Azure OpenAI deployment, and the list of embeddings models that you can choose from in your deployment, can be found in the Azure models documentation.
- See Also:
 
- 
putAzureopenai

public final CompletableFuture<PutAzureopenaiResponse> putAzureopenai(Function<PutAzureopenaiRequest.Builder, ObjectBuilder<PutAzureopenaiRequest>> fn)

Create an Azure OpenAI inference endpoint. See putAzureopenai(PutAzureopenaiRequest) above for model options.
- Parameters:
- fn - a function that initializes a builder to create the PutAzureopenaiRequest
- See Also:
 
- 
putCohere

public CompletableFuture<PutCohereResponse> putCohere(PutCohereRequest request)

Create a Cohere inference endpoint.

Create an inference endpoint to perform an inference task with the cohere service.
- See Also:
 
- 
putCohere

public final CompletableFuture<PutCohereResponse> putCohere(Function<PutCohereRequest.Builder, ObjectBuilder<PutCohereRequest>> fn)

Create a Cohere inference endpoint. Create an inference endpoint to perform an inference task with the cohere service.
- Parameters:
- fn - a function that initializes a builder to create the PutCohereRequest
- See Also:
 
- 
putContextualai

public CompletableFuture<PutContextualaiResponse> putContextualai(PutContextualaiRequest request)

Create a Contextual AI inference endpoint.

Create an inference endpoint to perform an inference task with the contextualai service.

To review the available rerank models, refer to https://docs.contextual.ai/api-reference/rerank/rerank#body-model.
- See Also:
 
- 
putContextualai

public final CompletableFuture<PutContextualaiResponse> putContextualai(Function<PutContextualaiRequest.Builder, ObjectBuilder<PutContextualaiRequest>> fn)

Create a Contextual AI inference endpoint. Create an inference endpoint to perform an inference task with the contextualai service. To review the available rerank models, refer to https://docs.contextual.ai/api-reference/rerank/rerank#body-model.
- Parameters:
- fn - a function that initializes a builder to create the PutContextualaiRequest
- See Also:
 
- 
putCustom

public CompletableFuture<PutCustomResponse> putCustom(PutCustomRequest request)

Create a custom inference endpoint.

The custom service gives more control over how to interact with external inference services that aren't explicitly supported through dedicated integrations. It gives you the ability to define the headers, URL, query parameters, request body, and secrets. The custom service supports template replacement, which enables you to define a template that is replaced with the value associated with that key. Templates are portions of a string that start with ${ and end with }. The parameters secret_parameters and task_settings are checked for keys for template replacement. Template replacement is supported in request, headers, url, and query_parameters. If the definition (key) is not found for a template, an error message is returned. Given an endpoint definition like the following:

    PUT _inference/text_embedding/test-text-embedding
    {
      "service": "custom",
      "service_settings": {
        "secret_parameters": {
          "api_key": "<some api key>"
        },
        "url": "...endpoints.huggingface.cloud/v1/embeddings",
        "headers": {
          "Authorization": "Bearer ${api_key}",
          "Content-Type": "application/json"
        },
        "request": "{\"input\": ${input}}",
        "response": {
          "json_parser": {
            "text_embeddings": "$.data[*].embedding[*]"
          }
        }
      }
    }

to replace ${api_key}, the secret_parameters and task_settings are checked for a key named api_key.

INFO: Templates should not be surrounded by quotes.

Pre-defined templates:
- ${input} refers to the array of input strings that comes from the input field of the subsequent inference requests.
- ${input_type} refers to the input type translation values.
- ${query} refers to the query field used specifically for reranking tasks.
- ${top_n} refers to the top_n field available when performing rerank requests.
- ${return_documents} refers to the return_documents field available when performing rerank requests.
 - See Also:
 
- 
putCustom

public final CompletableFuture<PutCustomResponse> putCustom(Function<PutCustomRequest.Builder, ObjectBuilder<PutCustomRequest>> fn)

Create a custom inference endpoint. See putCustom(PutCustomRequest) above for the template replacement rules and the list of pre-defined templates.
- Parameters:
- fn - a function that initializes a builder to create the PutCustomRequest
- See Also:
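A hedged sketch of the endpoint definition above expressed through the generic put API (client as above; routing it through put with service "custom" and a raw-JSON serviceSettings is an assumption — the typed PutCustomRequest builders may model these fields directly):

    // assumes: import co.elastic.clients.elasticsearch.inference.TaskType;
    //          import co.elastic.clients.json.JsonData;

    // Mirrors the documented custom-service definition; ${api_key} and ${input}
    // are resolved server-side by template replacement.
    String settings = "{"
        + "\"secret_parameters\": {\"api_key\": \"<some api key>\"},"
        + "\"url\": \"...endpoints.huggingface.cloud/v1/embeddings\","
        + "\"headers\": {\"Authorization\": \"Bearer ${api_key}\","
        + "\"Content-Type\": \"application/json\"},"
        + "\"request\": \"{\\\"input\\\": ${input}}\","
        + "\"response\": {\"json_parser\": {\"text_embeddings\": \"$.data[*].embedding[*]\"}}"
        + "}";
    client.put(p -> p
            .inferenceId("test-text-embedding")
            .taskType(TaskType.TextEmbedding)
            .inferenceConfig(c -> c
                    .service("custom")
                    .serviceSettings(JsonData.fromJson(settings))));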
 
- 
putDeepseek

public CompletableFuture<PutDeepseekResponse> putDeepseek(PutDeepseekRequest request)

Create a DeepSeek inference endpoint.

Create an inference endpoint to perform an inference task with the deepseek service.
- See Also:
 
- 
putDeepseek

public final CompletableFuture<PutDeepseekResponse> putDeepseek(Function<PutDeepseekRequest.Builder, ObjectBuilder<PutDeepseekRequest>> fn)

Create a DeepSeek inference endpoint. Create an inference endpoint to perform an inference task with the deepseek service.
- Parameters:
- fn - a function that initializes a builder to create the PutDeepseekRequest
- See Also:
 
- 
putElasticsearch

public CompletableFuture<PutElasticsearchResponse> putElasticsearch(PutElasticsearchRequest request)

Create an Elasticsearch inference endpoint.

Create an inference endpoint to perform an inference task with the elasticsearch service.

INFO: Your Elasticsearch deployment contains preconfigured ELSER and E5 inference endpoints; you only need to create the endpoints using the API if you want to customize the settings. If you use the ELSER or the E5 model through the elasticsearch service, the API request will automatically download and deploy the model if it isn't downloaded yet.

INFO: You might see a 502 bad gateway error in the response when using the Kibana Console. This error usually just reflects a timeout while the model downloads in the background. You can check the download progress in the Machine Learning UI. If using the Python client, you can set the timeout parameter to a higher value.

After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for "state": "fully_allocated" in the response and ensure that the "allocation_count" matches the "target_allocation_count". Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
- See Also:
 
- 
putElasticsearch

public final CompletableFuture<PutElasticsearchResponse> putElasticsearch(Function<PutElasticsearchRequest.Builder, ObjectBuilder<PutElasticsearchRequest>> fn)

Create an Elasticsearch inference endpoint. See putElasticsearch(PutElasticsearchRequest) above for the download, deployment, and allocation notes.
- Parameters:
- fn - a function that initializes a builder to create the PutElasticsearchRequest
- See Also:
 
- 
putElser

public CompletableFuture<PutElserResponse> putElser(PutElserRequest request)

Create an ELSER inference endpoint.

Create an inference endpoint to perform an inference task with the elser service. You can also deploy ELSER by using the Elasticsearch inference integration.

INFO: Your Elasticsearch deployment contains a preconfigured ELSER inference endpoint; you only need to create the endpoint using the API if you want to customize the settings. The API request will automatically download and deploy the ELSER model if it isn't already downloaded.

INFO: You might see a 502 bad gateway error in the response when using the Kibana Console. This error usually just reflects a timeout while the model downloads in the background. You can check the download progress in the Machine Learning UI. If using the Python client, you can set the timeout parameter to a higher value.

After creating the endpoint, wait for the model deployment to complete before using it. To verify the deployment status, use the get trained model statistics API. Look for "state": "fully_allocated" in the response and ensure that the "allocation_count" matches the "target_allocation_count". Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
- See Also:
 
- 
putElser

public final CompletableFuture<PutElserResponse> putElser(Function<PutElserRequest.Builder, ObjectBuilder<PutElserRequest>> fn)

Create an ELSER inference endpoint. See putElser(PutElserRequest) above for the download, deployment, and allocation notes.
- Parameters:
- fn - a function that initializes a builder to create the PutElserRequest
- See Also:
 
- 
putGoogleaistudio

public CompletableFuture<PutGoogleaistudioResponse> putGoogleaistudio(PutGoogleaistudioRequest request)

Create a Google AI Studio inference endpoint.

Create an inference endpoint to perform an inference task with the googleaistudio service.
- See Also:
 
- 
putGoogleaistudio

public final CompletableFuture<PutGoogleaistudioResponse> putGoogleaistudio(Function<PutGoogleaistudioRequest.Builder, ObjectBuilder<PutGoogleaistudioRequest>> fn)

Create a Google AI Studio inference endpoint. Create an inference endpoint to perform an inference task with the googleaistudio service.
- Parameters:
- fn - a function that initializes a builder to create the PutGoogleaistudioRequest
- See Also:
 
- 
putGooglevertexai

public CompletableFuture<PutGooglevertexaiResponse> putGooglevertexai(PutGooglevertexaiRequest request)

Create a Google Vertex AI inference endpoint.

Create an inference endpoint to perform an inference task with the googlevertexai service.
- See Also:
 
- 
putGooglevertexai

public final CompletableFuture<PutGooglevertexaiResponse> putGooglevertexai(Function<PutGooglevertexaiRequest.Builder, ObjectBuilder<PutGooglevertexaiRequest>> fn)

Create a Google Vertex AI inference endpoint. Create an inference endpoint to perform an inference task with the googlevertexai service.
- Parameters:
- fn - a function that initializes a builder to create the PutGooglevertexaiRequest
- See Also:
 
- 
putHuggingFace

public CompletableFuture<PutHuggingFaceResponse> putHuggingFace(PutHuggingFaceRequest request)

Create a Hugging Face inference endpoint.

Create an inference endpoint to perform an inference task with the hugging_face service. Supported tasks include: text_embedding, completion, and chat_completion.

To configure the endpoint, first visit the Hugging Face Inference Endpoints page and create a new endpoint. Select a model that supports the task you intend to use.

For Elastic's text_embedding task: the selected model must support the Sentence Embeddings task. On the new endpoint creation page, select the Sentence Embeddings task under the Advanced Configuration section. After the endpoint has initialized, copy the generated endpoint URL. Recommended models for the text_embedding task:
- all-MiniLM-L6-v2
- all-MiniLM-L12-v2
- all-mpnet-base-v2
- e5-base-v2
- e5-small-v2
- multilingual-e5-base
- multilingual-e5-small

For Elastic's chat_completion and completion tasks: the selected model must support the Text Generation task and expose the OpenAI API. Hugging Face supports both serverless and dedicated endpoints for Text Generation. When creating a dedicated endpoint, select the Text Generation task. After the endpoint is initialized (for dedicated) or ready (for serverless), ensure it supports the OpenAI API and includes the /v1/chat/completions part in the URL. Then, copy the full endpoint URL for use. Recommended models for the chat_completion and completion tasks:
- Mistral-7B-Instruct-v0.2
- QwQ-32B
- Phi-3-mini-128k-instruct

For Elastic's rerank task: the selected model must support the sentence-ranking task and expose the OpenAI API. Hugging Face supports only dedicated (not serverless) endpoints for rerank so far. After the endpoint is initialized, copy the full endpoint URL for use. Tested models for the rerank task:
- bge-reranker-base
- jina-reranker-v1-turbo-en-GGUF
 - See Also:
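A hedged sketch of wiring one of the recommended embedding models (client as above; creating it through the generic put API with raw-JSON serviceSettings is an assumption — the typed PutHuggingFaceRequest builders may expose url/api_key directly; the URL and token are placeholders):

    // assumes: import co.elastic.clients.elasticsearch.inference.TaskType;
    //          import co.elastic.clients.json.JsonData;

    // text_embedding endpoint backed by a Hugging Face Inference Endpoints URL.
    client.put(p -> p
            .inferenceId("hf-minilm-embeddings")
            .taskType(TaskType.TextEmbedding)
            .inferenceConfig(c -> c
                    .service("hugging_face")
                    .serviceSettings(JsonData.fromJson(
                            "{\"api_key\": \"<hf token>\","
                            + " \"url\": \"<your endpoint url>\"}"))));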
 
- 
putHuggingFace

public final CompletableFuture<PutHuggingFaceResponse> putHuggingFace(Function<PutHuggingFaceRequest.Builder, ObjectBuilder<PutHuggingFaceRequest>> fn)

Create a Hugging Face inference endpoint. See putHuggingFace(PutHuggingFaceRequest) above for task-specific configuration and recommended models.
- Parameters:
- fn - a function that initializes a builder to create the PutHuggingFaceRequest
- See Also:
 
- 
putJinaai

public CompletableFuture<PutJinaaiResponse> putJinaai(PutJinaaiRequest request)

Create a JinaAI inference endpoint.

Create an inference endpoint to perform an inference task with the jinaai service.

To review the available rerank models, refer to https://jina.ai/reranker. To review the available text_embedding models, refer to https://jina.ai/embeddings/.
- See Also:
 
- 
putJinaai

public final CompletableFuture<PutJinaaiResponse> putJinaai(Function<PutJinaaiRequest.Builder, ObjectBuilder<PutJinaaiRequest>> fn)

Create a JinaAI inference endpoint. Create an inference endpoint to perform an inference task with the jinaai service. To review the available rerank models, refer to https://jina.ai/reranker; for text_embedding models, see https://jina.ai/embeddings/.
- Parameters:
- fn - a function that initializes a builder to create the PutJinaaiRequest
- See Also:
 
- 
putLlama

public CompletableFuture<PutLlamaResponse> putLlama(PutLlamaRequest request)

Create a Llama inference endpoint.

Create an inference endpoint to perform an inference task with the llama service.
- See Also:
 
- 
putLlama

public final CompletableFuture<PutLlamaResponse> putLlama(Function<PutLlamaRequest.Builder, ObjectBuilder<PutLlamaRequest>> fn)

Create a Llama inference endpoint. Create an inference endpoint to perform an inference task with the llama service.
- Parameters:
- fn - a function that initializes a builder to create the PutLlamaRequest
- See Also:
 
- 
putMistral

public CompletableFuture<PutMistralResponse> putMistral(PutMistralRequest request)

Create a Mistral inference endpoint.

Create an inference endpoint to perform an inference task with the mistral service.
- See Also:
 
- 
putMistral

public final CompletableFuture<PutMistralResponse> putMistral(Function<PutMistralRequest.Builder, ObjectBuilder<PutMistralRequest>> fn)

Create a Mistral inference endpoint. Create an inference endpoint to perform an inference task with the mistral service.
- Parameters:
- fn - a function that initializes a builder to create the PutMistralRequest
- See Also:
 
- 
putOpenai

public CompletableFuture<PutOpenaiResponse> putOpenai(PutOpenaiRequest request)

Create an OpenAI inference endpoint.

Create an inference endpoint to perform an inference task with the openai service or OpenAI-compatible APIs.
- See Also:
 
- 
putOpenai

public final CompletableFuture<PutOpenaiResponse> putOpenai(Function<PutOpenaiRequest.Builder, ObjectBuilder<PutOpenaiRequest>> fn)

Create an OpenAI inference endpoint. Create an inference endpoint to perform an inference task with the openai service or OpenAI-compatible APIs.
- Parameters:
- fn - a function that initializes a builder to create the PutOpenaiRequest
- See Also:
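A hedged sketch (client as above; the openaiInferenceId, service, apiKey, and modelId builder properties mirror the REST parameters, and — like the OpenaiTaskType enum name — are assumptions; service may be an enum rather than a string in the generated client):

    // Typed creation of an OpenAI text_embedding endpoint.
    client.putOpenai(o -> o
            .openaiInferenceId("openai-embeddings")        // hypothetical endpoint id
            .taskType(OpenaiTaskType.TextEmbedding)
            .service("openai")
            .serviceSettings(s -> s
                    .apiKey(System.getenv("OPENAI_API_KEY"))
                    .modelId("text-embedding-3-small")))
        .thenAccept(r -> System.out.println("endpoint created"));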
 
- 
putVoyageai

public CompletableFuture<PutVoyageaiResponse> putVoyageai(PutVoyageaiRequest request)

Create a VoyageAI inference endpoint.

Create an inference endpoint to perform an inference task with the voyageai service.

Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
- See Also:
 
- 
putVoyageai

public final CompletableFuture<PutVoyageaiResponse> putVoyageai(Function<PutVoyageaiRequest.Builder, ObjectBuilder<PutVoyageaiRequest>> fn)

Create a VoyageAI inference endpoint. Create an inference endpoint to perform an inference task with the voyageai service. Avoid creating multiple endpoints for the same model unless required, as each endpoint consumes significant resources.
- Parameters:
- fn - a function that initializes a builder to create the PutVoyageaiRequest
- See Also:
 
- 
putWatsonx

public CompletableFuture<PutWatsonxResponse> putWatsonx(PutWatsonxRequest request)

Create a Watsonx inference endpoint.

Create an inference endpoint to perform an inference task with the watsonxai service. You need an IBM Cloud Databases for Elasticsearch deployment to use the watsonxai inference service. You can provision one through the IBM catalog, the Cloud Databases CLI plug-in, the Cloud Databases API, or Terraform.
- See Also:
 
- 
putWatsonx

public final CompletableFuture<PutWatsonxResponse> putWatsonx(Function<PutWatsonxRequest.Builder, ObjectBuilder<PutWatsonxRequest>> fn)

Create a Watsonx inference endpoint. Create an inference endpoint to perform an inference task with the watsonxai service. You need an IBM Cloud Databases for Elasticsearch deployment to use the watsonxai inference service.
- Parameters:
- fn - a function that initializes a builder to create the PutWatsonxRequest
- See Also:
 
- 
rerank

public CompletableFuture<RerankResponse> rerank(RerankRequest request)

Perform reranking inference on the service.
- See Also:
 
- 
rerank

public final CompletableFuture<RerankResponse> rerank(Function<RerankRequest.Builder, ObjectBuilder<RerankRequest>> fn)

Perform reranking inference on the service.
- Parameters:
- fn - a function that initializes a builder to create the RerankRequest
- See Also:
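A hedged sketch (client as above; the query/input properties and the rerank()/relevanceScore() accessors follow the REST shapes and are assumptions):

    // Rerank candidate documents against a query on an existing rerank endpoint.
    client.rerank(r -> r
            .inferenceId("my-rerank-endpoint")
            .query("What is Elasticsearch?")
            .input("Elasticsearch is a distributed search engine.",
                   "Kibana is a visualization UI.",
                   "Logstash ingests and transforms data."))
        .thenAccept(resp -> resp.rerank()
            .forEach(doc -> System.out.println(doc.index() + ": " + doc.relevanceScore())));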
 
- 
sparseEmbedding

public CompletableFuture<SparseEmbeddingResponse> sparseEmbedding(SparseEmbeddingRequest request)

Perform sparse embedding inference on the service.
- See Also:
 
- 
sparseEmbedding

public final CompletableFuture<SparseEmbeddingResponse> sparseEmbedding(Function<SparseEmbeddingRequest.Builder, ObjectBuilder<SparseEmbeddingRequest>> fn)

Perform sparse embedding inference on the service.
- Parameters:
- fn - a function that initializes a builder to create the SparseEmbeddingRequest
- See Also:
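A sketch (client as above; property names are assumptions):

    // Sparse embedding (for example, against an ELSER-backed endpoint).
    client.sparseEmbedding(s -> s
            .inferenceId("my-elser-endpoint")
            .input("The quick brown fox"))
        .thenAccept(resp -> System.out.println(resp)); // token-weight pairs per input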
 
- 
streamCompletion

public CompletableFuture<BinaryResponse> streamCompletion(StreamCompletionRequest request)

Perform streaming inference. Get real-time responses for completion tasks by delivering answers incrementally, reducing response times during computation. This API works only with the completion task type.

IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.

This API requires the monitor_inference cluster privilege (the built-in inference_admin and inference_user roles grant this privilege). You must use a client that supports streaming.
- See Also:
 
- 
streamCompletion

public final CompletableFuture<BinaryResponse> streamCompletion(Function<StreamCompletionRequest.Builder, ObjectBuilder<StreamCompletionRequest>> fn)

Perform streaming inference. See streamCompletion(StreamCompletionRequest) above for details and required privileges.
- Parameters:
- fn - a function that initializes a builder to create the StreamCompletionRequest
- See Also:
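A hedged sketch (client as above; the response is a raw server-sent-events stream exposed as a BinaryResponse, and content() throwing IOException is an assumption):

    // Streaming completion: read the SSE body and close it when done.
    client.streamCompletion(s -> s
            .inferenceId("my-completion-endpoint")
            .input("Write a haiku about search."))
        .thenAccept(resp -> {
            try (java.io.InputStream events = resp.content()) {
                new String(events.readAllBytes(), java.nio.charset.StandardCharsets.UTF_8)
                        .lines()
                        .forEach(System.out::println); // each "data: {...}" line is one chunk
            } catch (java.io.IOException e) {
                throw new java.io.UncheckedIOException(e);
            }
        });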
 
- 
textEmbedding

public CompletableFuture<TextEmbeddingResponse> textEmbedding(TextEmbeddingRequest request)

Perform text embedding inference on the service.
- See Also:
 
- 
textEmbedding

public final CompletableFuture<TextEmbeddingResponse> textEmbedding(Function<TextEmbeddingRequest.Builder, ObjectBuilder<TextEmbeddingRequest>> fn)

Perform text embedding inference on the service.
- Parameters:
- fn - a function that initializes a builder to create the TextEmbeddingRequest
- See Also:
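A sketch (client as above; property names are assumptions):

    // Dense text embedding for two inputs; expect one vector per input.
    client.textEmbedding(t -> t
            .inferenceId("my-embedding-endpoint")
            .input("first passage", "second passage"))
        .thenAccept(resp -> System.out.println(resp));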
 
- 
update

public CompletableFuture<UpdateInferenceResponse> update(UpdateInferenceRequest request)

Update an inference endpoint.

Modify task_settings, secrets (within service_settings), or num_allocations for an inference endpoint, depending on the specific endpoint service and task_type.

IMPORTANT: The inference APIs enable you to use certain services, such as built-in machine learning models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure, Google AI Studio, Google Vertex AI, Anthropic, Watsonx.ai, or Hugging Face. For built-in models and models uploaded through Eland, the inference APIs offer an alternative way to use and manage trained models. However, if you do not plan to use the inference APIs to use these models or if you want to use non-NLP models, use the machine learning trained model APIs.
- See Also:
 
- 
update

public final CompletableFuture<UpdateInferenceResponse> update(Function<UpdateInferenceRequest.Builder, ObjectBuilder<UpdateInferenceRequest>> fn)

Update an inference endpoint. See update(UpdateInferenceRequest) above for what can be modified.
- Parameters:
- fn - a function that initializes a builder to create the UpdateInferenceRequest
- See Also:
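A hedged sketch of raising num_allocations (client as above; the inferenceConfig/serviceSettings builder names and the partial-settings body are assumptions):

    // assumes: import co.elastic.clients.json.JsonData;

    // Scale an elasticsearch-service endpoint to two allocations.
    client.update(u -> u
            .inferenceId("my-elser-endpoint")
            .inferenceConfig(c -> c
                    .serviceSettings(JsonData.fromJson("{\"num_allocations\": 2}"))))
        .thenAccept(r -> System.out.println("updated"));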
 
 