Class StartTrainedModelDeploymentRequest
java.lang.Object
co.elastic.clients.elasticsearch._types.RequestBase
co.elastic.clients.elasticsearch.ml.StartTrainedModelDeploymentRequest
Starts a trained model deployment, which allocates the model to every machine
learning node.
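As a sketch of typical usage, the request is normally built through the lambda-style `of()` factory. The model id and tuning values below are hypothetical, for illustration only:

```java
import co.elastic.clients.elasticsearch.ml.StartTrainedModelDeploymentRequest;

public class StartDeploymentExample {
    public static void main(String[] args) {
        // Hypothetical model id and tuning values, for illustration only.
        StartTrainedModelDeploymentRequest request = StartTrainedModelDeploymentRequest.of(b -> b
                .modelId("my-pytorch-model")  // required; only PyTorch models are supported
                .numberOfAllocations(1)       // allocations per node
                .threadsPerAllocation(2)      // threads per allocation
                .queueCapacity(1024)          // max queued inference requests
        );
        System.out.println(request.modelId());
    }
}
```

The built request is then passed to the client's `ml` namespace to start the deployment.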
Nested Class Summary
Nested classes/interfaces inherited from class co.elastic.clients.elasticsearch._types.RequestBase
RequestBase.AbstractBuilder<BuilderT extends RequestBase.AbstractBuilder<BuilderT>>
-
Field Summary
Modifier and Type	Field	Description
static final Endpoint<StartTrainedModelDeploymentRequest, StartTrainedModelDeploymentResponse, ErrorResponse>	_ENDPOINT	Endpoint "ml.start_trained_model_deployment".
-
Method Summary
Modifier and Type	Method	Description
final String	cacheSize()	The inference cache size (in memory outside the JVM heap) per node for the model.
final String	modelId()	Required - The unique identifier of the trained model.
final Integer	numberOfAllocations()	The number of model allocations on each node where the model is deployed.
static StartTrainedModelDeploymentRequest	of(Function<StartTrainedModelDeploymentRequest.Builder, ObjectBuilder<StartTrainedModelDeploymentRequest>> fn)
final TrainingPriority	priority()	The deployment priority.
final Integer	queueCapacity()	Specifies the number of inference requests that are allowed in the queue.
final Integer	threadsPerAllocation()	Sets the number of threads used by each model allocation during inference.
final Time	timeout()	Specifies the amount of time to wait for the model to deploy.
	waitFor()	Specifies the allocation status to wait for before returning.
Methods inherited from class co.elastic.clients.elasticsearch._types.RequestBase
toString
-
Field Details
-
_ENDPOINT
public static final Endpoint<StartTrainedModelDeploymentRequest, StartTrainedModelDeploymentResponse, ErrorResponse> _ENDPOINT
Endpoint "ml.start_trained_model_deployment".
-
-
Method Details
-
of
public static StartTrainedModelDeploymentRequest of(Function<StartTrainedModelDeploymentRequest.Builder, ObjectBuilder<StartTrainedModelDeploymentRequest>> fn)
-
cacheSize
The inference cache size (in memory outside the JVM heap) per node for the model. The default value is the same size as model_size_bytes. To disable the cache, 0b can be provided.
API name: cache_size
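cache_size takes a byte-size string. The hypothetical helper below only illustrates the convention (the real client sends the string as-is and the server parses it):

```java
public class CacheSizeDemo {
    // Hypothetical helper: convert a simple byte-size string such as "0b",
    // "512kb" or "1gb" into a byte count. Illustration only; not the
    // client's actual parser.
    static long toBytes(String size) {
        String s = size.trim().toLowerCase();
        if (s.endsWith("gb")) return Long.parseLong(s.substring(0, s.length() - 2)) * 1024L * 1024L * 1024L;
        if (s.endsWith("mb")) return Long.parseLong(s.substring(0, s.length() - 2)) * 1024L * 1024L;
        if (s.endsWith("kb")) return Long.parseLong(s.substring(0, s.length() - 2)) * 1024L;
        if (s.endsWith("b"))  return Long.parseLong(s.substring(0, s.length() - 1));
        return Long.parseLong(s);
    }

    public static void main(String[] args) {
        System.out.println(toBytes("0b"));    // 0 bytes: cache disabled
        System.out.println(toBytes("512kb")); // 524288
    }
}
```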
-
modelId
Required - The unique identifier of the trained model. Currently, only PyTorch models are supported.
API name: model_id
-
numberOfAllocations
The number of model allocations on each node where the model is deployed. All allocations on a node share the same copy of the model in memory but use a separate set of threads to evaluate the model. Increasing this value generally increases throughput. If this setting is greater than the number of hardware threads, it is automatically changed to a value less than the number of hardware threads.
API name: number_of_allocations
-
priority
The deployment priority.
API name: priority
-
queueCapacity
Specifies the number of inference requests that are allowed in the queue. After the number of requests exceeds this value, new requests are rejected with a 429 error.
API name: queue_capacity
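The queue behaviour can be pictured with a plain bounded queue: once it is full, new work is refused rather than accepted. The 429 status is what the server returns in that situation; the class below is only an illustration, not part of the client:

```java
import java.util.concurrent.ArrayBlockingQueue;

public class QueueCapacityDemo {
    public static void main(String[] args) {
        // Illustrative stand-in for a deployment with queue_capacity = 2.
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(2);
        System.out.println(queue.offer("req-1")); // true: accepted
        System.out.println(queue.offer("req-2")); // true: accepted
        // Queue is full; a real deployment would answer 429 here.
        System.out.println(queue.offer("req-3")); // false: rejected
    }
}
```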
-
threadsPerAllocation
Sets the number of threads used by each model allocation during inference. Increasing this value generally increases inference speed. The inference process is compute-bound, so any number greater than the number of available hardware threads on the machine does not increase inference speed. If this setting is greater than the number of hardware threads, it is automatically changed to a value less than the number of hardware threads.
API name: threads_per_allocation
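The adjustment described above can be sketched in plain Java. This is a simplified illustration; the real clamping happens server-side and may pick a different valid value, not simply the minimum:

```java
public class ThreadsPerAllocationDemo {
    // Simplified sketch: cap a requested thread count at the machine's
    // hardware threads, mirroring the server-side adjustment described
    // above. Illustration only, not the actual server logic.
    static int effectiveThreads(int requested, int hardwareThreads) {
        return Math.min(requested, hardwareThreads);
    }

    public static void main(String[] args) {
        System.out.println(effectiveThreads(4, 8));  // 4: within the limit
        System.out.println(effectiveThreads(64, 8)); // 8: capped
    }
}
```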
-
timeout
Specifies the amount of time to wait for the model to deploy.
API name: timeout
-
waitFor
Specifies the allocation status to wait for before returning.
API name: wait_for