Package ai.djl.serving.wlm
Class WorkLoadManager
java.lang.Object
    ai.djl.serving.wlm.WorkLoadManager
public class WorkLoadManager extends java.lang.Object
WorkLoadManager is responsible for managing the workload of worker threads. The manager scales the number of worker threads per model up or down as required.
-
-
Constructor Summary
Constructor, Description
WorkLoadManager()
Constructs a WorkLoadManager instance.
-
Method Summary
Modifier and Type, Method, Description
void close()
Closes all models related to the WorkLoadManager.
int getNumRunningWorkers(ModelInfo<?,?> modelInfo)
Returns the number of running workers of a model.
<I,O> WorkerPool<I,O> getWorkerPool(ModelInfo<I,O> modelInfo)
Returns the WorkerPool for a model.
<I,O> WorkerPool<I,O> registerModel(ModelInfo<I,O> modelInfo)
Registers a model and returns the WorkerPool for it.
<I,O> java.util.concurrent.CompletableFuture<O> runJob(Job<I,O> job)
Adds an inference job to the job queue of the next free worker.
void unregisterModel(ModelInfo<?,?> model)
Removes a model from management.
-
-
-
Constructor Detail
-
WorkLoadManager
public WorkLoadManager()
Constructs a WorkLoadManager instance.
-
-
Method Detail
-
registerModel
public <I,O> WorkerPool<I,O> registerModel(ModelInfo<I,O> modelInfo)
Registers a model and returns the WorkerPool for it. This operation is idempotent and returns the existing worker pool if the model was already registered.
- Type Parameters:
I - the model input class
O - the model output class
- Parameters:
modelInfo - the model to create the worker pool for
- Returns:
the WorkerPool
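The idempotent behaviour described above can be illustrated with a minimal, self-contained sketch. It does not use or reproduce the DJL Serving implementation: IdempotentRegistrySketch, Pool, and register are hypothetical names, and a plain String id stands in for ModelInfo. It only shows one common way to get "create once, return the existing instance afterwards" semantics.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative stand-in only; this is not the WorkLoadManager source. */
public class IdempotentRegistrySketch {

    /** Hypothetical placeholder for a per-model worker pool. */
    public static final class Pool {
        public final String modelId;

        Pool(String modelId) {
            this.modelId = modelId;
        }
    }

    // One pool per model id; ConcurrentHashMap keeps registration thread-safe.
    private final Map<String, Pool> pools = new ConcurrentHashMap<>();

    /** Creates the pool on first registration; later calls return the same instance. */
    public Pool register(String modelId) {
        return pools.computeIfAbsent(modelId, Pool::new);
    }

    public static void main(String[] args) {
        IdempotentRegistrySketch registry = new IdempotentRegistrySketch();
        Pool first = registry.register("resnet");
        Pool second = registry.register("resnet");
        System.out.println(first == second); // prints true
    }
}
```

With this shape, callers do not need to check whether a model is already registered before asking for its pool.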
-
unregisterModel
public void unregisterModel(ModelInfo<?,?> model)
Removes a model from management.
- Parameters:
model - the model to remove
-
runJob
public <I,O> java.util.concurrent.CompletableFuture<O> runJob(Job<I,O> job)
Adds an inference job to the job queue of the next free worker. Scales up workers if necessary.
- Type Parameters:
I - the model input class
O - the model output class
- Parameters:
job - an inference job to be executed
- Returns:
a CompletableFuture that completes with the job's output
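Because runJob returns a CompletableFuture, the caller receives a handle immediately and obtains the output once a worker has processed the job. The sketch below shows that calling pattern with standard library classes only. RunJobSketch and its nested Job type are hypothetical stand-ins, not the DJL Serving classes, and a fixed thread pool replaces the real per-model worker queues and scaling logic.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Illustrative stand-in only; this is not the WorkLoadManager source. */
public class RunJobSketch {

    /** Hypothetical job: an input plus the function a worker applies to it. */
    public static final class Job<I, O> {
        final I input;
        final Function<I, O> task;

        public Job(I input, Function<I, O> task) {
            this.input = input;
            this.task = task;
        }
    }

    // Daemon threads so an unclosed sketch never keeps the JVM alive.
    private final ExecutorService workers =
            Executors.newFixedThreadPool(2, r -> {
                Thread t = new Thread(r);
                t.setDaemon(true);
                return t;
            });

    /** Hands the job to a worker thread and returns a future for its output. */
    public <I, O> CompletableFuture<O> runJob(Job<I, O> job) {
        return CompletableFuture.supplyAsync(() -> job.task.apply(job.input), workers);
    }

    /** Stops accepting new jobs and lets the worker threads finish. */
    public void close() {
        workers.shutdown();
    }

    public static void main(String[] args) {
        RunJobSketch manager = new RunJobSketch();
        try {
            Job<String, Integer> job = new Job<>("hello", String::length);
            CompletableFuture<Integer> future = manager.runJob(job);
            System.out.println(future.join()); // prints 5
        } finally {
            manager.close();
        }
    }
}
```

A caller that should not block can chain thenApply or thenAccept on the returned future instead of calling join.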
-
getNumRunningWorkers
public int getNumRunningWorkers(ModelInfo<?,?> modelInfo)
Returns the number of running workers of a model. Running workers are workers that are not stopped, in error, or scheduled to scale down.
- Parameters:
modelInfo - the model we are interested in
- Returns:
the number of running workers
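The definition of a running worker above amounts to a filter over worker states. A minimal sketch of that counting rule follows; the State enum and its constant names are assumptions made for illustration and are not taken from the DJL Serving source.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative stand-in only; state names are hypothetical. */
public class RunningWorkerCountSketch {

    public enum State { STARTED, STOPPED, ERROR, SCALED_DOWN }

    /** Counts workers that are not stopped, in error, or scheduled to scale down. */
    public static int countRunning(List<State> states) {
        int running = 0;
        for (State state : states) {
            if (state != State.STOPPED
                    && state != State.ERROR
                    && state != State.SCALED_DOWN) {
                running++;
            }
        }
        return running;
    }

    public static void main(String[] args) {
        List<State> states = Arrays.asList(
                State.STARTED, State.STOPPED, State.STARTED, State.SCALED_DOWN);
        System.out.println(countRunning(states)); // prints 2
    }
}
```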
-
getWorkerPool
public <I,O> WorkerPool<I,O> getWorkerPool(ModelInfo<I,O> modelInfo)
Returns the WorkerPool for a model.
- Type Parameters:
I - the model input class
O - the model output class
- Parameters:
modelInfo - the model to get the worker pool for
- Returns:
the WorkerPool
-
close
public void close()
Closes all models related to the WorkLoadManager.
-
-