Package ai.djl.serving.wlm
Class ModelInfo<I,O>
java.lang.Object
    ai.djl.serving.wlm.ModelInfo<I,O>
public final class ModelInfo<I,O> extends java.lang.Object
A class that represents a loaded model and its metadata.
-
-
Nested Class Summary
static class ModelInfo.Status
    An enum that represents the state of a model.
-
Constructor Summary
ModelInfo(java.lang.String modelUrl)
    Constructs a new ModelInfo instance.
ModelInfo(java.lang.String id, java.lang.String modelUrl, ai.djl.repository.zoo.Criteria<I,O> criteria)
    Constructs a ModelInfo based on a Criteria.
ModelInfo(java.lang.String id, java.lang.String modelUrl, java.lang.String version, java.lang.String engineName, java.lang.String loadOnDevices, java.lang.Class<I> inputClass, java.lang.Class<O> outputClass, int queueSize, int maxIdleSeconds, int maxBatchDelayMillis, int batchSize, int minWorkers, int maxWorkers)
    Constructs a new ModelInfo instance.
-
Method Summary
void close()
    Closes all loaded models.
boolean equals(java.lang.Object o)
int getBatchSize()
    Returns the configured batch size.
java.lang.String getEngineName()
    Returns the engine name.
java.lang.String getId()
    Returns the model ID.
java.lang.Class<I> getInputClass()
    Returns the model input class.
java.lang.String[] getLoadOnDevices()
    Returns the devices the model will be loaded on at startup.
int getMaxBatchDelayMillis()
    Returns the maximum delay in milliseconds to aggregate a batch.
int getMaxIdleSeconds()
    Returns the configured maximum idle time of workers, in seconds.
int getMaxWorkers()
    Returns the maximum number of workers.
int getMinWorkers()
    Returns the minimum number of workers.
ai.djl.repository.zoo.ZooModel<I,O> getModel(ai.djl.Device device)
    Returns the loaded ZooModel for a device.
java.util.Map<ai.djl.Device,ai.djl.repository.zoo.ZooModel<I,O>> getModels()
    Returns all loaded models.
java.lang.String getModelUrl()
    Returns the model URL.
java.lang.Class<O> getOutputClass()
    Returns the model output class.
int getQueueSize()
    Returns the configured size of the workers queue.
ModelInfo.Status getStatus()
    Returns the model loading status.
java.lang.String getVersion()
    Returns the model version.
int hashCode()
void hasInputOutputClass(java.lang.Class<I> inputClass, java.lang.Class<O> outputClass)
    Clarifies the input and output classes when they are not specified.
static java.lang.String inferModelNameFromUrl(java.lang.String url)
    Infers the model name from the model URL when no model name is provided.
void initialize()
    Initializes the model.
boolean isParallelLoading()
    Returns whether the model can be loaded in parallel on multiple devices.
void load(ai.djl.Device device)
    Loads the model to the specified device.
void postWorkflowParsing(java.lang.String workflowDir)
    Performs initialization after the workflow has been parsed.
void setBatchSize(int batchSize)
    Sets the configured batch size.
void setId(java.lang.String id)
    Sets the model ID.
void setMaxBatchDelayMillis(int maxBatchDelayMillis)
    Sets the maximum delay in milliseconds to aggregate a batch.
void setMaxIdleSeconds(int maxIdleSeconds)
    Sets the configured maximum idle time of workers, in seconds.
void setQueueSize(int queueSize)
    Sets the configured size of the workers queue.
java.lang.String toString()
ai.djl.Device withDefaultDevice(java.lang.String deviceName)
    Returns the device named by deviceName, or the default device for this model if deviceName is null.
-
-
-
Constructor Detail
-
ModelInfo
public ModelInfo(java.lang.String modelUrl)
Constructs a new ModelInfo instance.
Parameters:
modelUrl - the model URL
-
ModelInfo
public ModelInfo(java.lang.String id, java.lang.String modelUrl, ai.djl.repository.zoo.Criteria<I,O> criteria)
Constructs a ModelInfo based on a Criteria.
Parameters:
id - the ID for the created ModelInfo
modelUrl - the model URL
criteria - the model criteria
-
ModelInfo
public ModelInfo(java.lang.String id, java.lang.String modelUrl, java.lang.String version, java.lang.String engineName, java.lang.String loadOnDevices, java.lang.Class<I> inputClass, java.lang.Class<O> outputClass, int queueSize, int maxIdleSeconds, int maxBatchDelayMillis, int batchSize, int minWorkers, int maxWorkers)
Constructs a new ModelInfo instance.
Parameters:
id - the ID of the model that will be used by the workflow
modelUrl - the model URL
version - the version of the model
engineName - the engine used to load the model
loadOnDevices - the devices to load the model on
inputClass - the model input class
outputClass - the model output class
queueSize - the maximum request queue size
maxIdleSeconds - the initial maximum idle time of workers, in seconds
maxBatchDelayMillis - the initial maximum delay in milliseconds to aggregate a batch
batchSize - the batch size for this model
minWorkers - the minimum number of workers
maxWorkers - the maximum number of workers
-
-
Method Detail
-
postWorkflowParsing
public void postWorkflowParsing(java.lang.String workflowDir)
Performs initialization after the workflow has been parsed.
Parameters:
workflowDir - the workflow parent directory
-
load
public void load(ai.djl.Device device) throws ai.djl.ModelException, java.io.IOException
Loads the model to the specified device.
Parameters:
device - the device to load the model on
Throws:
java.io.IOException - if the model file cannot be read
ai.djl.ModelException - if the specified model cannot be loaded
-
getModels
public java.util.Map<ai.djl.Device,ai.djl.repository.zoo.ZooModel<I,O>> getModels()
Returns all loaded models.
Returns:
all loaded models
-
getModel
public ai.djl.repository.zoo.ZooModel<I,O> getModel(ai.djl.Device device)
Returns the loaded ZooModel for a device.
Parameters:
device - the device whose loaded model is returned
Returns:
the loaded ZooModel
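
The per-device bookkeeping implied by load, getModels, getModel, and close can be pictured with a small stand-alone sketch. This is not DJL's implementation: device names are plain strings standing in for ai.djl.Device, any AutoCloseable stands in for ZooModel, and the class name PerDeviceModelsSketch, the loader function, and the skip-if-already-loaded behavior are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical illustration only; not part of ai.djl.serving.wlm. */
final class PerDeviceModelsSketch<M extends AutoCloseable> {

    // One loaded model per device name (a String stands in for ai.djl.Device).
    private final Map<String, M> models = new ConcurrentHashMap<>();
    // Assumed loader: creates a model for the given device.
    private final Function<String, M> loader;

    PerDeviceModelsSketch(Function<String, M> loader) {
        this.loader = loader;
    }

    /** Loads a model for the device unless one is already present (assumed behavior). */
    void load(String device) {
        models.computeIfAbsent(device, loader);
    }

    /** Returns the model loaded on the device, or null if none was loaded. */
    M getModel(String device) {
        return models.get(device);
    }

    /** Returns a read-only view of all loaded models. */
    Map<String, M> getModels() {
        return Collections.unmodifiableMap(models);
    }

    /** Closes every loaded model and forgets it. */
    void close() {
        for (M model : models.values()) {
            try {
                model.close();
            } catch (Exception e) {
                // A failing close should not prevent the others from closing.
            }
        }
        models.clear();
    }

    public static void main(String[] args) {
        List<String> closed = new ArrayList<>();
        PerDeviceModelsSketch<AutoCloseable> registry =
                new PerDeviceModelsSketch<AutoCloseable>(device -> () -> closed.add(device));
        registry.load("cpu");
        registry.load("gpu0");
        registry.load("cpu"); // already loaded, so the loader is not called again
        System.out.println(registry.getModels().size()); // prints 2
        registry.close();
        System.out.println(closed.size()); // prints 2
    }
}
```

Keying the map by device keeps lookups for a single device cheap while still allowing close to release every model in one pass.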
-
setId
public void setId(java.lang.String id)
Sets the model ID.
Parameters:
id - the model ID
-
getId
public java.lang.String getId()
Returns the model ID.
Returns:
the model ID
-
getVersion
public java.lang.String getVersion()
Returns the model version.
Returns:
the model version
-
getEngineName
public java.lang.String getEngineName()
Returns the engine name.
Returns:
the engine name
-
getModelUrl
public java.lang.String getModelUrl()
Returns the model URL.
Returns:
the model URL
-
getStatus
public ModelInfo.Status getStatus()
Returns the model loading status.
Returns:
the model loading status
-
getInputClass
public java.lang.Class<I> getInputClass()
Returns the model input class.
Returns:
the model input class
-
getOutputClass
public java.lang.Class<O> getOutputClass()
Returns the model output class.
Returns:
the model output class
-
hasInputOutputClass
public void hasInputOutputClass(java.lang.Class<I> inputClass, java.lang.Class<O> outputClass)
Clarifies the input and output classes when they are not specified.
Warning: This is intended for internal use with reflection.
Parameters:
inputClass - the model input class
outputClass - the model output class
-
setMaxIdleSeconds
public void setMaxIdleSeconds(int maxIdleSeconds)
Sets the configured maximum idle time of workers, in seconds.
Parameters:
maxIdleSeconds - the configured maximum idle time of workers, in seconds
-
getMaxIdleSeconds
public int getMaxIdleSeconds()
Returns the configured maximum idle time of workers, in seconds.
Returns:
the maximum idle time in seconds
-
setBatchSize
public void setBatchSize(int batchSize)
Sets the configured batch size.
Parameters:
batchSize - the configured batch size
-
getBatchSize
public int getBatchSize()
Returns the configured batch size.
Returns:
the configured batch size
-
setMaxBatchDelayMillis
public void setMaxBatchDelayMillis(int maxBatchDelayMillis)
Sets the maximum delay in milliseconds to aggregate a batch.
Parameters:
maxBatchDelayMillis - the maximum delay in milliseconds to aggregate a batch
-
getMaxBatchDelayMillis
public int getMaxBatchDelayMillis()
Returns the maximum delay in milliseconds to aggregate a batch.
Returns:
the maximum delay in milliseconds to aggregate a batch
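
One way to read the batch size and the maximum batch delay together is as two stopping conditions for collecting requests: a batch is dispatched once it is full or once the delay has elapsed, whichever comes first. The sketch below illustrates that idea with a plain BlockingQueue. It is an assumption about how such settings are typically used, not DJL Serving's actual worker loop, and the class BatchAggregatorSketch and method nextBatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; not DJL Serving's batching code. */
final class BatchAggregatorSketch {

    /**
     * Waits for a first request, then keeps collecting until the batch holds
     * batchSize items or maxBatchDelayMillis has passed since the first item.
     */
    static <T> List<T> nextBatch(BlockingQueue<T> queue, int batchSize, int maxBatchDelayMillis) {
        List<T> batch = new ArrayList<>();
        try {
            // Block until at least one request is available.
            batch.add(queue.take());
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxBatchDelayMillis);
            while (batch.size() < batchSize) {
                long remaining = deadline - System.nanoTime();
                if (remaining <= 0) {
                    break; // delay budget used up
                }
                T next = queue.poll(remaining, TimeUnit.NANOSECONDS);
                if (next == null) {
                    break; // nothing arrived before the deadline
                }
                batch.add(next);
            }
        } catch (InterruptedException e) {
            // Keep the interrupt status and return whatever was collected so far.
            Thread.currentThread().interrupt();
        }
        return batch;
    }

    public static void main(String[] args) {
        // The queue capacity plays the role of a request queue size in this sketch.
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(100);
        for (int i = 0; i < 5; i++) {
            queue.offer("req-" + i);
        }
        System.out.println(nextBatch(queue, 3, 50)); // full batch of three
        System.out.println(nextBatch(queue, 3, 50)); // remaining two, returned after the delay
    }
}
```

A larger delay tends to produce fuller batches at the cost of added latency for the first request in each batch.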
-
setQueueSize
public void setQueueSize(int queueSize)
Sets the configured size of the workers queue.
Parameters:
queueSize - the configured size of the workers queue
-
getQueueSize
public int getQueueSize()
Returns the configured size of the workers queue.
Returns:
the configured size of the workers queue
-
getMinWorkers
public int getMinWorkers()
Returns the minimum number of workers.
Returns:
the minimum number of workers
-
getMaxWorkers
public int getMaxWorkers()
Returns the maximum number of workers.
Returns:
the maximum number of workers
-
initialize
public void initialize() throws java.io.IOException, ai.djl.ModelException
Initializes the model.
Throws:
java.io.IOException - if the model cannot be downloaded
ai.djl.repository.zoo.ModelNotFoundException - if the model is not found
ai.djl.ModelException
-
close
public void close()
Closes all loaded models.
-
inferModelNameFromUrl
public static java.lang.String inferModelNameFromUrl(java.lang.String url)
Infers the model name from the model URL when no model name is provided.
Parameters:
url - the model URL
Returns:
the model name
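
The page does not say how the name is derived from the URL. As a rough illustration of the general idea, the sketch below takes the last path segment and drops a common archive extension. The class ModelNameSketch, the method inferName, and the list of extensions are assumptions for illustration; the actual rules in inferModelNameFromUrl may differ.

```java
import java.net.URI;

/** Hypothetical stand-in; not DJL's implementation of inferModelNameFromUrl. */
final class ModelNameSketch {

    // Archive extensions removed from the name (an assumed list).
    private static final String[] EXTENSIONS = {".tar.gz", ".tgz", ".zip", ".tar"};

    /** Derives a name from the last path segment of the URL. */
    static String inferName(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            path = url; // opaque URIs have no path; fall back to the raw string
        }
        // Ignore trailing slashes so "…/model/" yields "model".
        int end = path.length();
        while (end > 0 && path.charAt(end - 1) == '/') {
            end--;
        }
        path = path.substring(0, end);
        String name = path.substring(path.lastIndexOf('/') + 1);
        for (String ext : EXTENSIONS) {
            if (name.endsWith(ext)) {
                return name.substring(0, name.length() - ext.length());
            }
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(inferName("https://example.com/models/resnet18.zip")); // resnet18
        System.out.println(inferName("file:///opt/ml/model/")); // model
    }
}
```

The example URLs are placeholders. A URL with characters that java.net.URI rejects, such as unescaped spaces, would make URI.create throw IllegalArgumentException in this sketch.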
-
withDefaultDevice
public ai.djl.Device withDefaultDevice(java.lang.String deviceName)
Returns the device named by deviceName, or the default device for this model if deviceName is null.
Parameters:
deviceName - the device to use if it is not null
Returns:
a non-null device
-
getLoadOnDevices
public java.lang.String[] getLoadOnDevices()
Returns the devices the model will be loaded on at startup.
Returns:
the devices the model will be loaded on at startup
-
isParallelLoading
public boolean isParallelLoading()
Returns whether the model can be loaded in parallel on multiple devices.
Returns:
true if the model can be loaded in parallel on multiple devices
-
equals
public boolean equals(java.lang.Object o)
Overrides:
equals in class java.lang.Object
-
hashCode
public int hashCode()
Overrides:
hashCode in class java.lang.Object
-
toString
public java.lang.String toString()
Overrides:
toString in class java.lang.Object
-
-