Number of input planes; the default is 1.
Kernel tensor; the default is a 9 x 9 tensor.
Computes the gradient of the module with respect to its own parameters. Many modules do not perform this step because they have no parameters. The name of the state variable that holds the parameters is module dependent. The module is expected to accumulate the gradients with respect to its parameters in some variable.
Performs a back-propagation step through the module with respect to the given input. In general, this method assumes that forward(input) has already been called with the same input. This is necessary for optimization reasons. If you do not respect this rule, backward() will compute incorrect gradients.
input data
gradient of next layer
gradient corresponding to input data
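The forward-before-backward rule can be illustrated with a minimal sketch. ToySigmoid is a hypothetical stand-in written with plain arrays, not a class from the library; it only shows why a backward step that reuses state cached by forward gives wrong gradients when that state is stale.

```scala
object BackwardContractSketch {
  // Hypothetical toy module (an assumption for illustration): element-wise sigmoid.
  class ToySigmoid {
    var output: Array[Double] = Array.empty[Double]
    var gradInput: Array[Double] = Array.empty[Double]

    // forward caches its result in `output`.
    def forward(input: Array[Double]): Array[Double] = {
      output = input.map(x => 1.0 / (1.0 + math.exp(-x)))
      output
    }

    // backward reuses the cached `output` instead of recomputing the sigmoid,
    // so it is only correct if forward(input) ran first with the same input.
    def backward(input: Array[Double], gradOutput: Array[Double]): Array[Double] = {
      require(output.length == input.length, "call forward(input) before backward")
      gradInput = Array.tabulate(input.length) { i =>
        gradOutput(i) * output(i) * (1.0 - output(i))
      }
      gradInput
    }
  }
}
```

Calling backward with an input that differs from the one last passed to forward silently uses the old cached output, which is the failure mode the rule warns about.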
Gets the execution engine type.
Clears cached activities to save storage space or network bandwidth. Note that Tensor.set is used so that some information, such as tensor sharing, is kept.
A subclass should override this method if it allocates extra resources, and it should call super.clearState in the overriding method.
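The override pattern can be sketched as follows. Base and WithBuffer are hypothetical classes that use plain arrays, and the field names are assumptions made for illustration rather than the library's actual members.

```scala
object ClearStateSketch {
  // Hypothetical base class holding the cached activities.
  class Base {
    var output: Array[Double] = Array.empty[Double]
    var gradInput: Array[Double] = Array.empty[Double]

    def clearState(): this.type = {
      output = Array.empty[Double]
      gradInput = Array.empty[Double]
      this
    }
  }

  // A subclass with an extra buffer releases it as well,
  // and delegates to super so the base caches are still cleared.
  class WithBuffer extends Base {
    var buffer: Array[Double] = Array.empty[Double]

    override def clearState(): this.type = {
      super.clearState()
      buffer = Array.empty[Double]
      this
    }
  }
}
```

Skipping the super call in the subclass would leave output and gradInput populated, which defeats the purpose of clearing the state.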
Copies the useful running status from src to this.
A subclass should override this method if it has parameters besides weight and bias, such as runningMean and runningVar of BatchNormalization.
source Module
this
Takes an input object and computes the corresponding output of the module. After a forward pass, the output state variable should have been updated to the new value.
input data
output data
Gets the module name; the default name is className@namePostfix.
Float or Double
This method compacts all parameters and gradients of the model into two tensors, which makes them easier to use with an optim method.
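As a rough sketch of why two flat tensors are convenient, the code below concatenates per-layer arrays and applies one plain gradient descent step over the result. The names flatten and sgdStep are hypothetical, and the sketch copies the values, whereas a real implementation may keep the flat tensors as shared views of the layer storage.

```scala
object FlattenSketch {
  // Concatenate per-layer arrays into one flat array (this copies the values).
  def flatten(parts: Seq[Array[Double]]): Array[Double] =
    parts.flatMap(_.toSeq).toArray

  // With flat weights and gradients, an update rule is a single loop
  // that needs no knowledge of the layer structure.
  def sgdStep(weights: Array[Double], grads: Array[Double], lr: Double): Unit = {
    require(weights.length == grads.length, "weights and gradients must align")
    for (i <- weights.indices) weights(i) -= lr * grads(i)
  }
}
```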
This function returns a table that contains the module name, the parameter names and the parameter values in this module. The resulting table has the structure Table(ModuleName -> Table(ParameterName -> ParameterValue)), and its type is Table[String, Table[String, Tensor[T]]].
For example, get the weight of a module named conv1: table[Table]("conv1")[Tensor[T]]("weight").
Custom modules should override this function if they have parameters.
Table
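The nested structure can be mimicked with ordinary Scala maps. This is only an analogy: Params, table and weightOf are hypothetical names, and Map and Array stand in for the library's Table and Tensor types.

```scala
object ParamTableSketch {
  // ModuleName -> (ParameterName -> ParameterValue), with arrays as stand-in tensors.
  type Params = Map[String, Map[String, Array[Double]]]

  val table: Params = Map(
    "conv1" -> Map("weight" -> Array(0.1, 0.2), "bias" -> Array(0.0))
  )

  // Two-level lookup; returns None if the module or the parameter is absent.
  def weightOf(t: Params, module: String): Option[Array[Double]] =
    t.get(module).flatMap(_.get("weight"))
}
```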
The cached gradient of activities, kept so that it does not have to be computed again when it is needed.
The cached output, kept so that it does not have to be computed again when it is needed.
This function returns two arrays: one for the weights and the other for the gradients. Custom modules should override this function if they have parameters.
(Array of weights, Array of grad)
Module predict; returns the probability distribution.
dataset for prediction
Module predict; returns the predicted label.
dataset for prediction
Set the module name
Module status. It is useful for modules such as dropout and batch normalization.
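A minimal sketch of why the status matters for dropout: ToyDropout is a hypothetical class, and both the fixed seed and the 1 / (1 - p) rescaling (so-called inverted dropout) are assumptions of this example, not a statement about the library's implementation.

```scala
import scala.util.Random

object StatusSketch {
  // Hypothetical dropout with a train/evaluate switch.
  class ToyDropout(p: Double, seed: Long = 42L) {
    require(p >= 0.0 && p < 1.0, "p must be in [0, 1)")
    private val rng = new Random(seed)
    var train: Boolean = true

    def training(): this.type = { train = true; this }
    def evaluate(): this.type = { train = false; this }

    def forward(x: Array[Double]): Array[Double] =
      if (!train) x.clone() // evaluation: pass the input through unchanged
      else x.map(v => if (rng.nextDouble() < p) 0.0 else v / (1.0 - p)) // training: drop and rescale
  }
}
```

The same forward call therefore produces different results depending on the status flag, which is why the flag has to be set correctly before prediction.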
Computes the gradient of the module with respect to its own input. The result is returned in gradInput, and the gradInput state variable is updated accordingly.
Computes the output using the current parameter set of the class and the input. This function returns the result, which is stored in the output field.
If the module has parameters, this zeroes the accumulated gradients with respect to these parameters. Otherwise, it does nothing.
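The accumulate-then-zero cycle can be sketched with a one-parameter module. ToyScale is hypothetical and computes y = weight * x for scalars; the method names mirror the descriptions in this reference, but the class itself is an assumption made for illustration.

```scala
object GradAccumulationSketch {
  // Hypothetical toy module computing y = weight * x.
  class ToyScale(var weight: Double) {
    var gradWeight: Double = 0.0

    // dy/dweight = x, so each call adds x * gradOutput to the running total.
    def accGradParameters(input: Double, gradOutput: Double): Unit = {
      gradWeight += input * gradOutput
    }

    // Reset the accumulator, for example before processing the next mini-batch.
    def zeroGradParameters(): Unit = {
      gradWeight = 0.0
    }
  }
}
```

Because the gradients are accumulated rather than overwritten, forgetting to zero them mixes contributions from earlier samples into the next update.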
Applies a spatial subtraction operation to a series of 2D inputs, using the kernel to compute the weighted average in a neighborhood. The neighborhood is a local spatial region that has the same size as the kernel and spans all features. For a single-channel input image, since there is only one feature, the region is only spatial. For an RGB image, the weighted average is taken over the RGB channels and a spatial region.
If the kernel is 1D, it is used to construct a separable 2D kernel. The operations are much more efficient in this case.
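One common way to build a separable 2D kernel from a 1D kernel is the outer product of the 1D kernel with itself; whether the library constructs it exactly this way is an assumption of this sketch, and the names outer and normalize are hypothetical. The efficiency gain comes from the fact that filtering with a separable k x k kernel can be done as two 1D passes, costing about 2k multiplications per pixel instead of k * k.

```scala
object SeparableKernelSketch {
  // Outer product of a 1D kernel with itself: k2(i)(j) = k(i) * k(j).
  def outer(k: Array[Double]): Array[Array[Double]] =
    k.map(a => k.map(b => a * b))

  // Rescale so that all weights sum to 1, as needed for a weighted average.
  def normalize(k2: Array[Array[Double]]): Array[Array[Double]] = {
    val total = k2.map(_.sum).sum
    k2.map(_.map(_ / total))
  }
}
```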
The kernel is generally chosen as a Gaussian when it is believed that the correlation of two pixel locations decreases with increasing distance. On the feature dimension, a uniform average is used since the weighting across features is not known.
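The operation described above can be sketched with plain arrays: for every pixel, take the kernel-weighted average over the spatial neighborhood, averaged uniformly over all channels, and subtract it from each channel. The border handling here (renormalizing by the kernel weights that fall inside the image) is an assumption of this sketch, as are the names SubtractiveNormSketch and normalize; a kernel with positive weights is assumed.

```scala
object SubtractiveNormSketch {
  type Image = Array[Array[Array[Double]]] // indexed as [channel][row][col]

  def normalize(input: Image, kernel: Array[Array[Double]]): Image = {
    val nCh = input.length
    val h = input(0).length
    val w = input(0)(0).length
    val kh = kernel.length
    val kw = kernel(0).length
    val mean = Array.ofDim[Double](h, w)

    for (y <- 0 until h; x <- 0 until w) {
      var acc = 0.0  // weighted sum over the neighborhood and all channels
      var wsum = 0.0 // total weight that actually fell inside the image
      for (i <- 0 until kh; j <- 0 until kw) {
        val yy = y + i - kh / 2
        val xx = x + j - kw / 2
        if (yy >= 0 && yy < h && xx >= 0 && xx < w) {
          var c = 0
          while (c < nCh) { acc += kernel(i)(j) * input(c)(yy)(xx); c += 1 }
          wsum += kernel(i)(j) * nCh // uniform weighting across channels
        }
      }
      mean(y)(x) = acc / wsum
    }

    // Subtract the same local mean from every channel.
    input.map(ch => Array.tabulate(h, w)((y, x) => ch(y)(x) - mean(y)(x)))
  }
}
```

A constant image maps to all zeros under this sketch, since the local weighted mean then equals every pixel value.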