lamp.nn.TransformerDecoderBlock
See the TransformerDecoderBlock companion object
case class TransformerDecoderBlock(attentionDecoderDecoder: MultiheadAttention, attentionEncoderDecoder: MultiheadAttention, layerNorm1: LayerNorm, layerNorm2: LayerNorm, layerNorm3: LayerNorm, layerNorm4: LayerNorm, w1: Constant, b1: Constant, w2: Constant, b2: Constant, dropout: Double, train: Boolean) extends GenericModule[(Variable, Variable, Option[STen]), Variable]
Attributes
Companion: object
Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
Members list
The implementation of the function.
In addition to `x`, it can also use all of the `state` to compute its value.
List of optimizable, or non-optimizable, but stateful parameters
Stateful means that the state is carried over the repeated forward calls.
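The relationship between the forward function and the state list can be illustrated with a minimal sketch in plain Scala. It does not use lamp at all: `ToyParam`, `ToyModule`, and `ToyAffine` are hypothetical names invented for this illustration, and the real GenericModule works on lamp's own variable and tensor types. The sketch only shows the idea that a module exposes its parameters through a state list, that some entries are optimizable and others are merely stateful, and that stateful entries persist across repeated forward calls.

```scala
// Hypothetical toy types for illustration only; these are NOT lamp's GenericModule or Constant.
object StatefulModuleSketch {
  // A named parameter buffer; `optimizable` marks whether an optimizer should update it.
  final case class ToyParam(name: String, values: Array[Double], optimizable: Boolean)

  // A module maps an input to an output and exposes everything it carries as `state`.
  trait ToyModule[A, B] {
    def forward(x: A): B
    def state: Seq[ToyParam]
  }

  // Computes w * x + b and counts how often `forward` was called.
  final class ToyAffine(w: Double, b: Double) extends ToyModule[Double, Double] {
    private val weight = ToyParam("w", Array(w), optimizable = true)
    private val bias = ToyParam("b", Array(b), optimizable = true)
    // Stateful but not optimizable: carried over between forward calls.
    private val calls = ToyParam("calls", Array(0.0), optimizable = false)

    def forward(x: Double): Double = {
      calls.values(0) += 1.0
      weight.values(0) * x + bias.values(0)
    }

    def state: Seq[ToyParam] = Seq(weight, bias, calls)
  }
}
```

In this toy version, calling `forward` twice on the same `ToyAffine` instance leaves the `calls` entry at 2.0, while only `w` and `b` would be handed to an optimizer.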
Computes the gradient of loss with respect to the parameters.
Attributes
Inherited from: GenericModule
Returns the total number of optimizable parameters.
Attributes
Inherited from: GenericModule
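As a rough illustration of what a total count of optimizable parameters means, the following plain Scala sketch sums the element counts of those state entries that need gradients. `ToyTensor` and `numberOfOptimizable` are hypothetical names, and the shapes chosen for `w1`, `b1`, `w2`, and `b2` are assumptions made for the example, not the shapes lamp uses; how lamp itself computes the count is not shown on this page.

```scala
// Hypothetical stand-ins for illustration only; not lamp's tensor or module types.
object ParameterCountSketch {
  // A parameter described by its shape and whether it takes part in gradient computation.
  final case class ToyTensor(shape: Seq[Int], needsGrad: Boolean) {
    // Number of scalar elements: the product of all dimensions (1 for an empty shape).
    def numel: Long = shape.foldLeft(1L)(_ * _)
  }

  // Sum the element counts of the entries that an optimizer would update.
  def numberOfOptimizable(state: Seq[ToyTensor]): Long =
    state.filter(_.needsGrad).map(_.numel).sum

  // Assumed feed-forward parameters: model width 4, hidden width 16.
  val feedForward: Seq[ToyTensor] = Seq(
    ToyTensor(Seq(4, 16), needsGrad = true), // w1 (assumed shape)
    ToyTensor(Seq(16), needsGrad = true), // b1 (assumed shape)
    ToyTensor(Seq(16, 4), needsGrad = true), // w2 (assumed shape)
    ToyTensor(Seq(4), needsGrad = true), // b2 (assumed shape)
    ToyTensor(Seq(4), needsGrad = false) // a stateful buffer that is not optimized
  )

  // 4*16 + 16 + 16*4 + 4 optimizable scalars; the last entry is excluded.
  val total: Long = numberOfOptimizable(feedForward)
}
```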
Returns the state variables which need gradient computation.
Attributes
Inherited from: GenericModule
Attributes
Inherited from: Product
Attributes
Inherited from: Product