TransformerDecoderBlock

lamp.nn.TransformerDecoderBlock

See the TransformerDecoderBlock companion object

case class TransformerDecoderBlock(attentionDecoderDecoder: MultiheadAttention, attentionEncoderDecoder: MultiheadAttention, layerNorm1: LayerNorm, layerNorm2: LayerNorm, layerNorm3: LayerNorm, layerNorm4: LayerNorm, w1: Constant, b1: Constant, w2: Constant, b2: Constant, dropout: Double, train: Boolean) extends GenericModule[(Variable, Variable, Option[STen]), Variable]

Attributes

Companion: object
Supertypes: trait Serializable, trait Product, trait Equals, trait GenericModule[(Variable, Variable, Option[STen]), Variable], class Object, trait Matchable, class Any

The implementation of the function.

In addition to x, it can also use all of the `state` to compute its value.

List of parameters that are either optimizable, or non-optimizable but stateful.

Stateful means that the state is carried over across repeated forward calls.

Alias of forward.

Computes the gradient of loss with respect to the parameters.

Returns the total number of optimizable parameters.

Returns the state variables which need gradient computation.
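
As a rough illustration of how the members described above relate to each other, here is a minimal, self-contained sketch. It does not use lamp and is not how lamp implements them: ToyModule, Param, Affine, and the name learnableParameters are hypothetical stand-ins chosen for this example, and the assumption that the alias of forward is an apply method follows the usual Scala convention rather than anything stated on this page. Gradient computation is left out.

```scala
// Toy sketch only: none of these types come from lamp.
object ToyModuleSketch {

  // Hypothetical parameter: a flat array of values plus a flag
  // saying whether an optimizer is allowed to update it.
  final class Param(val values: Array[Double], val optimizable: Boolean)

  // Hypothetical stand-in for a module with input type A and output type B.
  trait ToyModule[A, B] {
    // Optimizable parameters, and non-optimizable but stateful ones.
    def state: Seq[Param]

    // The implementation of the function; it may read and update `state`.
    def forward(x: A): B

    // Alias of forward, so a module can be called like a function.
    def apply(x: A): B = forward(x)

    // Total number of scalar values held in optimizable parameters.
    def learnableParameters: Long =
      state.filter(_.optimizable).map(_.values.length.toLong).sum
  }

  // Computes w * x + b and counts its calls in a non-optimizable parameter.
  final class Affine(w: Param, b: Param, calls: Param)
      extends ToyModule[Double, Double] {
    val state: Seq[Param] = Seq(w, b, calls)

    def forward(x: Double): Double = {
      calls.values(0) += 1.0 // state carried over repeated forward calls
      w.values(0) * x + b.values(0)
    }
  }

  def main(args: Array[String]): Unit = {
    val calls = new Param(Array(0.0), optimizable = false)
    val module = new Affine(
      new Param(Array(2.0), optimizable = true),
      new Param(Array(1.0), optimizable = true),
      calls
    )
    println(module(3.0))                // 7.0
    println(module.forward(3.0))        // 7.0
    println(module.learnableParameters) // 2
    println(calls.values(0))            // 2.0
  }
}
```

In this sketch the call counter appears in the state list but is excluded from the optimizable count, which mirrors the distinction drawn above between optimizable parameters and parameters that are only stateful.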
