A Layer represents a neural network. Each Layer can be included as a sub-network of another Layer, forming a more complex neural network. The nesting structure of Layer can be used to represent a mathematical expression or a coarse-grained neural network structure.
When a neural network is written, most of the elements in it are placeholders. When network training begins, data flows into the network.
Tree structure of Layer
Using the Symbolic API, the mathematical expression can be written as the equivalent code (1.0 + x) * 2.0.toWeight. Here 2.0.toWeight represents a variable whose initial value is 2.0; the value is updated during each iteration of neural network training.
Both Times and Plus are case classes, so myLayer is a tree of nested case classes. Times and Plus are placeholders.
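The following sketch shows the tree that (1.0 + x) * 2.0.toWeight desugars into; the constructor arguments are illustrative rather than the library's exact signatures:

val myLayer = Times(
  Plus(
    Literal(1.0),  // the constant 1.0
    Identity()     // the placeholder for the input x
  ),
  Weight(2.0)      // the trainable variable, initialized to 2.0
)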
Weight is a Layer containing a weight, whose initial value is 2.0.
Identity is a Layer whose output equals its input; it returns the input unchanged. The Identity here is the placeholder for Input.
Literal is a Layer containing a constant.
Iteration
Each training pass of the network is called an iteration. An iteration consists of two stages, forward and backward, which together form a complete round of backpropagation (https://en.wikipedia.org/wiki/Backpropagation).
Forward
When forward of a Layer.Aux[A, B] is invoked, A is the input type and B is the output type, and both A and B are Tape types. The code is now interpreted segment by segment as follows.
For example:
val inputTape: Tape.Aux[Double, Double] = Literal(a) // a is the input value fed to the network
val outputTape = myLayer.forward(inputTape)
When myLayer.forward(inputTape) is invoked, forward of Times is invoked first. Its pseudo code is as follows:
final case class Times(operand1: Layer, operand2: Layer) extends Layer {
  def forward(input: Tape): Output = {
    val upstream1 = operand1.forward(input)
    val upstream2 = operand2.forward(input)
    new Output(upstream1, upstream2) // the concrete implementation is omitted here; the focus is the recursion
  }
  final class Output(upstream1: Tape, upstream2: Tape) extends Tape { ... }
}
myLayer.operand1 is a Plus and myLayer.operand2 is a Weight. Therefore, upstream1 and upstream2 are the results of operand1.forward and operand2.forward respectively.
The forward code of Plus is similar to that of Times. When forward of Plus is invoked, operand1 is a Literal and operand2 is an Identity, so forward of Literal and forward of Identity are invoked respectively.
When forward of Identity is invoked, the same input is returned. The pseudo code for forward of Identity is as follows:
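A minimal sketch; the type members shown are illustrative, not the library's exact definition:

final case class Identity[Data, Delta]() extends Layer {
  type Input = Tape.Aux[Data, Delta]
  type Output = Tape.Aux[Data, Delta]
  def forward(input: Input): Output = input // returns the input Tape unchanged
}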
Therefore, Input is the x in the mathematical expression (1.0 + x) * 2.0.toWeight, and in this way the input is propagated into the neural network.
The return value outputTape of myLayer.forward is of type Tape. Therefore, a tree of Tapes is eventually generated, with a structure similar to that of myLayer.
Thus, through layer-by-layer propagation, the inputTape passed to myLayer.forward is eventually returned by Identity and combined into the newly generated Tape tree.
The outputTape can then be used to read the computation result of the forward pass and to trigger the backward pass, for example:
try {
  val loss = outputTape.value // the result computed by the forward pass
  outputTape.backward(loss)   // start the backward pass, using the result as the initial delta
  loss
} finally {
  outputTape.close()          // release the resources held by the Tape tree
}
outputTape.value is the computation result of the mathematical expression (1.0 + x) * 2.0.toWeight.
Backward
outputTape.backward is the backward of Times.Output. Its pseudo code is sketched below.
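A minimal sketch, assuming Double data and deltas and applying the product rule; the library's concrete implementation differs in detail:

final class Output(upstream1: Tape.Aux[Double, Double], upstream2: Tape.Aux[Double, Double]) extends Tape {
  def backward(delta: Double): Unit = {
    // product rule: d(a * b)/da = b and d(a * b)/db = a
    upstream1.backward(delta * upstream2.value)
    upstream2.backward(delta * upstream1.value)
  }
}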
outputTape.upstream1 and outputTape.upstream2 are the results of operand1.forward and operand2.forward respectively; backward of outputTape.upstream1 and outputTape.upstream2 is then invoked.
The backward code of Plus is similar to that of Times. When backward of Plus is invoked, upstream1 and upstream2 are the results of forward of Literal and Identity respectively, and backward of upstream1 and upstream2 is then invoked.
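By analogy, a minimal sketch of backward in Plus.Output, again assuming Double deltas: since the derivative of a sum with respect to each operand is 1, the delta flows through unchanged.

final class Output(upstream1: Tape.Aux[Double, Double], upstream2: Tape.Aux[Double, Double]) extends Tape {
  def backward(delta: Double): Unit = {
    // sum rule: pass the delta through unchanged to both operands
    upstream1.backward(delta)
    upstream2.backward(delta)
  }
}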
Weight is updated during backward; refer to updateDouble.
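As a rough illustration only, such an update could look like the following sketch of plain gradient descent, with a hypothetical learningRate constant; the real update logic is delegated to updateDouble:

final case class Weight(var value: Double) extends Layer with Tape {
  val learningRate = 0.01 // hypothetical hyperparameter, not part of the library API
  def forward(input: Tape): this.type = this // a Weight acts as its own Tape
  def backward(delta: Double): Unit = {
    value -= learningRate * delta // plain gradient-descent step
  }
}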
Aux & Symbolic API
Layer.Aux[A, B] denotes a Layer whose Input type is A and whose Output type is B. Tape.Aux[C, D] denotes a Tape whose Data type is C and whose Delta type is D.
Layer.Aux and Tape.Aux can be combined. For example, Layer.Aux[Tape.Aux[A, B], Tape.Aux[C, D]] represents a Layer whose input is a Tape with data type A and delta type B, and whose output is a Tape with data type C and delta type D.
Aux is a design pattern that implements type refinement and can be used to constrain the range of type parameters.
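For intuition, here is a minimal sketch of how such an Aux alias is typically defined (simplified, not the library's exact declarations):

trait Layer {
  type Input
  type Output
  def forward(input: Input): Output
}
object Layer {
  // Aux re-exposes the abstract type members as type parameters,
  // so they can be constrained at the use site
  type Aux[A, B] = Layer { type Input = A; type Output = B }
}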
Generally, we do not handwrite Aux types, because Symbolic achieves the same effect. For example, when used for the internal variables and return value of a symbolic method, Layer.Aux[Tape.Aux[INDArray, INDArray], Tape.Aux[INDArray, INDArray]] and INDArray @Symbolic are equivalent, so we usually use Symbolic instead of writing Aux by hand.