Aggregation method used to combine gradients.
Computing partial derivatives can require aggregating gradient contributions. All such aggregation methods are represented as objects extending this trait.
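As a minimal sketch of this pattern (the trait name `AggregationMethod`, the method `aggregate`, and the use of plain arrays in place of tensors are illustrative assumptions, not the library's actual API):

```scala
// Illustrative sketch only: arrays stand in for tensors.
trait AggregationMethod {
  /** Combines the collected gradient contributions into a single gradient. */
  def aggregate(gradients: Seq[Array[Float]]): Array[Float]
}

/** Example aggregation method that sums the contributions element-wise. */
object SumAggregationMethod extends AggregationMethod {
  override def aggregate(gradients: Seq[Array[Float]]): Array[Float] =
    gradients.reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
}
```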
Gradient aggregation method that simply adds up the collected gradients, without first waiting for all of them to become available at once.
The benefit of using this method is that its inputs can be combined in any order, which can allow the expression to be evaluated with a smaller memory footprint. With this method, it is possible to compute a sum whose terms are collectively much larger than the total GPU memory.
Gradient aggregation method that simply adds up the collected gradients.
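To illustrate the difference between the two methods above, here is a hedged sketch in plain Scala, with arrays standing in for tensors: adding everything at once requires all terms to be materialized simultaneously, while accumulating combines each term as soon as it becomes available.

```scala
// Illustrative only: arrays stand in for tensors.
def elementWiseSum(a: Array[Float], b: Array[Float]): Array[Float] =
  a.zip(b).map { case (x, y) => x + y }

// All-at-once addition: every term in the sequence is already materialized
// before the sum is formed, so peak memory grows with the number of terms.
def addAllAtOnce(terms: Seq[Array[Float]]): Array[Float] =
  terms.reduce(elementWiseSum)

// Incremental accumulation: terms are consumed one at a time from an
// iterator, so only the running sum and the current term are live. This is
// how a sum of terms collectively larger than GPU memory can be computed.
def accumulate(terms: Iterator[Array[Float]]): Array[Float] =
  terms.reduce(elementWiseSum)
```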
Registry that contains the gradient functions to be used when creating gradient ops. Gradient functions for all op types being differentiated must be registered using either the `Registry.register` or the `Registry.registerNonDifferentiable` function. Attempting to obtain the gradient of an op whose type has no registered gradient function throws a `NoSuchElementException`.
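As a hedged illustration of how such a registration might look: only the `Registry.register` and `Registry.registerNonDifferentiable` entry points come from the description above; the import paths, the op type names, and the gradient function signature are assumptions.

```scala
import org.platanios.tensorflow.api._
import org.platanios.tensorflow.api.ops.{Gradients, Op, OutputLike}

object MySquareGradients {
  // Hypothetical gradient function for an op type "MySquare" that computes
  // y = x * x: given the op and the gradients flowing into its outputs, it
  // returns the gradients with respect to each of the op's inputs
  // (dL/dx = dy * 2 * x). The signature is an assumption.
  def squareGradient(op: Op, outputGradients: Seq[OutputLike]): Seq[OutputLike] = {
    val x = op.inputs(0)
    val dy = outputGradients.head.toOutput
    Seq(dy * x * tf.constant(2.0f))
  }

  // Statements in the object body run when it is first initialized.
  Gradients.Registry.register("MySquare", squareGradient)

  // Op types that intentionally contribute no gradient are registered as
  // non-differentiable instead, so differentiating through them does not
  // throw a NoSuchElementException.
  Gradients.Registry.registerNonDifferentiable("MyConstLike")
}
```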
Adds ops to the graph to compute the partial derivatives of the sum of `y`s with respect to the `x`s, using the C++ gradients support of the TensorFlow native library.
Note that the C++ gradients support of the TensorFlow native library is incomplete and will not be sufficient for many use cases. It is mainly exposed as a means of comparison with the Scala API functionality.
The result of this function is an array containing: `d(y_1 + y_2 + ...)/dx_1`, `d(y_1 + y_2 + ...)/dx_2`, `...`.
`y`: Tensors whose partial derivatives are computed.
`x`: Tensors with respect to which the gradients are computed.
`dx`: Tensors to use as the initial gradients. They represent the symbolic partial derivatives of some loss function `L` with respect to `y`. If `null`, then ones are used. The number of tensors in `dx` must match the number of tensors in `y`.
Partial derivatives of the `y`s given each one of the `x`s.
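A hedged usage sketch follows; the function name `Gradients.ccGradients`, the placeholder and square ops, and the exact types are assumptions made for illustration, not guaranteed signatures.

```scala
import org.platanios.tensorflow.api._
import org.platanios.tensorflow.api.ops.Gradients

object CCGradientsExample {
  // Build a small graph: y = x^2, so d(sum(y))/dx = 2 * x.
  val x = tf.placeholder[Float](Shape(3))
  val y = tf.square(x)

  // The third argument is `dx`; passing null means ones are used as the
  // initial gradients, so `grads` holds d(sum(y))/dx for each x.
  val grads = Gradients.ccGradients(Array(y), Array(x), null)
}
```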