org.platanios.tensorflow.api.ops.training.optimizers
Learning rate. Must be `> 0`. If used with `decay`, then this argument specifies the initial value of the learning rate.
Learning rate decay method to use for each update.
Momentum. Must be `>= 0`.
Boolean value indicating whether to use Nesterov acceleration or not. For details, refer to [Sutskever et al., 2013](http://proceedings.mlr.press/v28/sutskever13.pdf).
If `true`, the gradient descent updates will be protected by a lock. Otherwise, the behavior is undefined, but may exhibit less contention.
Optional summary tag name to use for the learning rate value. If `null`, no summary is created for the learning rate. Otherwise, a scalar summary is created which can be monitored using TensorBoard.
Name for this optimizer.
Applies the updates corresponding to the provided gradient to the provided variable.
Gradient tensor.
Variable.
Option containing the current iteration in the optimization loop, if one has been provided.
Created op that applies the provided gradient to the provided variable.
Creates an op that applies the provided gradients to the provided variables.
Sequence with gradient-variable pairs.
Optional `Variable` to increment by one after the variables have been updated.
Name for the created op.
Created op.
Applies the updates corresponding to the provided gradient to the provided variable.
The `OutputIndexedSlices` object specified by `gradient` in this function is by default pre-processed in `applySparseDuplicateIndices` to remove duplicate indices (refer to that function's documentation for details). Optimizers which can tolerate or have correct special cases for duplicate sparse indices may override `applySparseDuplicateIndices` instead of this function, avoiding that overhead.
Gradient tensor.
Variable.
Option containing the current iteration in the optimization loop, if one has been provided.
Created op that applies the provided gradient to the provided variable.
Applies the updates corresponding to the provided gradient (with potentially duplicate indices) to the provided variable.
Optimizers which override this method must deal with `OutputIndexedSlices` objects such as the following: `OutputIndexedSlices(indices=[0, 0], values=[1, 1], denseShape=[1])`, which contain duplicate indices. The correct interpretation in that case should be: `OutputIndexedSlices(values=[2], indices=[0], denseShape=[1])`.
Many optimizers deal incorrectly with repeated indices when updating based on sparse gradients (e.g., summing squares rather than squaring the sum, or applying momentum terms multiple times). Adding first is always the correct behavior, so this is enforced here by reconstructing the `OutputIndexedSlices` to have only unique indices, and then calling `applySparse`. Optimizers which deal correctly with repeated indices may instead override this method to avoid the induced overhead (see the sketch after the parameter list below).
Gradient tensor.
Variable.
Option containing the current iteration in the optimization loop, if one has been provided.
Created op that applies the provided gradient to the provided variable.
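To make the "adding first" behavior concrete, here is a minimal plain-Scala sketch of the deduplication semantics (an illustration only, using hypothetical names; the library itself performs this step on tensors):

```scala
// Illustration of the deduplication semantics only; the names here are
// hypothetical and the library implementation operates on tensors.
def dedupIndexedSlices(
    indices: Seq[Int],
    values: Seq[Float]
): (Seq[Int], Seq[Float]) = {
  // Group the values by index and sum within each group ("adding first").
  val summed = indices.zip(values)
    .groupBy(_._1)
    .map { case (index, pairs) => (index, pairs.map(_._2).sum) }
    .toSeq
    .sortBy(_._1)
  (summed.map(_._1), summed.map(_._2))
}

// indices = [0, 0], values = [1, 1]  ==>  indices = [0], values = [2]
val (uniqueIndices, summedValues) = dedupIndexedSlices(Seq(0, 0), Seq(1.0f, 1.0f))
```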
Computes the gradients of `loss` with respect to the variables in `variables`, if provided, otherwise with respect to all the trainable variables in the graph where `loss` is defined. A brief usage sketch follows the parameter list below.
Loss value whose gradients will be computed.
Optional gradients to back-propagate for `loss`.
Optional list of variables for which to compute the gradients. Defaults to the set of trainable variables in the graph where `loss` is defined.
Gating method for the gradients computation.
Aggregation method used to combine gradient terms.
Boolean value indicating whether to colocate the gradient ops with the original ops.
Sequence of gradient-variable pairs.
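For instance, here is a hedged sketch of obtaining and inspecting the gradient-variable pairs; the `tf.train.GradientDescent` factory call and the scalar `loss` are assumptions for illustration:

```scala
import org.platanios.tensorflow.api._

// Hedged sketch; `loss` is assumed to be a scalar Output defined elsewhere
// in the graph, and the factory signature is an assumption.
val optimizer = tf.train.GradientDescent(0.01f)
val gradientsAndVariables = optimizer.computeGradients(loss)
gradientsAndVariables.foreach { case (gradient, variable) =>
  println(s"Computed a gradient for variable '${variable.name}'.")
}
```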
Create all slots needed by this optimizer.
Learning rate decay method to use for each update.
Creates an op that finishes the gradients application. This function is called from within an op creation context that uses as its name scope the name that users have chosen for the application of gradients.
Set of ops needed to apply the gradients and update the variable values.
Name scope to use for all the ops created by this function.
Created op output.
Gets a non-slot variable that has been added to this optimizer (or throws an error if no such non-slot variable could be found in this optimizer).
Variable name.
Graph in which the variable is defined.
Obtained non-slot variable.
Gets all the non-slot variables that have been added to this optimizer.
Gets or creates (and adds to this optimizer) a non-slot variable (see the sketch after the parameter list below).
Variable name.
Variable initial value.
Set of colocation ops for the non-slot variable.
Created non-slot variable.
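As an illustration, optimizers such as Adam keep global decay accumulators that are shared across all variables, rather than one slot per variable. A hedged sketch of how a subclass might use this machinery (the variable name, initial value, and surrounding context are hypothetical):

```scala
// Hedged sketch inside an optimizer subclass; names and values are
// hypothetical. `colocationOps` is assumed to be an available set of ops.
val beta1Power = getOrCreateNonSlotVariable("beta1_power", 0.9f, colocationOps)

// Retrieve the same variable later by name, in the graph where it was created:
val retrieved = getNonSlotVariable("beta1_power", graph)
```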
Gets an existing slot.
Slot name.
Slot primary variable.
Requested slot variable, or `null` if it cannot be found.
Gets an existing slot or creates a new one if none exists, for the provided arguments.
Slot name.
Slot primary variable.
Slot variable initializer.
Slot variable shape.
Slot variable data type.
Name to use when scoping the variable that needs to be created for the slot.
Requested slot variable.
Boolean value indicating whether to ignore duplicate indices during sparse updates.
Learning rate. Must be `> 0`. If used with `decay`, then this argument specifies the initial value of the learning rate.
Optional summary tag name to use for the learning rate value. If `null`, no summary is created for the learning rate. Otherwise, a scalar summary is created which can be monitored using TensorBoard.
Creates an op that makes a step towards minimizing `loss` by updating the values of the variables in `variables`.
This method simply combines calls to `computeGradients` and `applyGradients`. If you want to process the gradients before applying them, call `computeGradients` and `applyGradients` explicitly instead of using this method (see the sketch after the parameter list below).
Loss value whose gradients will be computed.
Optional gradients to back-propagate for `loss`.
Optional list of variables for which to compute the gradients. Defaults to the set of trainable variables in the graph where `loss` is defined.
Gating method for the gradients computation.
Aggregation method used to combine gradient terms.
Boolean value indicating whether to colocate the gradient ops with the original ops.
Optional `Variable` to increment by one after the variables have been updated.
Name for the created op.
Created op.
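The equivalence described above can be sketched as follows; the factory call, parameter value, and `loss` are illustrative assumptions rather than verbatim library calls:

```scala
import org.platanios.tensorflow.api._

// `loss` is assumed to be a scalar Output defined elsewhere in the graph.
val optimizer = tf.train.GradientDescent(0.1f)

// One-call form:
val trainOp = optimizer.minimize(loss)

// Equivalent explicit form, which leaves room for processing the gradients
// (e.g., clipping them) before they are applied:
val gradientsAndVariables = optimizer.computeGradients(loss)
val processed             = gradientsAndVariables // process the gradients here
val trainOpExplicit       = optimizer.applyGradients(processed)
```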
Momentum. Must be `>= 0`.
Name for this optimizer.
Contains variables used by some optimizers that require no slots to be stored.
Creates all necessary tensors before applying the gradients. This function is called from within an op creation context that uses as its name scope the name that users have chosen for the application of gradients.
Returns the names of all slots used by this optimizer.
Some `Optimizer` subclasses use additional variables.
Supported data types for the loss function, the variables, and the gradients. Subclasses should override this field to allow other float types.
If `true`, the gradient descent updates will be protected by a lock. Otherwise, the behavior is undefined, but may exhibit less contention.
Boolean value indicating whether to use Nesterov acceleration or not. For details, refer to [Sutskever et al., 2013](http://proceedings.mlr.press/v28/sutskever13.pdf).
Returns a sequence of variables which encode the current state of this optimizer. The returned variables include both slot variables and non-slot global variables created by this optimizer, in the current graph.
Gets an existing slot or creates a new one using an initial value of zeros, if none exists (see the sketch after the parameter list below).
Slot name.
Slot primary variable.
Name to use when scoping the variable that needs to be created for the slot.
Requested slot variable.
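A hedged sketch of the slot machinery, matching the `zerosSlot` and `getSlot` signatures documented above (the slot name is hypothetical, and `variable` is assumed to be one of the variables being optimized):

```scala
// Create (or fetch) a zero-initialized "momentum" accumulator slot for
// `variable`, scoping the created slot variable under this optimizer's name:
val accumulator = zerosSlot("momentum", variable, name)

// A later lookup by name returns the same slot (or null if none exists):
val lookedUp = getSlot("momentum", variable)
```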
Optimizer that implements the gradient descent algorithm and includes support for learning rate decay, momentum, and Nesterov acceleration.
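For example, a minimal hedged construction sketch; the parameter names follow the descriptions at the top of this page, but their exact spelling in the library is an assumption:

```scala
import org.platanios.tensorflow.api._

// Hedged sketch: gradient descent with momentum and Nesterov acceleration.
val optimizer = tf.train.GradientDescent(
  learningRate = 0.01f,
  momentum = 0.9f,
  useNesterov = true)

// `loss` is assumed to be a scalar Output defined elsewhere in the graph.
val trainOp = optimizer.minimize(loss)
```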