Class/Object

org.platanios.tensorflow.api.ops.training.optimizers

YellowFin

Related Docs: object YellowFin | package optimizers


class YellowFin extends GradientDescent

Optimizer that implements the YellowFin algorithm.

Please refer to [Zhang et al., 2017](https://arxiv.org/abs/1706.03471) for details.

Linear Supertypes

GradientDescent, Optimizer, AnyRef, Any

Instance Constructors

  1. new YellowFin(learningRate: Float = 1.0f, decay: Schedule = FixedSchedule, momentum: Float = 0.0f, beta: Float = 0.999f, curvatureWindowWidth: Int = 20, zeroDebias: Boolean = true, sparsityDebias: Boolean = true, useNesterov: Boolean = false, useLocking: Boolean = false, learningRateSummaryTag: String = null, name: String = "YellowFin")


    learningRate

    Learning rate. Must be > 0. If used with decay, then this argument specifies the initial value of the learning rate.

    decay

    Learning rate decay method to use for each update.

    momentum

    Momentum. Must be >= 0.

    beta

    Smoothing parameter for estimations.

    curvatureWindowWidth

    Curvature window width. Must be > 1.

    zeroDebias

    If true, the moving averages will be zero-debiased.

    sparsityDebias

    The gradient norm and curvature are biased towards larger values when computed for sparse gradients. Debiasing them is useful when the model is very sparse, e.g. LSTMs with word embeddings. For non-sparse models, such as CNNs, turning it off can slightly speed up the algorithm.

    useNesterov

    Boolean value indicating whether to use Nesterov acceleration or not. For details, refer to [Sutskever et al., 2013](http://proceedings.mlr.press/v28/sutskever13.pdf).

    useLocking

    If true, the gradient descent updates will be protected by a lock. Otherwise, the behavior is undefined, but may exhibit less contention.

    learningRateSummaryTag

    Optional summary tag name to use for the learning rate value. If null, no summary is created for the learning rate. Otherwise, a scalar summary is created which can be monitored using TensorBoard.

    name

    Name for this optimizer.

    Attributes
    protected
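
A minimal construction sketch follows. It assumes that the companion object YellowFin exposes an apply factory mirroring the constructor parameters above (the constructor itself is protected), that Output and Op come from the org.platanios.tensorflow.api package import, and that a scalar loss has already been built with the usual graph ops; all other names are illustrative.

```scala
import org.platanios.tensorflow.api._
import org.platanios.tensorflow.api.ops.training.optimizers.YellowFin

// Sketch: build a training op with YellowFin. YellowFin tunes the momentum and
// learning-rate scaling online, so the values below only seed the tuner.
def buildTrainOp(loss: Output): Op = {
  val optimizer = YellowFin(
    learningRate = 1.0f,                      // initial value; adapted during training
    momentum = 0.0f,                          // initial momentum; also adapted
    beta = 0.999f,                            // smoothing for the internal estimates
    curvatureWindowWidth = 20,
    learningRateSummaryTag = "LearningRate")  // scalar summary viewable in TensorBoard
  optimizer.minimize(loss, name = "TrainOp")
}
```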

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def applyDense(gradient: Output, variable: variables.Variable, iteration: Option[variables.Variable]): Op

    Applies the updates corresponding to the provided gradient, to the provided variable.

    gradient

    Gradient tensor.

    variable

    Variable.

    iteration

    Option containing current iteration in the optimization loop, if one has been provided.

    returns

    Created op that applies the provided gradient to the provided variable.

    Definition Classes
    YellowFin → GradientDescent → Optimizer
  5. def applyGradients(gradientsAndVariables: Seq[(OutputLike, variables.Variable)], iteration: Option[variables.Variable] = None, name: String = this.name): Op

    Creates an op that applies the provided gradients to the provided variables.

    gradientsAndVariables

    Sequence with gradient-variable pairs.

    iteration

    Optional Variable to increment by one after the variables have been updated.

    name

    Name for the created op.

    returns

    Created op.

    Definition Classes
    YellowFin → Optimizer
  6. def applySparse(gradient: OutputIndexedSlices, variable: variables.Variable, iteration: Option[variables.Variable]): Op

    Applies the updates corresponding to the provided gradient, to the provided variable.

    The OutputIndexedSlices object specified by gradient in this function is by default pre-processed in applySparseDuplicateIndices to remove duplicate indices (refer to that function's documentation for details). Optimizers which can tolerate or have correct special cases for duplicate sparse indices may override applySparseDuplicateIndices instead of this function, avoiding that overhead.

    gradient

    Gradient tensor.

    variable

    Variable.

    iteration

    Option containing current iteration in the optimization loop, if one has been provided.

    returns

    Created op that applies the provided gradient to the provided variable.

    Definition Classes
    YellowFin → GradientDescent → Optimizer
  7. def applySparseDuplicateIndices(gradient: OutputIndexedSlices, variable: variables.Variable, iteration: Option[variables.Variable]): Op

    Applies the updates corresponding to the provided gradient (with potentially duplicate indices), to the provided variable.

    Optimizers which override this method must deal with OutputIndexedSlices objects such as the following: OutputIndexedSlices(indices=[0, 0], values=[1, 1], denseShape=[1]), which contain duplicate indices. The correct interpretation in that case should be: OutputIndexedSlices(values=[2], indices=[0], denseShape=[1]).

    Many optimizers deal incorrectly with repeated indices when updating based on sparse gradients (e.g. summing squares rather than squaring the sum, or applying momentum terms multiple times). Adding first is always the correct behavior, so this is enforced here by reconstructing the OutputIndexedSlices to have only unique indices, and then calling applySparse.

    Optimizers which deal correctly with repeated indices may instead override this method to avoid the induced overhead.

    gradient

    Gradient tensor.

    variable

    Variable.

    iteration

    Option containing current iteration in the optimization loop, if one has been provided.

    returns

    Created op that applies the provided gradient to the provided variable.

    Definition Classes
    GradientDescent → Optimizer
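
The following plain-Scala sketch illustrates the "add first" aggregation rule described in this entry. It is not the library's implementation, only the semantics that applySparseDuplicateIndices enforces before delegating to applySparse.

```scala
// Duplicate indices in a sparse gradient must be summed before the update:
//   indices = [0, 0], values = [1.0, 1.0]  ==>  indices = [0], values = [2.0]
def sumDuplicateIndices(
    indices: Seq[Int],
    values: Seq[Float]): (Seq[Int], Seq[Float]) = {
  val summed = indices.zip(values)
    .groupBy(_._1)                                  // group slices hitting the same row
    .map { case (i, pairs) => i -> pairs.map(_._2).sum }
    .toSeq
    .sortBy(_._1)
  (summed.map(_._1), summed.map(_._2))
}

// sumDuplicateIndices(Seq(0, 0), Seq(1.0f, 1.0f)) == (Seq(0), Seq(2.0f))
```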
  8. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  9. val beta: Float


    Smoothing parameter for estimations.

  10. var betaTensor: Output

    Attributes
    protected
  11. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  12. def computeGradients(loss: Output, lossGradients: Seq[OutputLike] = null, variables: Set[variables.Variable] = null, gradientsGatingMethod: GatingMethod = Gradients.OpGating, gradientsAggregationMethod: AggregationMethod = Gradients.AddAggregationMethod, colocateGradientsWithOps: Boolean = false): Seq[(OutputLike, variables.Variable)]

    Computes the gradients of loss with respect to the variables in variables, if provided, otherwise with respect to all the trainable variables in the graph where loss is defined.

    loss

    Loss value whose gradients will be computed.

    lossGradients

    Optional gradients to back-propagate for loss.

    variables

    Optional list of variables for which to compute the gradients. Defaults to the set of trainable variables in the graph where loss is defined.

    gradientsGatingMethod

    Gating method for the gradients computation.

    gradientsAggregationMethod

    Aggregation method used to combine gradient terms.

    colocateGradientsWithOps

    Boolean value indicating whether to colocate the gradient ops with the original ops.

    returns

    Sequence of gradient-variable pairs.

    Definition Classes
    Optimizer
  13. def createSlots(variables: Seq[variables.Variable]): Unit

    Creates all slots needed by this optimizer.

    Definition Classes
    YellowFin → GradientDescent → Optimizer
  14. def curvatureRange(gradNormSquaredSum: Output, sparsityAvg: Option[Output]): (Output, Output)

    Attributes
    protected
  15. var curvatureWindow: variables.Variable

    Attributes
    protected
  16. val curvatureWindowWidth: Int

    Curvature window width. Must be > 1.

  17. val decay: Schedule

    Learning rate decay method to use for each update.

    Definition Classes
    YellowFin → GradientDescent
  18. def distanceToOptimum(gradNormSquaredSum: Output, gradNormSquaredAvg: Output, sparsityAvg: Option[Output]): Output

    Attributes
    protected
  19. var doTune: Output

    Attributes
    protected
  20. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  21. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  22. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  23. def finish(updateOps: Set[Op], nameScope: String): Op

    Creates an op that finishes the gradients application. This function is called from within an op creation context that uses as its name scope the name that users have chosen for the application of gradients.

    updateOps

    Set of ops needed to apply the gradients and update the variable values.

    nameScope

    Name scope to use for all the ops created by this function.

    returns

    Created op output.

    Definition Classes
    Optimizer
  24. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  25. def getLearningRate(variable: variables.Variable, iteration: Option[variables.Variable]): Output

    Attributes
    protected
    Definition Classes
    YellowFin → GradientDescent
  26. def getMomentum(variable: variables.Variable): Output

    Attributes
    protected
    Definition Classes
    YellowFin → GradientDescent
  27. final def getNonSlotVariable(name: String, graph: core.Graph = null): variables.Variable

    Gets a non-slot variable that has been added to this optimizer (or throws an error if no such non-slot variable could be found in this optimizer).

    name

    Variable name.

    graph

    Graph in which the variable is defined.

    returns

    Obtained non-slot variable.

    Attributes
    protected
    Definition Classes
    Optimizer
  28. final def getNonSlotVariables: Iterable[variables.Variable]

    Gets all the non-slot variables that have been added to this optimizer.

    Attributes
    protected
    Definition Classes
    Optimizer
  29. final def getOrCreateNonSlotVariable(name: String, initialValue: tensors.Tensor[_ <: types.DataType], colocationOps: Set[Op] = Set.empty, ignoreExisting: Boolean = false): variables.Variable

    Gets or creates (and adds to this optimizer) a non-slot variable.

    name

    Variable name.

    initialValue

    Variable initial value.

    colocationOps

    Set of colocation ops for the non-slot variable.

    returns

    Created non-slot variable.

    Attributes
    protected
    Definition Classes
    Optimizer
  30. final def getSlot(name: String, variable: variables.Variable): variables.Variable

    Gets an existing slot.

    name

    Slot name.

    variable

    Slot primary variable.

    returns

    Requested slot variable, or null if it cannot be found.

    Attributes
    protected
    Definition Classes
    Optimizer
  31. final def getSlot(name: String, variable: variables.Variable, initializer: Initializer, shape: core.Shape, dataType: types.DataType, variableScope: String): variables.Variable

    Gets an existing slot or creates a new one if none exists, for the provided arguments.

    name

    Slot name.

    variable

    Slot primary variable.

    initializer

    Slot variable initializer.

    shape

    Slot variable shape.

    dataType

    Slot variable data type.

    variableScope

    Name to use when scoping the variable that needs to be created for the slot.

    returns

    Requested slot variable.

    Attributes
    protected
    Definition Classes
    Optimizer
  32. def gradientsSparsity(gradients: Seq[OutputLike]): Option[Output]

    Attributes
    protected
  33. def gradientsVariance(gradients: Seq[OutputLike], gradNormSquaredAvg: Output, sparsityAvg: Option[Output]): Output

    Attributes
    protected
  34. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  35. val ignoreDuplicateSparseIndices: Boolean

    Boolean value indicating whether to ignore duplicate indices during sparse updates.

    Definition Classes
    GradientDescent → Optimizer
  36. var incrementStepOp: Op

    Attributes
    protected
  37. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  38. val learningRate: Float

    Learning rate. Must be > 0. If used with decay, then this argument specifies the initial value of the learning rate.

    Definition Classes
    YellowFin → GradientDescent
  39. var learningRateFactorVariable: variables.Variable

    Attributes
    protected
  40. val learningRateSummaryTag: String

    Optional summary tag name to use for the learning rate value. If null, no summary is created for the learning rate. Otherwise, a scalar summary is created which can be monitored using TensorBoard.

    Definition Classes
    YellowFin → GradientDescent
  41. var learningRateTensor: Output

    Attributes
    protected
    Definition Classes
    GradientDescent
  42. var learningRateVariable: variables.Variable

    Attributes
    protected
  43. final def minimize(loss: Output, lossGradients: Seq[OutputLike] = null, variables: Set[variables.Variable] = null, gradientsGatingMethod: GatingMethod = Gradients.OpGating, gradientsAggregationMethod: AggregationMethod = Gradients.AddAggregationMethod, colocateGradientsWithOps: Boolean = false, iteration: Option[variables.Variable] = None, name: String = "Minimize"): Op

    Creates an op that makes a step towards minimizing loss by updating the values of the variables in variables.

    This method simply combines calls to computeGradients and applyGradients. If you want to process the gradients before applying them, call computeGradients and applyGradients explicitly instead of using this method (see the sketch after this entry).

    loss

    Loss value whose gradients will be computed.

    lossGradients

    Optional gradients to back-propagate for loss.

    variables

    Optional list of variables for which to compute the gradients. Defaults to the set of trainable variables in the graph where loss is defined.

    gradientsGatingMethod

    Gating method for the gradients computation.

    gradientsAggregationMethod

    Aggregation method used to combine gradient terms.

    colocateGradientsWithOps

    Boolean value indicating whether to colocate the gradient ops with the original ops.

    iteration

    Optional Variable to increment by one after the variables have been updated.

    name

    Name for the created op.

    returns

    Created op.

    Definition Classes
    Optimizer
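
A sketch of the explicit two-step path mentioned above. It reuses the imports from the earlier construction example and assumes the Output, OutputLike, and Op aliases are provided by the api package import; processGradient is a hypothetical hook standing in for whatever per-gradient transformation (clipping, scaling, adding noise) should run before the update.

```scala
import org.platanios.tensorflow.api._
import org.platanios.tensorflow.api.ops.training.optimizers.YellowFin

// Sketch: compute gradients, transform them, then apply them explicitly,
// instead of letting minimize do both steps in one call.
def minimizeWithProcessing(
    optimizer: YellowFin,
    loss: Output,
    processGradient: OutputLike => OutputLike): Op = {
  val gradientsAndVariables = optimizer.computeGradients(loss)
  val processed = gradientsAndVariables.map {
    case (gradient, variable) => (processGradient(gradient), variable)
  }
  optimizer.applyGradients(processed, name = "TrainOp")
}
```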
  44. val momentum: Float

    Momentum. Must be >= 0.

    Definition Classes
    YellowFin → GradientDescent
  45. var momentumTensor: Output

    Attributes
    protected
    Definition Classes
    GradientDescent
  46. var momentumVariable: variables.Variable

    Attributes
    protected
  47. var movingAverage: ExponentialMovingAverage

    Attributes
    protected
  48. val name: String

    Name for this optimizer.

    Definition Classes
    YellowFin → GradientDescent → Optimizer
  49. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  50. final val nonSlotVariables: Map[(String, Option[core.Graph]), variables.Variable]

    Contains variables used by some optimizers that require no slots to be stored.

    Attributes
    protected
    Definition Classes
    Optimizer
  51. final def notify(): Unit

    Definition Classes
    AnyRef
  52. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  53. def prepare(iteration: Option[variables.Variable]): Unit

    Creates all necessary tensors before applying the gradients. This function is called from within an op creation context that uses as its name scope the name that users have chosen for the application of gradients.

    Definition Classes
    YellowFin → GradientDescent → Optimizer
  54. final def slotNames: Set[String]

    Returns the names of all slots used by this optimizer.

    Attributes
    protected
    Definition Classes
    Optimizer
  55. final val slots: Map[String, Map[variables.Variable, variables.Variable]]

    Some Optimizer subclasses use additional variables. For example, MomentumOptimizer and AdaGradOptimizer use variables to accumulate updates. This map is where these variables are stored.

    Attributes
    protected
    Definition Classes
    Optimizer
  56. val sparsityDebias: Boolean

    The gradient norm and curvature are biased towards larger values when computed for sparse gradients. Debiasing them is useful when the model is very sparse, e.g. LSTMs with word embeddings. For non-sparse models, such as CNNs, turning it off can slightly speed up the algorithm.

  57. var step: variables.Variable

    Attributes
    protected
  58. val supportedDataTypes: Set[types.DataType]

    Supported data types for the loss function, the variables, and the gradients. Subclasses should override this field to allow other float types.

    Definition Classes
    Optimizer
  59. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  60. def toString(): String

    Definition Classes
    AnyRef → Any
  61. val useLocking: Boolean

    If true, the gradient descent updates will be protected by a lock. Otherwise, the behavior is undefined, but may exhibit less contention.

    Definition Classes
    YellowFin → GradientDescent → Optimizer
  62. val useNesterov: Boolean

    Boolean value indicating whether to use Nesterov acceleration or not. For details, refer to [Sutskever et al., 2013](http://proceedings.mlr.press/v28/sutskever13.pdf).

    Definition Classes
    YellowFin → GradientDescent
  63. final def variables: Seq[variables.Variable]

    Returns a sequence of variables which encode the current state of this optimizer. The returned variables include both slot variables and non-slot global variables created by this optimizer, in the current graph.

    Definition Classes
    Optimizer
  64. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  65. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  66. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  67. def yellowFinUpdate(gradientsAndVariables: Seq[(OutputLike, variables.Variable)]): Op

    Attributes
    protected
  68. val zeroDebias: Boolean


    If true, the moving averages will be zero-debiased.
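
As a brief illustration of zero-debiasing (a sketch of the general technique, not the library's internals): an exponential moving average initialized at zero underestimates the quantity it tracks during the first steps, and dividing by (1 - beta^t) at step t removes that startup bias, conceptually as in Adam-style bias correction.

```scala
// Sketch: zero-debiased exponential moving average of a value stream.
def zeroDebiasedAverages(values: Seq[Float], beta: Float = 0.999f): Seq[Float] = {
  var average = 0.0f
  values.zipWithIndex.map { case (value, index) =>
    val t = index + 1
    average = beta * average + (1.0f - beta) * value   // raw moving average
    average / (1.0f - math.pow(beta, t).toFloat)       // zero-debiased estimate
  }
}
```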

  69. final def zerosSlot(name: String, variable: variables.Variable, variableScope: String): variables.Variable

    Gets an existing slot or creates a new one using an initial value of zeros, if none exists.

    name

    Slot name.

    variable

    Slot primary variable.

    variableScope

    Name to use when scoping the variable that needs to be created for the slot.

    returns

    Requested slot variable.

    Attributes
    protected
    Definition Classes
    Optimizer

Inherited from GradientDescent

Inherited from Optimizer

Inherited from AnyRef

Inherited from Any
