breeze.optimize.AdaptiveGradientDescent
Choose a step size scale for this iteration.
The default is eta / math.pow(state.iter + 1, 2.0 / 3.0).
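As a rough sketch (here `eta` and `iter` just stand for the base step size and the 0-based iteration count carried by the optimizer's state), the default schedule looks like:

{{{
// Default step-size scale: decays polynomially with the iteration count.
def defaultStepScale(eta: Double, iter: Int): Double =
  eta / math.pow(iter + 1, 2.0 / 3.0)

// e.g. defaultStepScale(4.0, 0) == 4.0; the scale shrinks as iter grows.
}}}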
Take this many steps and then reset x to the original x. This is mostly for Adagrad, which can behave oddly until it has accumulated a good sense of the rate of change.
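A rough sketch of this burn-in idea, assuming a hypothetical `step` function that maps (x, history) to an updated pair: take the given number of steps only to warm up the gradient history, then restart the iterate from the original x.

{{{
// Hypothetical warm-up loop: run burnInSteps updates, keep the accumulated
// history, but reset x back to the starting point before continuing.
def warmUp[T, H](x0: T, h0: H, burnInSteps: Int, step: (T, H) => (T, H)): (T, H) = {
  var x = x0
  var h = h0
  for (_ <- 0 until burnInSteps) {
    val (nextX, nextH) = step(x, h)
    x = nextX
    h = nextH
  }
  (x0, h) // x is reset to the original point; the warmed-up history is kept
}
}}}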
Projects the vector x onto whatever ball is needed, and can also incorporate regularization or other constraints.
The default simply takes an unconstrained gradient step.
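As an illustration only (the default performs no projection at all), a projection onto an L2 ball of radius `r` could look like this, using Breeze's `norm`:

{{{
import breeze.linalg.{DenseVector, norm}

// Hypothetical projection onto the L2 ball of radius r: rescale x whenever
// it falls outside the ball, otherwise leave it unchanged.
def projectOntoL2Ball(x: DenseVector[Double], r: Double): DenseVector[Double] = {
  val n = norm(x)
  if (n <= r) x else x * (r / n)
}
}}}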
Implements the L2 regularization update.
Each step is:
x_{t+1,i} = (s_{t,i} * x_{t,i} - \eta * g_{t,i}) / (\eta * regularization + delta + s_{t,i})
where g_{t,i} is the gradient at step t and s_{t,i} = \sqrt{\sum_{t'=1}^{t} g_{t',i}^2}.
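A minimal per-coordinate sketch of this update, using plain arrays rather than Breeze vectors; `eta`, `regularization`, and `delta` stand for the step size, regularization constant, and smoothing constant:

{{{
// One L2-regularized AdaGrad-style step. sumSqGrad(i) holds the running sum
// \sum_{t'} g_{t',i}^2 and is updated in place with the current gradient,
// so s = sqrt(sumSqGrad(i)) matches s_{t,i} in the formula above.
def l2Step(
    x: Array[Double],
    grad: Array[Double],
    sumSqGrad: Array[Double],
    eta: Double,
    regularization: Double,
    delta: Double): Array[Double] =
  Array.tabulate(x.length) { i =>
    sumSqGrad(i) += grad(i) * grad(i)
    val s = math.sqrt(sumSqGrad(i))
    (s * x(i) - eta * grad(i)) / (eta * regularization + delta + s)
  }
}}}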