Any history the derived minimization function needs to do its updates. Typically an approximation to the second derivative (Hessian) matrix.
Tracks the state of the optimizer: the current point, the function value and gradient at that point, and any history.
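A minimal sketch of such a state record, assuming an immutable case class with an abstract history type. The names `State`, `next`, and the field names are hypothetical and chosen for illustration; they are not taken from the library's actual definition.

```scala
object StateSketch {
  // Hypothetical state record; field names are illustrative only.
  final case class State[H](
      x: Vector[Double],    // current point
      value: Double,        // objective value at x
      grad: Vector[Double], // gradient at x
      iter: Int,            // number of completed iterations
      history: H            // optimizer-specific history (e.g. a Hessian approximation)
  )

  // Build the next state after one update, incrementing the iteration count.
  def next[H](s: State[H], x: Vector[Double], value: Double,
              grad: Vector[Double], history: H): State[H] =
    s.copy(x = x, value = value, grad = grad, iter = s.iter + 1, history = history)
}
```

An optimizer that needs no history could instantiate `H` as `Unit`.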
Choose a step size scale for this iteration.
Default is eta / math.pow(state.iter + 1, 2.0 / 3.0).
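The default schedule can be sketched as a standalone function. Here `stepSize` and its parameter list are assumptions made for illustration; the formula is the one stated above, with `iter` standing in for `state.iter`.

```scala
object StepSizeSketch {
  // eta is the base step size; iter is the zero-based iteration count.
  // The step shrinks proportionally to (iter + 1)^(-2/3).
  def stepSize(eta: Double, iter: Int): Double =
    eta / math.pow(iter + 1, 2.0 / 3.0)
}
```

With eta = 1.0 the first iteration uses a step of 1.0, and by iteration 7 the step has decayed to roughly 0.25.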
Projects the vector x onto whatever set (for example, a norm ball) the problem requires. Can also incorporate regularization.
The default simply takes the step.
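A minimal sketch of the two behaviours, assuming plain `Vector[Double]` points. `takeStep`, `projectL2`, and `takeProjectedStep` are hypothetical names, and the L2 ball is just one possible choice of projection, not necessarily the one any particular subclass uses.

```scala
object ProjectionSketch {
  type Vec = Vector[Double]

  // Plain step: x + stepSize * dir, with no projection.
  def takeStep(x: Vec, dir: Vec, stepSize: Double): Vec =
    x.zip(dir).map { case (xi, di) => xi + stepSize * di }

  // Project onto the L2 ball of the given radius (assumed example constraint).
  // Points already inside the ball are returned unchanged.
  def projectL2(x: Vec, radius: Double): Vec = {
    val norm = math.sqrt(x.map(v => v * v).sum)
    if (norm <= radius) x else x.map(_ * (radius / norm))
  }

  // Step first, then pull the result back into the ball.
  def takeProjectedStep(x: Vec, dir: Vec, stepSize: Double, radius: Double): Vec =
    projectL2(takeStep(x, dir, stepSize), radius)
}
```

For a descent method, `dir` would typically be the negative gradient, so the step moves against the gradient before any projection is applied.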
Minimizes a function using stochastic gradient descent.
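To make the overall loop concrete, here is a minimal one-dimensional sketch. It is not the library's implementation: `minimize`, its parameters, and the toy objective are all assumptions chosen for illustration. It minimizes the average of 0.5 * (x - a_i)^2 over a data set, sampling one term per iteration and using the eta / (iter + 1)^(2/3) step size schedule.

```scala
import scala.util.Random

object SgdSketch {
  // Toy objective: f(x) = (1/n) * sum_i 0.5 * (x - a_i)^2.
  // Its minimizer is the mean of the a_i.
  def minimize(data: Vector[Double], init: Double, eta: Double,
               maxIter: Int, seed: Long): Double = {
    val rng = new Random(seed)
    var x = init
    for (iter <- 0 until maxIter) {
      val a = data(rng.nextInt(data.length))          // sample one term
      val grad = x - a                                // gradient of that term at x
      val step = eta / math.pow(iter + 1, 2.0 / 3.0)  // decaying step size
      x = x - step * grad                             // step against the gradient
    }
    x
  }
}
```

Because each iteration sees only one sampled term, the iterate fluctuates around the minimizer; the decaying step size is what damps that noise over time.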