Any history the derived minimization function needs to do its updates, typically an approximation to the second derivative (the Hessian matrix).
Tracks information about the optimizer, including the current point, its value and gradient, and any history needed for updates. Also includes information for checking convergence.
the current point being considered
f(x)
f.gradientAt(x)
f(x) + r(x), where r is any regularization added to the objective. For LBFGS, this is f(x).
f'(x) + r'(x), where r is any regularization added to the objective. For LBFGS, this is f'(x).
the current iteration number.
f(x_0) + r(x_0), used for checking convergence
any information needed by the optimizer to do updates.
the sequence of the last minImprovementWindow objective values, used to check whether the value has stopped improving
the number of consecutive iterations in which the objective hasn't improved, mostly used by SGD
did the line search fail?
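The state fields above can be sketched as a small record. This is a minimal illustration, not the library's actual definition; the field names and types are assumptions chosen to mirror the descriptions above.

```python
from dataclasses import dataclass, field

@dataclass
class OptimizerState:
    """Illustrative optimizer state mirroring the fields described above."""
    x: list[float]                      # the current point being considered
    value: float                        # f(x)
    grad: list[float]                   # f.gradientAt(x)
    adjusted_value: float               # f(x) + r(x), with regularization r
    adjusted_gradient: list[float]      # f'(x) + r'(x)
    iter: int = 0                       # current iteration number
    initial_adjusted_value: float = 0.0 # f(x_0) + r(x_0), for convergence checks
    history: object = None              # optimizer-specific update history
    value_history: list[float] = field(default_factory=list)  # recent values
    num_improvement_failures: int = 0   # consecutive non-improving iterations
    search_failed: bool = False         # did the line search fail?
```

Keeping all of this in one immutable-style record makes each iteration a pure function from one state to the next, which is how convergence checks can inspect the recent value history without extra bookkeeping.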
Choose a step size scale for this iteration.
Default is eta / math.pow(state.iter + 1, 2.0 / 3.0).
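The default decaying schedule can be written as a one-line function. The names `eta` and `iteration` are illustrative, matching the description above rather than any exact library signature.

```python
import math

def default_step_size(eta: float, iteration: int) -> float:
    """Decaying step-size scale: eta / (iteration + 1)^(2/3).

    `eta` is the base learning rate; `iteration` is zero-based,
    matching state.iter in the description above.
    """
    return eta / math.pow(iteration + 1, 2.0 / 3.0)

# At iteration 0 the scale is just eta; it decays as iterations grow,
# e.g. at iteration 7 the scale is eta / 8^(2/3) = eta / 4.
```

The (iter + 1)^(2/3) decay shrinks steps slowly enough that the iterates keep making progress, while still damping oscillation late in the run.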
How many iterations the function value has to improve by at least improvementTol.
Projects the vector x onto the feasible set (e.g. a norm ball) as needed. Can also be used to incorporate regularization or similar adjustments.
By default, simply takes a plain gradient step.
Minimizes a function using stochastic gradient descent.
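Putting the pieces together, a minimal gradient-descent loop with the decaying step-size schedule and the default "just take a step" update looks like the sketch below. All names are illustrative assumptions; for determinism this sketch uses the full gradient rather than a stochastic minibatch estimate.

```python
import math
from typing import Callable

def sgd_minimize(
    grad: Callable[[list[float]], list[float]],  # gradient of the objective
    x0: list[float],                             # starting point
    eta: float = 0.5,                            # base learning rate
    max_iter: int = 200,
) -> list[float]:
    """Minimal descent loop: each iteration chooses a decaying step-size
    scale and takes a plain gradient step (the default takeStep)."""
    x = list(x0)
    for it in range(max_iter):
        step = eta / math.pow(it + 1, 2.0 / 3.0)      # step-size schedule
        g = grad(x)
        x = [xi - step * gi for xi, gi in zip(x, g)]  # default step
    return x

# Usage: minimize f(x) = (x[0] - 3)^2 + (x[1] + 1)^2, minimum at (3, -1).
grad = lambda x: [2 * (x[0] - 3), 2 * (x[1] + 1)]
x_min = sgd_minimize(grad, [0.0, 0.0])
```

In a true stochastic variant, `grad` would return a noisy estimate from a sampled minibatch, which is why the convergence checks above track a window of recent values rather than a single iterate.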