Any history the derived minimization function needs to do its updates; typically an approximation to the second derivative/Hessian matrix.
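As a hedged illustration of what such a history can look like, the sketch below shows a hypothetical LBFGS-style history that keeps recent position and gradient differences, which implicitly approximate the inverse Hessian; the type and field names are assumptions, not the actual definition.

```scala
// Hypothetical LBFGS-style History: the last `memory` position deltas
// (s_k = x_{k+1} - x_k) and gradient deltas (y_k = g_{k+1} - g_k), which
// together approximate the inverse Hessian without ever materializing it.
case class LBFGSHistory(
    memory: Int,
    sDeltas: IndexedSeq[Vector[Double]] = IndexedSeq.empty,
    yDeltas: IndexedSeq[Vector[Double]] = IndexedSeq.empty) {

  // Record a new (s, y) pair, evicting the oldest once over capacity.
  def updated(s: Vector[Double], y: Vector[Double]): LBFGSHistory =
    copy(
      sDeltas = (sDeltas :+ s).takeRight(memory),
      yDeltas = (yDeltas :+ y).takeRight(memory))
}
```

Keeping only the last few pairs is what makes the method "limited memory": the two-loop recursion can reconstruct an approximate inverse-Hessian product from these deltas alone.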
Tracks information about the optimizer, including the current point, its value, its gradient, and any history. Also includes information for checking convergence. A sketch of such a state record follows the field list below.
The current point being considered.
f(x)
f.gradientAt(x)
f(x) + r(x), where r is any regularization added to the objective. For LBFGS, this is f(x).
f'(x) + r'(x), where r is any regularization added to the objective. For LBFGS, this is f'(x).
The current iteration number.
f(x_0) + r(x_0), used for checking convergence.
Any information needed by the optimizer to do its updates.
The sequence of the last minImprovementWindow values, used to check whether the value has stopped improving.
The number of consecutive iterations in which the objective has not improved; used mostly for SGD.
Whether the line search failed.
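Putting the field descriptions above together, a minimal sketch of such a state record might look like the following; the field names mirror the descriptions, but the exact signature is an assumption.

```scala
// Sketch of the optimizer state described above, for a generic point type T
// and optimizer-specific History; real field names and types may differ.
case class State[T, History](
  x: T,                         // the current point being considered
  value: Double,                // f(x)
  grad: T,                      // f.gradientAt(x)
  adjustedValue: Double,        // f(x) + r(x); equal to f(x) for LBFGS
  adjustedGradient: T,          // f'(x) + r'(x); equal to f'(x) for LBFGS
  iter: Int,                    // the current iteration number
  initialAdjVal: Double,        // f(x_0) + r(x_0), for convergence checks
  history: History,             // optimizer-specific update information
  fVals: IndexedSeq[Double],    // the last minImprovementWindow values
  numImprovementFailures: Int,  // consecutive non-improving iterations
  searchFailed: Boolean = false // whether the line search failed
)
```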
Given a direction, perform a Strong Wolfe line search; the conditions are spelled out after the parameter list below.
TODO: Compare performance with the cubic-interpolation-based line search from Mark Schmidt's PQN paper.
The current state.
The objective.
The step direction.
The resulting step size.
The number of iterations over which the function must improve by at least improvementTol.
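For reference, a step size a along direction p from point x satisfies the strong Wolfe conditions, with constants 0 < c_1 < c_2 < 1 (c_1 = 1e-4 and c_2 = 0.9 are common choices for quasi-Newton methods), when:

```latex
% Sufficient decrease (Armijo) condition:
f(x + a p) \le f(x) + c_1 \, a \, \nabla f(x)^{\top} p
% Strong curvature condition:
|\nabla f(x + a p)^{\top} p| \le c_2 \, |\nabla f(x)^{\top} p|
```

The strong form bounds the magnitude of the directional derivative at the new point, ruling out steps that overshoot the minimizer along p; a cubic-interpolation-based search like the one mentioned in the TODO differs only in how candidate step sizes are generated, not in these acceptance conditions.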