Created by jda on 3/17/15.
Approximates a gradient by finite differences.
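As a rough illustration of the idea, the sketch below estimates a gradient with forward differences using only the Scala standard library. The object name, the method name, and the default step size `epsilon` are assumptions made for this example; it is not the library's implementation.

```scala
object FiniteDifferenceSketch {
  /** Forward-difference estimate of the gradient of f at x.
    * epsilon is a hypothetical default step size, not a library setting.
    */
  def approximateGradient(f: Array[Double] => Double,
                          x: Array[Double],
                          epsilon: Double = 1e-5): Array[Double] = {
    val fx = f(x)
    Array.tabulate(x.length) { i =>
      // Perturb only coordinate i and measure the change in f.
      val shifted = x.clone()
      shifted(i) += epsilon
      (f(shifted) - fx) / epsilon
    }
  }
}
```

This costs one extra function evaluation per coordinate, so it is mainly useful for small problems or for testing.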
A line search optimizes a function of one variable without analytic gradient information. It is often used approximately (e.g. in backtracking line search), where there is no intrinsic termination criterion, only an extrinsic one.
Implements a backtracking line search like the one in LBFGS-C (which is (c) 2007-2010 Naoaki Okazaki, under the BSD license).
The basic idea is to find a step size alpha such that f(alpha) is sufficiently smaller than f(0), optionally also requiring that the slope of f decrease by the right amount (the Wolfe conditions).
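A minimal sketch of the sufficient-decrease part of this idea is shown below, assuming the one-dimensional function f(alpha) and its slope at zero are supplied by the caller. The names and the constants `c1`, `shrink`, and `maxIter` are illustrative assumptions, and the slope (curvature) condition is not checked here.

```scala
object BacktrackingSketch {
  /** Shrinks alpha until f(alpha) <= f(0) + c1 * alpha * f'(0) (the Armijo condition).
    * slope0 is f'(0) and must be negative. maxIter is the extrinsic stopping rule:
    * if it is reached, the returned alpha may not satisfy the condition.
    */
  def backtrack(f: Double => Double,
                slope0: Double,
                initialAlpha: Double = 1.0,
                c1: Double = 1e-4,
                shrink: Double = 0.5,
                maxIter: Int = 50): Double = {
    require(slope0 < 0, "search direction must be a descent direction")
    val f0 = f(0.0)
    var alpha = initialAlpha
    var iter = 0
    while (f(alpha) > f0 + c1 * alpha * slope0 && iter < maxIter) {
      alpha *= shrink
      iter += 1
    }
    alpha
  }
}
```

For example, with f(alpha) = (alpha - 0.3)^2 and slope -0.6 at zero, the first trial step 1.0 is rejected and the halved step 0.5 is accepted.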
A diff function that supports subsets of the data. By default it evaluates on all the data.
Represents a differentiable function whose output is guaranteed to be consistent across invocations.
The empirical Hessian evaluates the derivative for multiplication.
H * d = \lim_{e \to 0} (f'(x + e * d) - f'(x)) / e
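The limit above can be approximated with a small but finite e. The sketch below does this with plain arrays, assuming the caller supplies a gradient function; the names and the default value of `e` are assumptions for illustration only.

```scala
object EmpiricalHessianSketch {
  /** Approximates H * d as (f'(x + e * d) - f'(x)) / e for a small fixed e.
    * grad is assumed to return the gradient f'(x) as an array.
    */
  def hessianTimes(grad: Array[Double] => Array[Double],
                   x: Array[Double],
                   d: Array[Double],
                   e: Double = 1e-5): Array[Double] = {
    val g0 = grad(x)
    // Gradient at the point shifted by e along direction d.
    val shifted = Array.tabulate(x.length)(i => x(i) + e * d(i))
    val g1 = grad(shifted)
    Array.tabulate(x.length)(i => (g1(i) - g0(i)) / e)
  }
}
```

The Hessian is never formed explicitly; only two gradient evaluations are needed per product, which is what makes this usable inside conjugate gradient style solvers.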
The Fisher matrix approximates the Hessian by E[grad grad']. We further approximate this with a Monte Carlo approximation to the expectation.
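One way to picture the Monte Carlo approximation is as an average over sampled gradients: F * d is roughly (1/n) * sum_i g_i * (g_i . d). The sketch below assumes the sampled gradients are already available as arrays; how the library draws its samples is not shown, and all names are hypothetical.

```scala
object FisherSketch {
  /** Approximates F * d, where F = E[g g'], by averaging g * (g . d)
    * over a non-empty collection of sampled gradients.
    */
  def fisherTimes(sampleGrads: Seq[Array[Double]], d: Array[Double]): Array[Double] = {
    require(sampleGrads.nonEmpty, "need at least one sampled gradient")
    val result = new Array[Double](d.length)
    for (g <- sampleGrads) {
      // Inner product g . d, then accumulate g scaled by it.
      val dot = g.indices.map(i => g(i) * d(i)).sum
      for (i <- result.indices) result(i) += dot * g(i)
    }
    result.map(_ / sampleGrads.size)
  }
}
```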
Port of LBFGS to Scala.
Special note for LBFGS: if you use it in published work, you must cite one of:
* J. Nocedal. Updating Quasi-Newton Matrices with Limited Storage (1980), Mathematics of Computation 35, pp. 773-782.
* D.C. Liu and J. Nocedal. On the Limited Memory BFGS Method for Large Scale Optimization (1989), Mathematical Programming B, 45, 3, pp. 503-528.
This algorithm follows the paper "A Limited Memory Algorithm for Bound Constrained Optimization" by Richard H. Byrd, Peihuang Lu, Jorge Nocedal, and Ciyou Zhu. Created by fanming.chen on 2015/3/7. If the maxZoomIter or maxLineSearchIter arguments to StrongWolfeLineSearch are too small, wolfeRuleSearch.minimize may throw a FirstOrderException; in that case, increase the two values.
A line search optimizes a function of one variable without analytic gradient information. This variant differs only in whether or not it tries to find an exact minimizer.
Anything that can minimize a function
Implements the Orthant-wise Limited Memory Quasi-Newton method, which is a variant of LBFGS that handles L1 regularization.
The paper is Andrew and Gao (2007), "Scalable Training of L1-Regularized Log-Linear Models".
Represents a function for which we can easily compute the Hessian.
For conjugate gradient methods, you can play tricks with the Hessian, returning an object that only supports multiplication.
SPG is a Spectral Projected Gradient minimizer; it minimizes a differentiable function subject to the optimum lying in some set, which is defined by the projection operator (the projection parameter).
vector type
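To make the role of the projection operator concrete, here is a deliberately simplified projected-gradient loop with a spectral (Barzilai-Borwein) step length. It is only a sketch under stated assumptions: the names are hypothetical, there is no line search, and it is not the SPG implementation described above.

```scala
object ProjectedGradientSketch {
  private def dot(a: Array[Double], b: Array[Double]): Double =
    a.indices.map(i => a(i) * b(i)).sum

  /** Repeats x <- project(x - alpha * grad(x)), choosing alpha by the
    * Barzilai-Borwein rule alpha = (s . s) / (s . y). No line search is done,
    * so convergence is not guaranteed for general functions.
    */
  def minimize(grad: Array[Double] => Array[Double],
               project: Array[Double] => Array[Double],
               init: Array[Double],
               maxIter: Int = 100): Array[Double] = {
    var x = project(init)
    var g = grad(x)
    var alpha = 1.0
    for (_ <- 0 until maxIter) {
      val xNew = project(Array.tabulate(x.length)(i => x(i) - alpha * g(i)))
      val gNew = grad(xNew)
      val s = Array.tabulate(x.length)(i => xNew(i) - x(i)) // change in x
      val y = Array.tabulate(x.length)(i => gNew(i) - g(i)) // change in gradient
      val sy = dot(s, y)
      // Fall back to a unit step when the curvature estimate is unusable.
      alpha = if (sy > 1e-12) dot(s, s) / sy else 1.0
      x = xNew
      g = gNew
    }
    x
  }
}
```

With a box projection onto [0, 1]^2 and the objective (x - 2)^2 + (y + 1)^2, the iterates settle at the corner (1, 0), the closest feasible point to the unconstrained minimum.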
A differentiable function whose output is not guaranteed to be the same across consecutive invocations.
Minimizes a function using stochastic gradient descent
Implements a truncated Newton trust-region method (like TRON). Also implements "Hessian-free learning". We have a few extra tricks though... :)
Implements the L2^2 and L1 updates from Duchi et al. (2010), "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization".
Basically, we use "forward regularization" and an adaptive step size based on the previous gradients.
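The adaptive step size can be sketched as follows: each coordinate keeps a running sum of its squared gradients and is scaled by the inverse square root of that sum. This example shows only the unregularized core of the update; the L2^2 and L1 regularized variants are not shown, and the names and the constants `eta` and `delta` are assumptions.

```scala
object AdaGradSketch {
  /** One adaptive step. sumSq holds the running sum of squared gradients per
    * coordinate and is updated in place; the new point is returned.
    * delta only guards against division by zero.
    */
  def step(x: Array[Double],
           grad: Array[Double],
           sumSq: Array[Double],
           eta: Double = 0.5,
           delta: Double = 1e-8): Array[Double] =
    Array.tabulate(x.length) { i =>
      sumSq(i) += grad(i) * grad(i)
      x(i) - eta * grad(i) / (math.sqrt(sumSq(i)) + delta)
    }
}
```

Coordinates that have seen large gradients take smaller steps over time, while rarely updated coordinates keep a comparatively large step.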
Class that compares the computed gradient with an empirical gradient based on finite differences. Essential for debugging dynamic programs.
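A small sketch of such a check is given below, using central differences and reporting the largest per-coordinate discrepancy. The names, the step size, and the way a mismatch is reported are assumptions for illustration; the class described above may report results differently.

```scala
object GradientCheckSketch {
  /** Returns the largest absolute difference between the supplied gradient
    * and a central-difference estimate at x (x must be non-empty).
    */
  def maxGradientError(f: Array[Double] => Double,
                       grad: Array[Double] => Array[Double],
                       x: Array[Double],
                       epsilon: Double = 1e-5): Double = {
    val analytic = grad(x)
    x.indices.map { i =>
      val plus = x.clone()
      plus(i) += epsilon
      val minus = x.clone()
      minus(i) -= epsilon
      val empirical = (f(plus) - f(minus)) / (2 * epsilon)
      math.abs(empirical - analytic(i))
    }.max
  }
}
```

A value near zero suggests the hand-written gradient agrees with the function; a large value on some coordinate usually points at a bug in that part of the gradient code.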
Root finding algorithms
Returns a sequence of states representing the iterates of a solver, given a breeze.optimize.IterableOptimizationPackage that knows how to minimize it. The actual state class varies with the kind of function passed in. Typically, the states have a .x value of type Vector that is the current point being evaluated, and a .value that is the current objective value.
Minimizes a function, given a breeze.optimize.OptimizationPackage that knows how to minimize it.