Generic loss function which lets you build a loss function out of a PartialLossFunction and a PredictionFunction.
Base class which performs Stochastic Gradient Descent optimization using mini batches.
An abstract class for iterative optimization algorithms.
See Iterative Methods on Wikipedia for more information.
Abstract class that implements some of the functionality for common loss functions.
A loss function determines the loss term L(w) of the objective function f(w) = L(w) + lambda*R(w) for prediction tasks, the other term being the regularization term R(w).
The regularization is specific to the optimization algorithm used and is therefore implemented there.
We currently only support differentiable loss functions; in the future this class could be changed to DiffLossFunction in order to support other types, such as the absolute loss.
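To make the composition concrete, here is a minimal sketch in plain Scala of how a partial loss and a prediction function can be combined into a full loss via the chain rule; the trait and method names are illustrative, not the actual API:

```scala
object LossSketch {
  // Partial loss: maps (prediction, label) to a loss value and to the
  // derivative of that loss with respect to the prediction.
  trait PartialLoss {
    def loss(prediction: Double, label: Double): Double
    def derivative(prediction: Double, label: Double): Double
  }

  // Prediction function: maps weights and features to a prediction and
  // exposes the gradient of the prediction with respect to the weights.
  trait Prediction {
    def predict(weights: Array[Double], features: Array[Double]): Double
    def gradient(weights: Array[Double], features: Array[Double]): Array[Double]
  }

  // Chain rule: d/dw loss(predict(w, x), y) = loss'(p, y) * dp/dw.
  def lossAndGradient(pl: PartialLoss, pf: Prediction)
                     (w: Array[Double], x: Array[Double], y: Double): (Double, Array[Double]) = {
    val p = pf.predict(w, x)
    val d = pl.derivative(p, y)
    (pl.loss(p, y), pf.gradient(w, x).map(_ * d))
  }
}
```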
Represents loss functions which can be used with the GenericLossFunction.
An abstract class for prediction functions to be used in optimization.
Represents a type of regularization penalty.
Regularization penalties are used to restrict the optimization problem to solutions with certain desirable characteristics, such as sparsity for the L1 penalty, or penalizing large weights for the L2 penalty.
The regularization term, R(w), is added to the objective function, f(w) = L(w) + lambda*R(w), where lambda is the regularization parameter used to tune the amount of regularization applied.
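As a small illustration, a sketch of how a penalty term enters the objective; the names here are hypothetical, not the library API:

```scala
// A regularization penalty contributes its value, scaled by lambda,
// to the objective f(w) = L(w) + lambda * R(w).
trait Penalty {
  def value(weights: Array[Double]): Double
}

def objective(loss: Double, penalty: Penalty, lambda: Double,
              weights: Array[Double]): Double =
  loss + lambda * penalty.value(weights)
```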
Base class for optimization algorithms.
Implementation of a Gradient Descent solver.
Hinge loss function which can be used with the GenericLossFunction.
The HingeLoss function implements max(0, 1 - prediction*label) for binary classification with label in {-1, 1}.
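A minimal sketch of this loss and its subgradient with respect to the prediction (not the library's implementation):

```scala
// Hinge loss for labels in {-1, 1}.
def hingeLoss(prediction: Double, label: Double): Double =
  math.max(0.0, 1.0 - prediction * label)

// Subgradient w.r.t. the prediction: -label where the margin is
// violated (prediction*label < 1), 0 otherwise.
def hingeDerivative(prediction: Double, label: Double): Double =
  if (prediction * label < 1.0) -label else 0.0
```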
L_1 regularization penalty.
The regularization function is the L_1 norm ||w||_1, with w being the weight vector. The L_1 penalty can be used to drive a number of the solution coefficients to 0, thereby producing sparse solutions.
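Since the L_1 norm is not differentiable at 0, solvers typically handle it with a proximal (soft-thresholding) step rather than a plain gradient; a sketch with hypothetical helper names:

```scala
// Value of the L_1 penalty: the sum of absolute weights.
def l1Value(w: Array[Double]): Double = w.map(math.abs).sum

// Soft-thresholding: shrinks each weight toward 0 by stepSize * lambda,
// setting small weights exactly to 0, which is what produces sparsity.
def l1Prox(w: Array[Double], lambda: Double, stepSize: Double): Array[Double] = {
  val shrink = lambda * stepSize
  w.map(wi => math.signum(wi) * math.max(0.0, math.abs(wi) - shrink))
}
```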
L_2 regularization penalty.
The regularization function is the squared L_2 norm, 1/2*||w||_2^2, with w being the weight vector. The function penalizes large weights, favoring solutions with many small weights rather than a few large ones.
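A sketch of the penalty value and its gradient (the gradient of 1/2*||w||_2^2 is simply w); the names are illustrative:

```scala
// Value of the L_2 penalty: half the squared Euclidean norm.
def l2Value(w: Array[Double]): Double = 0.5 * w.map(wi => wi * wi).sum

// The gradient of 1/2 * ||w||_2^2 is w itself, so the regularized update
// w := w - step * (grad + lambda * w) shrinks the weights multiplicatively.
def l2Gradient(w: Array[Double]): Array[Double] = w
```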
A linear prediction function.
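A minimal sketch, assuming the prediction is the dot product of the weight and feature vectors:

```scala
// Linear prediction: the dot product of weights and features. Its gradient
// with respect to the weights is the feature vector itself.
def linearPredict(w: Array[Double], x: Array[Double]): Double =
  w.zip(x).map { case (wi, xi) => wi * xi }.sum
```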
Logistic loss function which can be used with the GenericLossFunction.
The LogisticLoss function implements log(1 + exp(-prediction*label)) for binary classification with label in {-1, 1}.
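A sketch of this loss in a numerically stable form, together with its derivative with respect to the prediction (illustrative, not the library code):

```scala
// Logistic loss for labels in {-1, 1}, using the stable identity
// log(1 + exp(-z)) = max(0, -z) + log(1 + exp(-|z|)) to avoid overflow.
def logisticLoss(prediction: Double, label: Double): Double = {
  val z = prediction * label
  math.max(0.0, -z) + math.log1p(math.exp(-math.abs(z)))
}

// Derivative w.r.t. the prediction: -label / (1 + exp(prediction * label)).
def logisticDerivative(prediction: Double, label: Double): Double =
  -label / (1.0 + math.exp(prediction * label))
```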
No regularization penalty.
Squared loss function which can be used with the GenericLossFunction.
The SquaredLoss function implements 1/2*(prediction - label)^2.
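A minimal sketch of the loss and its derivative; the 1/2 factor makes the derivative just the residual:

```scala
// Squared loss: 1/2 * (prediction - label)^2.
def squaredLoss(prediction: Double, label: Double): Double = {
  val r = prediction - label
  0.5 * r * r
}

// Derivative w.r.t. the prediction: the residual (prediction - label).
def squaredLossDerivative(prediction: Double, label: Double): Double =
  prediction - label
```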
Base class which performs Stochastic Gradient Descent optimization using mini batches.
For each labeled vector in a mini batch, the gradient is computed and added to a partial gradient. The partial gradients are then summed and divided by the batch size. The averaged gradient is then used to update the weight values, including regularization.
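A sketch of one such mini-batch step under these assumptions; LabeledVector and the helper names here are stand-ins, not the library types:

```scala
final case class LabeledVector(label: Double, features: Array[Double])

// One mini-batch step: per-example gradients are summed, averaged over the
// batch size, and used for the weight update, including an L_2-style penalty.
def miniBatchStep(
    weights: Array[Double],
    batch: Seq[LabeledVector],
    learningRate: Double,
    lambda: Double,
    exampleGradient: (Array[Double], LabeledVector) => Array[Double]
): Array[Double] = {
  // Sum the per-example gradients over the mini batch.
  val summed = batch
    .map(lv => exampleGradient(weights, lv))
    .reduce((a, b) => a.zip(b).map { case (x, y) => x + y })
  // Average the gradient and apply the regularized update.
  weights.zip(summed).map { case (w, g) =>
    w - learningRate * (g / batch.size + lambda * w)
  }
}
```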
At the moment, the whole partition is used for SGD, making it effectively batch gradient descent. Once a sampling operator has been introduced, the algorithm can be optimized.
The parameters to tune the algorithm are:
- Solver.LossFunction: the loss function to be used.
- Solver.RegularizationPenaltyValue: the regularization penalty to be applied.
- Solver.RegularizationConstant: the regularization parameter.
- IterativeSolver.Iterations: the maximum number of iterations.
- IterativeSolver.LearningRate: the learning rate to be used.
- IterativeSolver.ConvergenceThreshold: when provided, the algorithm stops iterating if the relative change in the value of the objective function between successive iterations is smaller than this value (see the sketch after this list).
- IterativeSolver.LearningRateMethodValue: determines the functional form of the effective learning rate.
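As an illustration of the stopping rule, a sketch of the iteration loop with the relative-change convergence test; all names are hypothetical:

```scala
// Iterate until the maximum number of iterations is reached or the relative
// change of the objective between successive iterations drops below the
// convergence threshold.
def optimize(
    initialWeights: Array[Double],
    step: Array[Double] => (Array[Double], Double), // returns (weights, objective)
    maxIterations: Int,
    convergenceThreshold: Double
): Array[Double] = {
  var weights = initialWeights
  var previousObjective = Double.MaxValue
  var converged = false
  var i = 0
  while (i < maxIterations && !converged) {
    val (nextWeights, objective) = step(weights)
    // Relative change in the objective function value.
    val relativeChange =
      math.abs(previousObjective - objective) / math.abs(previousObjective)
    converged = relativeChange < convergenceThreshold
    previousObjective = objective
    weights = nextWeights
    i += 1
  }
  weights
}
```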