Implements the L1 regularization update.
Each step is:
x_{t+1,i} = \mathrm{sign}\!\left(x_{t,i} - \frac{\eta}{s_{t,i}} g_{t,i}\right) \left(\left|x_{t,i} - \frac{\eta}{s_{t,i}} g_{t,i}\right| - \frac{\lambda \eta}{s_{t,i}}\right)_+
where g_{t,i} is the gradient, \lambda is the L1 regularization strength, and s_{t,i} = \sqrt{\sum_{t' \le t} g_{t',i}^2}.
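A minimal NumPy sketch of one such L1 step (the function name and the use of a small delta inside s to avoid division by zero are illustrative additions, not part of the source):

```python
import numpy as np

def adagrad_l1_step(x, g, sum_sq_grads, eta=0.1, lam=0.01, delta=1e-8):
    """One AdaGrad step with L1 ("forward") regularization.

    Soft-thresholded update per coordinate:
        z = x - (eta / s) * g
        x <- sign(z) * max(|z| - lam * eta / s, 0)
    with s = sqrt(sum of squared past gradients).
    """
    sum_sq_grads = sum_sq_grads + g ** 2      # accumulate squared gradients
    s = np.sqrt(sum_sq_grads) + delta         # per-coordinate adaptive scale
    z = x - (eta / s) * g                     # unregularized adaptive step
    x_new = np.sign(z) * np.maximum(np.abs(z) - lam * eta / s, 0.0)
    return x_new, sum_sq_grads
```

The soft-thresholding term drives small coordinates exactly to zero, which is what makes the L1 variant produce sparse iterates.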
Implements the L2 regularization update.
Each step is:
x_{t+1,i} = \frac{s_{t,i} x_{t,i} - \eta g_{t,i}}{\eta \lambda + \delta + s_{t,i}}
where g_{t,i} is the gradient, \lambda is the L2 regularization strength, \delta is a small constant for numerical stability, and s_{t,i} = \sqrt{\sum_{t' \le t} g_{t',i}^2}.
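A minimal NumPy sketch of one such L2 step (the function name is an illustrative addition; delta appears in the denominator as in the formula above):

```python
import numpy as np

def adagrad_l2_step(x, g, sum_sq_grads, eta=0.1, lam=0.01, delta=1e-8):
    """One AdaGrad step with L2 regularization:
        x <- (s * x - eta * g) / (eta * lam + delta + s)
    with per-coordinate s = sqrt(sum of squared past gradients).
    """
    sum_sq_grads = sum_sq_grads + g ** 2      # accumulate squared gradients
    s = np.sqrt(sum_sq_grads)                 # per-coordinate adaptive scale
    x_new = (s * x - eta * g) / (eta * lam + delta + s)
    return x_new, sum_sq_grads
```

Dividing by eta * lam + s shrinks every coordinate toward zero multiplicatively, unlike the L1 variant, which thresholds coordinates to exactly zero.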
Implements the L2^2 and L1 updates from Duchi et al. (2010), "Adaptive Subgradient Methods for Online Learning and Stochastic Optimization".
In short, we use composite-objective ("forward regularization") updates with an adaptive per-coordinate step size derived from the accumulated past gradients.