Multiple linear regression using the ordinary least squares (OLS) estimator.
The linear regression finds a solution to the problem
y = w_0 + w_1*x_1 + w_2*x_2 + ... + w_n*x_n = w_0 + w^T*x
such that the sum of squared residuals is minimized:
min_{w, w_0} \sum (y - w^T*x - w_0)^2
The minimization problem is solved by (stochastic) gradient descent. For each labeled vector
(x, y), the gradient is calculated. The average of all gradients, scaled by a weight, is
subtracted from the current value w, which gives the new value w_new. The weight is defined as
stepsize/math.sqrt(iteration).
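A minimal sketch of this update rule in plain Scala (not Flink's internal implementation; all
names here are illustrative):
import scala.math.sqrt
// One full-batch gradient step for the model y = w_0 + w^T x.
// `w` stores the intercept at index 0 and each feature vector is implicitly
// prefixed with 1.0; `iteration` starts at 1, so the weight is well defined.
def step(
    w: Array[Double],
    data: Seq[(Array[Double], Double)],
    stepsize: Double,
    iteration: Int): Array[Double] = {
  val grad = new Array[Double](w.length)
  for ((x, y) <- data) {
    val xs = 1.0 +: x                                 // prepend the bias term
    val residual = xs.zip(w).map { case (a, b) => a * b }.sum - y
    for (i <- grad.indices) grad(i) += 2.0 * residual * xs(i)
  }
  val weight = stepsize / sqrt(iteration.toDouble)    // decaying step size
  w.zip(grad).map { case (wi, gi) => wi - weight * gi / data.size }
}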
The optimization runs for at most the configured maximum number of iterations or, if a
convergence threshold has been set, until the convergence criterion has been met. The relative
change of the sum of squared residuals serves as the convergence criterion:
(S_{k-1} - S_k)/S_{k-1} < \rho
with S_k being the sum of squared residuals in iteration k and \rho being the convergence
threshold.
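As an illustration (plain Scala, not Flink code; the function name is made up), the criterion
amounts to the following check:
// Relative decrease of the sum of squared residuals between iterations k-1 and k.
def hasConverged(sPrev: Double, sCur: Double, rho: Double): Boolean =
  (sPrev - sCur) / sPrev < rho
For example, with S_{k-1} = 100.0, S_k = 99.95 and \rho = 0.001, the relative change is
0.0005, which is below the threshold, so the optimization would stop.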
At the moment, the whole partition is used for SGD, making it effectively a batch gradient
descent. Once a sampling operator has been introduced, the algorithm can be optimized.
Example:
// Create the multiple linear regression learner
val mlr = MultipleLinearRegression()
  .setIterations(10)
  .setStepsize(0.5)
  .setConvergenceThreshold(0.001)
// Obtain the training and testing data sets
val trainingDS: DataSet[LabeledVector] = ...
val testingDS: DataSet[Vector] = ...
// Fit the linear model to the training data
mlr.fit(trainingDS)
// Calculate predictions for the test data
val predictions = mlr.predict(testingDS)
Parameters
org.apache.flink.ml.regression.MultipleLinearRegression.Stepsize:
Initial step size for the gradient descent method.
This value controls how far the gradient descent method moves in the opposite direction of the
gradient. Tuning this parameter may be crucial to make the optimization stable and to obtain
good performance.
LearningRateMethodTrait:
The method used to calculate the effective learning rate for each iteration step. See
LearningRateMethod for all supported methods.
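To illustrate the idea (a sketch in plain Scala, not the actual LearningRateMethod
implementations; the function names are made up), each method maps the initial step size and
the iteration count to an effective learning rate:
import scala.math.sqrt
// A decaying schedule matching the stepsize/math.sqrt(iteration) rule above.
def decayingRate(stepsize: Double, iteration: Int): Double =
  stepsize / sqrt(iteration.toDouble)
// A constant schedule that ignores the iteration count.
def constantRate(stepsize: Double, iteration: Int): Double =
  stepsize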