org.apache.flink.ml.regression
Evaluates the testing data by computing the prediction value and returning a pair of true label value and prediction value. It is important that the implementation chooses a Testing type from which it can extract the true label value.
Fits the estimator to the given input data. The fitting logic is contained in the FitOperation. The computed state will be stored in the implementing class.
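As a minimal sketch of this delegation pattern (not Flink's actual source: `MeanEstimator`, the simplified `FitOperation` signature, and `Seq`-based data are invented for illustration — Flink operates on `DataSet` instead):

```scala
// Simplified FitOperation type class: the fitting logic lives here,
// not in the estimator itself.
trait FitOperation[Self, Training] {
  def fit(instance: Self, fitParameters: Map[String, Any], training: Seq[Training]): Unit
}

class MeanEstimator {
  // The computed state is stored in the implementing class.
  var mean: Option[Double] = None

  // fit delegates to whichever FitOperation is in implicit scope
  // for the training data type.
  def fit[T](training: Seq[T], fitParameters: Map[String, Any] = Map.empty)(
      implicit fitOp: FitOperation[MeanEstimator, T]): Unit =
    fitOp.fit(this, fitParameters, training)
}

object MeanEstimator {
  // FitOperation encapsulating the algorithm logic for Double training data.
  implicit val doubleFit: FitOperation[MeanEstimator, Double] =
    new FitOperation[MeanEstimator, Double] {
      def fit(instance: MeanEstimator, fitParameters: Map[String, Any],
              training: Seq[Double]): Unit =
        instance.mean = Some(training.sum / training.size)
    }
}

val estimator = new MeanEstimator
estimator.fit(Seq(1.0, 2.0, 3.0))
// estimator.mean now holds Some(2.0)
```

Because the operation is resolved implicitly per training type, the same `fit` entry point can support new data types by adding new `FitOperation` instances rather than changing the estimator.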
Type of the training data
Training data
Additional parameters for the FitOperation
FitOperation which encapsulates the algorithm logic
Predicts testing data according to the learned model. The implementing class has to provide a corresponding implementation of PredictDataSetOperation which contains the prediction logic.
Type of the testing data
Type of the prediction data
Testing data which shall be predicted
Additional parameters for the prediction
PredictDataSetOperation which encapsulates the prediction logic
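A minimal sketch of the prediction side of the same pattern (again invented for illustration: `ThresholdModel`, the simplified signatures, and `Seq` in place of Flink's `DataSet`):

```scala
// Simplified PredictDataSetOperation type class: maps testing data of type
// Testing to predictions of type Prediction for a given model instance.
trait PredictDataSetOperation[Self, Testing, Prediction] {
  def predictDataSet(instance: Self, predictParameters: Map[String, Any],
                     testing: Seq[Testing]): Seq[Prediction]
}

class ThresholdModel(val threshold: Double) {
  // predict delegates to the implicit operation matching the testing data type.
  def predict[T, P](testing: Seq[T], predictParameters: Map[String, Any] = Map.empty)(
      implicit op: PredictDataSetOperation[ThresholdModel, T, P]): Seq[P] =
    op.predictDataSet(this, predictParameters, testing)
}

object ThresholdModel {
  // Prediction logic for Double inputs: label 1 if above the threshold, else 0.
  implicit val doublePredict: PredictDataSetOperation[ThresholdModel, Double, Int] =
    new PredictDataSetOperation[ThresholdModel, Double, Int] {
      def predictDataSet(instance: ThresholdModel, predictParameters: Map[String, Any],
                         testing: Seq[Double]): Seq[Int] =
        testing.map(x => if (x > instance.threshold) 1 else 0)
    }
}

val model = new ThresholdModel(0.5)
val predictions = model.predict(Seq(0.1, 0.9))
// predictions == Seq(0, 1)
```

Note that the prediction type `P` is inferred from the implicit instance, so callers only name the testing data.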
Multiple linear regression using the ordinary least squares (OLS) estimator.
The linear regression finds a solution to the problem
y = w_0 + w_1*x_1 + w_2*x_2 + ... + w_n*x_n = w_0 + w^T*x
such that the sum of squared residuals is minimized
min_{w, w_0} \sum (y - w^T*x - w_0)^2
The minimization problem is solved by (stochastic) gradient descent. For each labeled vector (x, y), the gradient is calculated. The weighted average of all gradients is subtracted from the current value w, which gives the new value w_new. The weight is defined as stepsize/math.sqrt(iteration).
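Written out (a standard derivation for the squared residual, not quoted from the implementation), the per-sample gradient and the resulting update over m samples are:

```latex
\nabla_{w}\,(y - w^{T}x - w_0)^2 = -2\,(y - w^{T}x - w_0)\,x,
\qquad
w_{\text{new}} = w \;-\; \frac{\text{stepsize}}{\sqrt{\text{iteration}}}
  \cdot \frac{1}{m}\sum_{i=1}^{m} \nabla_{w}\,(y_i - w^{T}x_i - w_0)^2
```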
The optimization runs for at most the maximum number of iterations or, if a convergence threshold has been set, until the convergence criterion has been met. As convergence criterion, the relative change of the sum of squared residuals is used:
(S_{k-1} - S_k)/S_{k-1} < \rho
with S_k being the sum of squared residuals in iteration k and \rho being the convergence threshold.

At the moment, the whole partition is used for SGD, making it effectively a batch gradient descent. Once a sampling operator has been introduced, the algorithm can be optimized.
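The whole procedure can be sketched in self-contained Scala (plain collections instead of Flink DataSets; the name `fitOLS` and the helper are invented for illustration): batch gradient descent with step size stepsize/sqrt(iteration) and the relative-change-of-SSR stopping rule.

```scala
// Sum of squared residuals for weights (w, w0) over (features, label) pairs.
def sumSquaredResiduals(data: Seq[(Array[Double], Double)],
                        w: Array[Double], w0: Double): Double =
  data.map { case (x, y) =>
    val r = y - x.zip(w).map { case (xi, wi) => xi * wi }.sum - w0
    r * r
  }.sum

def fitOLS(data: Seq[(Array[Double], Double)],
           stepsize: Double,
           maxIterations: Int,
           convergenceThreshold: Double): (Array[Double], Double) = {
  val dims = data.head._1.length
  val w = Array.fill(dims)(0.0)
  var w0 = 0.0
  var prevS = sumSquaredResiduals(data, w, w0)

  var iteration = 1
  var converged = false
  while (iteration <= maxIterations && !converged) {
    val weight = stepsize / math.sqrt(iteration)
    // Average the per-sample gradients over the whole "partition"
    // (batch gradient descent, as noted above).
    val gradW = Array.fill(dims)(0.0)
    var gradW0 = 0.0
    data.foreach { case (x, y) =>
      val r = y - x.zip(w).map { case (xi, wi) => xi * wi }.sum - w0
      for (i <- 0 until dims) gradW(i) += -2.0 * r * x(i)
      gradW0 += -2.0 * r
    }
    // w_new = w - weight * (averaged gradient)
    for (i <- 0 until dims) w(i) -= weight * gradW(i) / data.size
    w0 -= weight * gradW0 / data.size

    // Relative change of the sum of squared residuals as stopping rule.
    val s = sumSquaredResiduals(data, w, w0)
    converged = (prevS - s) / prevS < convergenceThreshold
    prevS = s
    iteration += 1
  }
  (w, w0)
}

// Fit y = 2x + 1; the weights move toward (w, w0) = (2, 1) as iterations grow.
val data = Seq(1.0, 2.0, 3.0, 4.0).map(x => (Array(x), 2 * x + 1))
val (w, w0) = fitOLS(data, stepsize = 0.05, maxIterations = 5000,
  convergenceThreshold = 1e-12)
```

The decaying weight makes early steps aggressive and later steps conservative, which is why the relative-change criterion is checked against the sum of squared residuals rather than the step size itself.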
Parameters