Multiple linear regression using the ordinary least squares (OLS) estimator.
The linear regression finds a solution to the problem
y = w_0 + w_1*x_1 + w_2*x_2 + ... + w_n*x_n = w_0 + w^T*x
such that the sum of squared residuals is minimized:
min_{w, w_0} \sum (y - w^T*x - w_0)^2
The minimization problem is solved by (stochastic) gradient descent. For each labeled vector
(x, y), the gradient is calculated. The average of all gradients, scaled by a weight, is
subtracted from the current value w, which gives the new value w_new. The weight is defined as
stepsize/math.sqrt(iteration).
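A minimal sketch of this update rule in plain Scala (not Flink's internal implementation; all
names here are illustrative):
import scala.math.sqrt
// One full-batch gradient step for the model y = w_0 + w^T x.
// `w` stores the intercept at index 0 and each feature vector is implicitly
// prefixed with 1.0; `iteration` starts at 1, so the weight is well defined.
def step(
    w: Array[Double],
    data: Seq[(Array[Double], Double)],
    stepsize: Double,
    iteration: Int): Array[Double] = {
  val grad = new Array[Double](w.length)
  for ((x, y) <- data) {
    val xs = 1.0 +: x                                 // prepend the bias term
    val residual = xs.zip(w).map { case (a, b) => a * b }.sum - y
    for (i <- grad.indices) grad(i) += 2.0 * residual * xs(i)
  }
  val weight = stepsize / sqrt(iteration.toDouble)    // decaying step size
  w.zip(grad).map { case (wi, gi) => wi - weight * gi / data.size }
}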
The optimization runs for at most the configured maximum number of iterations or, if a
convergence threshold has been set, until the convergence criterion has been met. The relative
change of the sum of squared residuals serves as the convergence criterion:
(S_{k-1} - S_k)/S_{k-1} < \rho
with S_k being the sum of squared residuals in iteration k and \rho being the convergence
threshold.
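As an illustration (plain Scala, not Flink code; the function name is made up), the criterion
amounts to the following check:
// Relative decrease of the sum of squared residuals between iterations k-1 and k.
def hasConverged(sPrev: Double, sCur: Double, rho: Double): Boolean =
  (sPrev - sCur) / sPrev < rho
For example, with S_{k-1} = 100.0, S_k = 99.95 and \rho = 0.001, the relative change is
0.0005, which is below the threshold, so the optimization would stop.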
At the moment, the whole partition is used for SGD, making it effectively a batch gradient
descent. Once a sampling operator has been introduced, the algorithm can be optimized.
Example:
// Create the multiple linear regression learner
val mlr = MultipleLinearRegression()
  .setIterations(10)
  .setStepsize(0.5)
  .setConvergenceThreshold(0.001)
// Obtain the training and testing data sets
val trainingDS: DataSet[LabeledVector] = ...
val testingDS: DataSet[Vector] = ...
// Fit the linear model to the training data
mlr.fit(trainingDS)
// Calculate predictions for the test data
val predictions = mlr.predict(testingDS)
Parameters
org.apache.flink.ml.regression.MultipleLinearRegression.Stepsize:
Initial step size for the gradient descent method.
This value controls how far the gradient descent method moves in the opposite direction of the
gradient. Tuning this parameter may be crucial to make the optimization stable and to obtain
good performance.
LearningRateMethodTrait:
The method used to calculate the effective learning rate for each iteration step. See
LearningRateMethod for all supported methods.
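To illustrate the idea (a sketch in plain Scala, not the actual LearningRateMethod
implementations; the function names are made up), each method maps the initial step size and
the iteration count to an effective learning rate:
import scala.math.sqrt
// A decaying schedule matching the stepsize/math.sqrt(iteration) rule above.
def decayingRate(stepsize: Double, iteration: Int): Double =
  stepsize / sqrt(iteration.toDouble)
// A constant schedule that ignores the iteration count.
def constantRate(stepsize: Double, iteration: Int): Double =
  stepsize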