package org.apache.flink.ml.regression

Type Members

  1. class MultipleLinearRegression extends Predictor[MultipleLinearRegression]

    Multiple linear regression using the ordinary least squares (OLS) estimator.

    The linear regression finds a solution to the problem

    y = w_0 + w_1*x_1 + w_2*x_2 + ... + w_n*x_n = w_0 + w^T*x

    such that the sum of squared residuals is minimized

    min_{w, w_0} \sum (y - w^T*x - w_0)^2

    The minimization problem is solved by (stochastic) gradient descent. In each iteration, the gradient is calculated for every labeled vector (x, y). The weighted average of all gradients is subtracted from the current value w, which gives the new value w_new. The weight applied to the averaged gradient is stepsize/math.sqrt(iteration), so the effective step size shrinks as the iteration count grows.
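
    The update described above can be sketched in plain Scala. This is a minimal illustration, not the Flink implementation: the names Weights, gradient and step are hypothetical, Vector here is the Scala collection (not org.apache.flink.ml.math.Vector), and the factor 2 from differentiating the squared residual is kept explicit, although an implementation may absorb it into the step size.

```scala
object GradientStepSketch {
  // Hypothetical model state: weight vector w and intercept w0.
  final case class Weights(w: Vector[Double], w0: Double)

  // Gradient of the squared residual (y - w^T*x - w0)^2 for one labeled vector (x, y).
  def gradient(m: Weights, x: Vector[Double], y: Double): Weights = {
    val residual = m.w.zip(x).map { case (wi, xi) => wi * xi }.sum + m.w0 - y
    Weights(x.map(xi => 2.0 * residual * xi), 2.0 * residual)
  }

  // One update: average all gradients, scale by stepsize / sqrt(iteration),
  // and subtract the result from the current weights.
  def step(m: Weights,
           data: Seq[(Vector[Double], Double)],
           stepsize: Double,
           iteration: Int): Weights = {
    val grads = data.map { case (x, y) => gradient(m, x, y) }
    val n = data.size.toDouble
    val zero = Vector.fill(m.w.length)(0.0)
    val sumW = grads.foldLeft(zero) { (acc, g) =>
      acc.zip(g.w).map { case (a, b) => a + b }
    }
    val sumW0 = grads.map(_.w0).sum
    val rate = stepsize / math.sqrt(iteration.toDouble)
    Weights(
      m.w.zip(sumW).map { case (wi, gi) => wi - rate * gi / n },
      m.w0 - rate * sumW0 / n
    )
  }
}
```

    Because every labeled vector contributes to each step, this sketch behaves like batch gradient descent; a sampling variant would pass only a subset of data to step.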

    The optimization runs for at most the configured maximum number of iterations or, if a convergence threshold has been set, until the convergence criterion is met. The convergence criterion is the relative change of the sum of squared residuals:

    (S_{k-1} - S_k)/S_{k-1} < \rho

    with S_k being the sum of squared residuals in iteration k and \rho being the convergence threshold.
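
    As a rough illustration of this stopping rule, the check can be written as a small Scala function. The names ConvergenceSketch, sumOfSquaredResiduals and hasConverged are hypothetical and not part of the Flink API; treating a previous sum of exactly zero as converged is an assumption made here to avoid dividing by zero.

```scala
object ConvergenceSketch {
  // Sum of squared residuals S for a weight vector w and intercept w0.
  def sumOfSquaredResiduals(w: Vector[Double],
                            w0: Double,
                            data: Seq[(Vector[Double], Double)]): Double =
    data.map { case (x, y) =>
      val prediction = w.zip(x).map { case (wi, xi) => wi * xi }.sum + w0
      val residual = y - prediction
      residual * residual
    }.sum

  // True when (S_{k-1} - S_k) / S_{k-1} < rho.
  // Assumption: a previous sum of exactly 0.0 (perfect fit) counts as converged.
  def hasConverged(previous: Double, current: Double, rho: Double): Boolean =
    previous == 0.0 || (previous - current) / previous < rho
}
```

    Taken literally, the criterion is also satisfied when the sum of squared residuals increases, since the relative change is then negative.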

    At the moment, the whole partition is used for each SGD step, making it effectively batch gradient descent. Once a sampling operator has been introduced, the algorithm can be optimized.

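    Example:

      import org.apache.flink.api.scala._
      import org.apache.flink.ml.common.LabeledVector
      import org.apache.flink.ml.math.Vector
      import org.apache.flink.ml.regression.MultipleLinearRegression

      // Configure the learner
      val mlr = MultipleLinearRegression()
        .setIterations(10)
        .setStepsize(0.5)
        .setConvergenceThreshold(0.001)
      val trainingDS: DataSet[LabeledVector] = ...
      val testingDS: DataSet[Vector] = ...
      // Train on the labeled data, then predict for the unlabeled vectors
      mlr.fit(trainingDS)
      val predictions = mlr.predict(testingDS)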

      Parameters

Value Members

  1. object MultipleLinearRegression
