org.apache.flink.ml.regression
Evaluates the testing data by computing the prediction value and returning a pair of true label value and prediction value. It is important that the implementation chooses a Testing type from which it can extract the true label value.
Fits the estimator to the given input data. The fitting logic is contained in the FitOperation. The computed state will be stored in the implementing class.
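As a minimal sketch of this delegation pattern (not Flink's actual source: `MeanEstimator`, the simplified `FitOperation` signature, and `Seq`-based data are invented for illustration — Flink operates on `DataSet` instead):

```scala
// Simplified FitOperation type class: the fitting logic lives here,
// not in the estimator itself.
trait FitOperation[Self, Training] {
  def fit(instance: Self, fitParameters: Map[String, Any], training: Seq[Training]): Unit
}

class MeanEstimator {
  // The computed state is stored in the implementing class.
  var mean: Option[Double] = None

  // fit delegates to whichever FitOperation is in implicit scope
  // for the training data type.
  def fit[T](training: Seq[T], fitParameters: Map[String, Any] = Map.empty)(
      implicit fitOp: FitOperation[MeanEstimator, T]): Unit =
    fitOp.fit(this, fitParameters, training)
}

object MeanEstimator {
  // FitOperation encapsulating the algorithm logic for Double training data.
  implicit val doubleFit: FitOperation[MeanEstimator, Double] =
    new FitOperation[MeanEstimator, Double] {
      def fit(instance: MeanEstimator, fitParameters: Map[String, Any],
              training: Seq[Double]): Unit =
        instance.mean = Some(training.sum / training.size)
    }
}

val estimator = new MeanEstimator
estimator.fit(Seq(1.0, 2.0, 3.0))
// estimator.mean now holds Some(2.0)
```

Because the operation is resolved implicitly per training type, the same `fit` entry point can support new data types by adding new `FitOperation` instances rather than changing the estimator.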
Type of the training data
Training data
Additional parameters for the FitOperation
FitOperation which encapsulates the algorithm logic
Predicts testing data according to the learned model. The implementing class has to provide a corresponding implementation of PredictDataSetOperation which contains the prediction logic.
Type of the testing data
Type of the prediction data
Testing data which shall be predicted
Additional parameters for the prediction
PredictDataSetOperation which encapsulates the prediction logic
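A minimal sketch of the prediction side of the same pattern (again invented for illustration: `ThresholdModel`, the simplified signatures, and `Seq` in place of Flink's `DataSet`):

```scala
// Simplified PredictDataSetOperation type class: maps testing data of type
// Testing to predictions of type Prediction for a given model instance.
trait PredictDataSetOperation[Self, Testing, Prediction] {
  def predictDataSet(instance: Self, predictParameters: Map[String, Any],
                     testing: Seq[Testing]): Seq[Prediction]
}

class ThresholdModel(val threshold: Double) {
  // predict delegates to the implicit operation matching the testing data type.
  def predict[T, P](testing: Seq[T], predictParameters: Map[String, Any] = Map.empty)(
      implicit op: PredictDataSetOperation[ThresholdModel, T, P]): Seq[P] =
    op.predictDataSet(this, predictParameters, testing)
}

object ThresholdModel {
  // Prediction logic for Double inputs: label 1 if above the threshold, else 0.
  implicit val doublePredict: PredictDataSetOperation[ThresholdModel, Double, Int] =
    new PredictDataSetOperation[ThresholdModel, Double, Int] {
      def predictDataSet(instance: ThresholdModel, predictParameters: Map[String, Any],
                         testing: Seq[Double]): Seq[Int] =
        testing.map(x => if (x > instance.threshold) 1 else 0)
    }
}

val model = new ThresholdModel(0.5)
val predictions = model.predict(Seq(0.1, 0.9))
// predictions == Seq(0, 1)
```

Note that the prediction type `P` is inferred from the implicit instance, so callers only name the testing data.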
Multiple linear regression using the ordinary least squares (OLS) estimator.
The linear regression finds a solution to the problem
y = w_0 + w_1*x_1 + w_2*x_2 + ... + w_n*x_n = w_0 + w^T*x
such that the sum of squared residuals is minimized
min_{w, w_0} \sum (y - w^T*x - w_0)^2
The minimization problem is solved by (stochastic) gradient descent. For each labeled vector (x, y), the gradient is calculated. The weighted average of all gradients is subtracted from the current value w, which gives the new value w_new. The weight is defined as stepsize/math.sqrt(iteration).
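Written out (a standard derivation for the squared residual, not quoted from the implementation), the per-sample gradient and the resulting update over m samples are:

```latex
\nabla_{w}\,(y - w^{T}x - w_0)^2 = -2\,(y - w^{T}x - w_0)\,x,
\qquad
w_{\text{new}} = w \;-\; \frac{\text{stepsize}}{\sqrt{\text{iteration}}}
  \cdot \frac{1}{m}\sum_{i=1}^{m} \nabla_{w}\,(y_i - w^{T}x_i - w_0)^2
```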
The optimization runs for at most the maximum number of iterations or, if a convergence threshold has been set, until the convergence criterion has been met. As convergence criterion, the relative change of the sum of squared residuals is used:
(S_{k-1} - S_k)/S_{k-1} < \rho
with S_k being the sum of squared residuals in iteration k and \rho being the convergence threshold.

At the moment, the whole partition is used for SGD, making it effectively a batch gradient descent. Once a sampling operator has been introduced, the algorithm can be optimized.
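The whole procedure can be sketched in self-contained Scala (plain collections instead of Flink DataSets; the name `fitOLS` and the helper are invented for illustration): batch gradient descent with step size stepsize/sqrt(iteration) and the relative-change-of-SSR stopping rule.

```scala
// Sum of squared residuals for weights (w, w0) over (features, label) pairs.
def sumSquaredResiduals(data: Seq[(Array[Double], Double)],
                        w: Array[Double], w0: Double): Double =
  data.map { case (x, y) =>
    val r = y - x.zip(w).map { case (xi, wi) => xi * wi }.sum - w0
    r * r
  }.sum

def fitOLS(data: Seq[(Array[Double], Double)],
           stepsize: Double,
           maxIterations: Int,
           convergenceThreshold: Double): (Array[Double], Double) = {
  val dims = data.head._1.length
  val w = Array.fill(dims)(0.0)
  var w0 = 0.0
  var prevS = sumSquaredResiduals(data, w, w0)

  var iteration = 1
  var converged = false
  while (iteration <= maxIterations && !converged) {
    val weight = stepsize / math.sqrt(iteration)
    // Average the per-sample gradients over the whole "partition"
    // (batch gradient descent, as noted above).
    val gradW = Array.fill(dims)(0.0)
    var gradW0 = 0.0
    data.foreach { case (x, y) =>
      val r = y - x.zip(w).map { case (xi, wi) => xi * wi }.sum - w0
      for (i <- 0 until dims) gradW(i) += -2.0 * r * x(i)
      gradW0 += -2.0 * r
    }
    // w_new = w - weight * (averaged gradient)
    for (i <- 0 until dims) w(i) -= weight * gradW(i) / data.size
    w0 -= weight * gradW0 / data.size

    // Relative change of the sum of squared residuals as stopping rule.
    val s = sumSquaredResiduals(data, w, w0)
    converged = (prevS - s) / prevS < convergenceThreshold
    prevS = s
    iteration += 1
  }
  (w, w0)
}

// Fit y = 2x + 1; the weights move toward (w, w0) = (2, 1) as iterations grow.
val data = Seq(1.0, 2.0, 3.0, 4.0).map(x => (Array(x), 2 * x + 1))
val (w, w0) = fitOLS(data, stepsize = 0.05, maxIterations = 5000,
  convergenceThreshold = 1e-12)
```

The decaying weight makes early steps aggressive and later steps conservative, which is why the relative-change criterion is checked against the sum of squared residuals rather than the step size itself.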
Parameters