Empirical risk of the trained model (matrix factorization).
Reference data
Additional parameters for the empirical risk calculation
Evaluates the testing data by computing the prediction and returning a pair of the true label value and the prediction value. It is important that the implementation chooses a Testing type from which it can extract the true label value.
Fits the estimator to the given input data. The fitting logic is contained in the FitOperation. The computed state will be stored in the implementing class.
Type of the training data
Training data
Additional parameters for the FitOperation
FitOperation which encapsulates the algorithm logic
Predicts the testing data according to the learned model. The implementing class has to provide a corresponding implementation of PredictDataSetOperation which contains the prediction logic.
Type of the testing data
Type of the prediction data
Testing data which shall be predicted
Additional parameters for the prediction
PredictDataSetOperation which encapsulates the prediction logic
Sets the number of blocks into which the user and item matrices are partitioned
Sets the number of iterations of the ALS algorithm
Sets the regularization coefficient lambda
Sets the number of latent factors/row dimension of the latent model
Sets the random seed for the initialization of the item matrix
Sets the temporary path into which intermediate results are written in order to increase performance.
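Taken together, the setters above might be chained fluently before fitting the estimator. The following is a sketch only, assuming Flink ML's Scala ALS API; the exact method names (setNumFactors, setIterations, setLambda, setBlocks, setSeed, setTemporaryPath) are inferred from the parameter descriptions above, and the toy ratings and paths are invented for illustration:

```scala
// Sketch only: assumes a Flink ExecutionEnvironment and the Flink ML
// ALS estimator are on the classpath; names follow the setters
// documented above.
import org.apache.flink.api.scala._
import org.apache.flink.ml.recommendation.ALS

val env = ExecutionEnvironment.getExecutionEnvironment

// Sparse ratings matrix R as (rowIndex, columnIndex, rating) tuples.
val ratings: DataSet[(Int, Int, Double)] = env.fromElements(
  (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0))

val als = ALS()
  .setNumFactors(10)   // number of latent factors
  .setIterations(10)   // ALS sweeps over the two matrices
  .setLambda(0.9)      // regularization coefficient lambda
  .setBlocks(100)      // partitioning of the user and item matrices
  .setSeed(42L)        // seed for the item matrix initialization
  .setTemporaryPath("/tmp/als")  // spill intermediate results

als.fit(ratings)

// Predict ratings for so-far unrated (user, item) pairs.
val unrated: DataSet[(Int, Int)] = env.fromElements((1, 1), (2, 0))
val predictions = als.predict(unrated)
```

Writing intermediate results to the temporary path trades I/O for memory pressure, which is why the corresponding setter is described as a performance option.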
Alternating least squares algorithm to calculate a matrix factorization.
Given a matrix R, ALS calculates two matrices U and V such that R ~~ U^T V. The unknown row dimension is given by the number of latent factors. Since matrix factorization is often used in the context of recommendation, we'll call the first matrix the user matrix and the second matrix the item matrix. The i-th column of the user matrix is u_i and the i-th column of the item matrix is v_i. The matrix R is called the ratings matrix and (R)_{i,j} = r_{i,j}.

In order to find the user and item matrix, the following problem is solved:

argmin_{U,V} sum_{i,j with r_{i,j} != 0} (r_{i,j} - u_i^T v_j)^2 + lambda * (sum_i n_{u_i} ||u_i||^2 + sum_j n_{v_j} ||v_j||^2)

with lambda being the regularization factor, n_{u_i} being the number of items the user i has rated and n_{v_j} being the number of times the item j has been rated. This regularization scheme to avoid overfitting is called weighted-lambda-regularization. Details can be found in the work of Zhou et al.

By fixing one of the matrices U or V, one obtains a quadratic form which can be solved. The solution of the modified problem is guaranteed to decrease the overall cost function. By applying this step alternately to the matrices U and V, we can iteratively improve the matrix factorization.

The matrix R is given in its sparse representation as a tuple of (i, j, r) where i is the row index, j is the column index and r is the matrix value at position (i, j).
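The alternating optimization described above can be sketched in plain Scala. The following is a simplified, self-contained illustration with a single latent factor (k = 1), so each normal equation is scalar and no linear solver is needed; the object name, toy ratings and lambda value are invented for the example, and this is not the distributed Flink implementation:

```scala
// Minimal ALS sketch with one latent factor and
// weighted-lambda-regularization, as in the objective above.
object AlsSketch {
  // Sparse ratings matrix R as (rowIndex, columnIndex, rating) tuples.
  val ratings = Seq((0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0))
  val numUsers = 3
  val numItems = 2
  val lambda = 0.1

  // Number of observed ratings per user (n_{u_i}) and per item (n_{v_j}).
  private val nU = ratings.groupBy(_._1).map { case (i, rs) => (i, rs.size) }
  private val nV = ratings.groupBy(_._2).map { case (j, rs) => (j, rs.size) }

  // The weighted-lambda-regularized cost from the formula above.
  def cost(u: Array[Double], v: Array[Double]): Double = {
    val squaredError = ratings.map { case (i, j, r) =>
      val d = r - u(i) * v(j); d * d
    }.sum
    val reg = lambda * (
      nU.map { case (i, n) => n * u(i) * u(i) }.sum +
      nV.map { case (j, n) => n * v(j) * v(j) }.sum)
    squaredError + reg
  }

  // With v fixed, each u_i has a closed-form least-squares solution.
  def updateU(v: Array[Double]): Array[Double] =
    Array.tabulate(numUsers) { i =>
      val rated = ratings.filter(_._1 == i)
      if (rated.isEmpty) 0.0
      else {
        val num = rated.map { case (_, j, r) => r * v(j) }.sum
        val den = rated.map { case (_, j, _) => v(j) * v(j) }.sum +
          lambda * rated.size
        num / den
      }
    }

  // Symmetric update for each item value v_j with u fixed.
  def updateV(u: Array[Double]): Array[Double] =
    Array.tabulate(numItems) { j =>
      val rated = ratings.filter(_._2 == j)
      if (rated.isEmpty) 0.0
      else {
        val num = rated.map { case (i, _, r) => r * u(i) }.sum
        val den = rated.map { case (i, _, _) => u(i) * u(i) }.sum +
          lambda * rated.size
        num / den
      }
    }

  // Alternate the two exact block updates and record the cost after
  // each iteration.
  def run(iterations: Int): Seq[Double] = {
    var u = Array.fill(numUsers)(1.0)
    var v = Array.fill(numItems)(1.0)
    (1 to iterations).map { _ =>
      u = updateU(v)
      v = updateV(u)
      cost(u, v)
    }
  }
}
```

Because each half-step minimizes its subproblem exactly while the other matrix is held fixed, the cost after every full iteration is non-increasing, mirroring the guarantee stated in the text.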
Parameters