Empirical risk of the trained model (matrix factorization).
Reference data
Additional parameters for the empirical risk calculation
Evaluates the testing data by computing the prediction and returning a pair of the true label value and the prediction value. It is important that the implementation chooses a Testing type from which it can extract the true label value.
Fits the estimator to the given input data. The fitting logic is contained in the FitOperation. The computed state will be stored in the implementing class.
Type of the training data
Training data
Additional parameters for the FitOperation
FitOperation which encapsulates the algorithm logic
Predicts the testing data according to the learned model. The implementing class has to provide a corresponding implementation of PredictDataSetOperation which contains the prediction logic.
Type of the testing data
Type of the prediction data
Testing data which shall be predicted
Additional parameters for the prediction
PredictDataSetOperation which encapsulates the prediction logic
Sets the number of blocks into which the user and item matrices are partitioned
Sets the number of iterations of the ALS algorithm
Sets the regularization coefficient lambda
Sets the number of latent factors/row dimension of the latent model
Sets the random seed for the initialization of the item matrix
Sets the temporary path into which intermediate results are written in order to increase performance.
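Taken together, the setters above might be chained fluently before fitting the estimator. The following is a sketch only, assuming Flink ML's Scala ALS API; the exact method names (setNumFactors, setIterations, setLambda, setBlocks, setSeed, setTemporaryPath) are inferred from the parameter descriptions above, and the toy ratings and paths are invented for illustration:

```scala
// Sketch only: assumes a Flink ExecutionEnvironment and the Flink ML
// ALS estimator are on the classpath; names follow the setters
// documented above.
import org.apache.flink.api.scala._
import org.apache.flink.ml.recommendation.ALS

val env = ExecutionEnvironment.getExecutionEnvironment

// Sparse ratings matrix R as (rowIndex, columnIndex, rating) tuples.
val ratings: DataSet[(Int, Int, Double)] = env.fromElements(
  (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0))

val als = ALS()
  .setNumFactors(10)   // number of latent factors
  .setIterations(10)   // ALS sweeps over the two matrices
  .setLambda(0.9)      // regularization coefficient lambda
  .setBlocks(100)      // partitioning of the user and item matrices
  .setSeed(42L)        // seed for the item matrix initialization
  .setTemporaryPath("/tmp/als")  // spill intermediate results

als.fit(ratings)

// Predict ratings for so-far unrated (user, item) pairs.
val unrated: DataSet[(Int, Int)] = env.fromElements((1, 1), (2, 0))
val predictions = als.predict(unrated)
```

Writing intermediate results to the temporary path trades I/O for memory pressure, which is why the corresponding setter is described as a performance option.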
Alternating least squares algorithm to calculate a matrix factorization.
Given a matrix R, ALS calculates two matrices U and V such that R ~~ U^T V. The unknown row dimension is given by the number of latent factors. Since matrix factorization is often used in the context of recommendation, we'll call the first matrix the user matrix and the second matrix the item matrix. The i-th column of the user matrix is u_i and the i-th column of the item matrix is v_i. The matrix R is called the ratings matrix and (R)_{i,j} = r_{i,j}.

In order to find the user and item matrix, the following problem is solved:

argmin_{U,V} sum_{i,j with r_{i,j} != 0} (r_{i,j} - u_i^T v_j)^2 + lambda * (sum_i n_{u_i} ||u_i||^2 + sum_j n_{v_j} ||v_j||^2)

with lambda being the regularization factor, n_{u_i} being the number of items the user i has rated and n_{v_j} being the number of times the item j has been rated. This regularization scheme to avoid overfitting is called weighted-lambda-regularization. Details can be found in the work of Zhou et al.

By fixing one of the matrices U or V, one obtains a quadratic form which can be solved. The solution of the modified problem is guaranteed to decrease the overall cost function. By applying this step alternately to the matrices U and V, we can iteratively improve the matrix factorization.

The matrix R is given in its sparse representation as a tuple of (i, j, r) where i is the row index, j is the column index and r is the matrix value at position (i, j).
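The alternating optimization described above can be sketched in plain Scala. The following is a simplified, self-contained illustration with a single latent factor (k = 1), so each normal equation is scalar and no linear solver is needed; the object name, toy ratings and lambda value are invented for the example, and this is not the distributed Flink implementation:

```scala
// Minimal ALS sketch with one latent factor and
// weighted-lambda-regularization, as in the objective above.
object AlsSketch {
  // Sparse ratings matrix R as (rowIndex, columnIndex, rating) tuples.
  val ratings = Seq((0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 1, 1.0))
  val numUsers = 3
  val numItems = 2
  val lambda = 0.1

  // Number of observed ratings per user (n_{u_i}) and per item (n_{v_j}).
  private val nU = ratings.groupBy(_._1).map { case (i, rs) => (i, rs.size) }
  private val nV = ratings.groupBy(_._2).map { case (j, rs) => (j, rs.size) }

  // The weighted-lambda-regularized cost from the formula above.
  def cost(u: Array[Double], v: Array[Double]): Double = {
    val squaredError = ratings.map { case (i, j, r) =>
      val d = r - u(i) * v(j); d * d
    }.sum
    val reg = lambda * (
      nU.map { case (i, n) => n * u(i) * u(i) }.sum +
      nV.map { case (j, n) => n * v(j) * v(j) }.sum)
    squaredError + reg
  }

  // With v fixed, each u_i has a closed-form least-squares solution.
  def updateU(v: Array[Double]): Array[Double] =
    Array.tabulate(numUsers) { i =>
      val rated = ratings.filter(_._1 == i)
      if (rated.isEmpty) 0.0
      else {
        val num = rated.map { case (_, j, r) => r * v(j) }.sum
        val den = rated.map { case (_, j, _) => v(j) * v(j) }.sum +
          lambda * rated.size
        num / den
      }
    }

  // Symmetric update for each item value v_j with u fixed.
  def updateV(u: Array[Double]): Array[Double] =
    Array.tabulate(numItems) { j =>
      val rated = ratings.filter(_._2 == j)
      if (rated.isEmpty) 0.0
      else {
        val num = rated.map { case (i, _, r) => r * u(i) }.sum
        val den = rated.map { case (i, _, _) => u(i) * u(i) }.sum +
          lambda * rated.size
        num / den
      }
    }

  // Alternate the two exact block updates and record the cost after
  // each iteration.
  def run(iterations: Int): Seq[Double] = {
    var u = Array.fill(numUsers)(1.0)
    var v = Array.fill(numItems)(1.0)
    (1 to iterations).map { _ =>
      u = updateU(v)
      v = updateV(u)
      cost(u, v)
    }
  }
}
```

Because each half-step minimizes its subproblem exactly while the other matrix is held fixed, the cost after every full iteration is non-increasing, mirroring the guarantee stated in the text.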
Parameters