Evaluates the testing data by computing the prediction value and returning a pair of the true label value and the prediction value. The implementation must choose a Testing type from which it can extract the true label value.
Fits the estimator to the given input data. The fitting logic is contained in the FitOperation. The computed state is stored in the implementing class.
Type of the training data
Training data
Additional parameters for the FitOperation
FitOperation which encapsulates the algorithm logic
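The delegation described above, where fit hands the work to a FitOperation and the result is stored in the implementing class, can be sketched in plain Scala. This is a simplified illustration, not the library's code: Seq stands in for DataSet, a Map stands in for the parameter type, and FitSketch, Params, and MeanEstimator are hypothetical names invented for the example.

```scala
// Simplified sketch of the fit/FitOperation pattern.
// Assumptions: Seq replaces DataSet, Map replaces the parameter map,
// and MeanEstimator is a made-up toy estimator.
object FitSketch {
  type Params = Map[String, Any]

  // Encapsulates the algorithm logic for one (estimator, training type) pair.
  trait FitOperation[Self, Training] {
    def fit(instance: Self, fitParameters: Params, input: Seq[Training]): Unit
  }

  trait Estimator[Self] { that: Self =>
    // Delegates to whichever FitOperation is found in implicit scope.
    def fit[Training](training: Seq[Training], fitParameters: Params = Map.empty)(
        implicit fitOperation: FitOperation[Self, Training]): Unit =
      fitOperation.fit(this, fitParameters, training)
  }

  // Toy estimator whose computed state is the mean of its training data.
  class MeanEstimator extends Estimator[MeanEstimator] {
    var mean: Option[Double] = None
  }

  object MeanEstimator {
    // The fitting logic lives here; the state ends up in the estimator instance.
    implicit val fitDoubles: FitOperation[MeanEstimator, Double] =
      new FitOperation[MeanEstimator, Double] {
        def fit(instance: MeanEstimator, fitParameters: Params, input: Seq[Double]): Unit =
          instance.mean = if (input.isEmpty) None else Some(input.sum / input.size)
      }
  }
}
```

Calling `new FitSketch.MeanEstimator().fit(Seq(1.0, 2.0, 3.0))` resolves the implicit from the companion object, so supporting a new training type only requires another implicit FitOperation.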
Predicts the testing data according to the learned model. The implementing class has to provide a corresponding implementation of PredictDataSetOperation, which contains the prediction logic.
Type of the testing data
Type of the prediction data
Testing data which shall be predicted
Additional parameters for the prediction
PredictDataSetOperation which encapsulates the prediction logic
Sets the number of data blocks/partitions
the number of data blocks
Sets the distance metric
the distance metric to calculate distance between two points
Sets K
the number of selected points as neighbors
Parameter a user can specify if either the training set or the test set is small
cross hint tells the system which sizes to expect from the data sets
Sets the Boolean variable that decides whether to use the QuadTree or not
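The setters above can be chained when configuring the estimator. The following is a minimal usage sketch, assuming the flink-ml library that provides org.apache.flink.ml.nn.KNN is on the classpath. The toy data points and the object name KnnConfigExample are made up for illustration.

```scala
import org.apache.flink.api.common.operators.base.CrossOperatorBase.CrossHint
import org.apache.flink.api.scala._
import org.apache.flink.ml.math.{DenseVector, Vector}
import org.apache.flink.ml.metrics.distances.SquaredEuclideanDistanceMetric
import org.apache.flink.ml.nn.KNN

object KnnConfigExample {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // Toy data, invented for this sketch.
    val trainingSet: DataSet[Vector] = env.fromCollection(Seq[Vector](
      DenseVector(0.0, 0.0), DenseVector(1.0, 1.0), DenseVector(5.0, 5.0)))
    val testingSet: DataSet[Vector] = env.fromCollection(Seq[Vector](
      DenseVector(0.9, 0.8)))

    val knn = KNN()
      .setK(2)                                  // number of neighbors per test point
      .setBlocks(env.getParallelism)            // at least the degree of parallelism
      .setDistanceMetric(SquaredEuclideanDistanceMetric())
      .setUseQuadTree(false)                    // skip the quadtree for this tiny input
      .setSizeHint(CrossHint.SECOND_IS_SMALL)   // the test set is the small one

    // Fit on the training set, then join each test point with its neighbors.
    knn.fit(trainingSet)
    val result = knn.predict(testingSet).collect()
    result.foreach { case (point, neighbors) =>
      println(s"$point -> ${neighbors.mkString(", ")}")
    }
  }
}
```

Any setter left out falls back to the default documented for its parameter.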
Implements a k-nearest neighbor join. Calculates the k nearest neighbor points in the training set for each point in the test set.
Parameters
- org.apache.flink.ml.nn.KNN.K: Sets K, the number of selected points as neighbors. (Default value: 5)
- org.apache.flink.ml.nn.KNN.DistanceMetric: Sets the distance metric used to calculate the distance between two points. If no metric is specified, then org.apache.flink.ml.metrics.distances.EuclideanDistanceMetric is used. (Default value: EuclideanDistanceMetric())
- org.apache.flink.ml.nn.KNN.Blocks: Sets the number of blocks into which the input data will be split. This number should be set to at least the degree of parallelism. If no value is specified, then the parallelism of the input DataSet is used as the number of blocks. (Default value: None)
- org.apache.flink.ml.nn.KNN.UseQuadTree: A boolean variable that determines whether or not to use a quadtree to partition the training set to potentially simplify the KNN search. If no value is specified, the code will automatically decide whether or not to use a quadtree. Use of a quadtree scales well with the number of training and testing points, though poorly with the dimension. (Default value: None)
- org.apache.flink.ml.nn.KNN.SizeHint: Specifies whether the training set or test set is small to optimize the cross product operation needed for the KNN search. If the training set is small, this should be set to CrossHint.FIRST_IS_SMALL; if the test set is small, it should be set to CrossHint.SECOND_IS_SMALL. (Default value: None)
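To make the join semantics concrete, here is a brute-force sketch in plain Scala of what a k-nearest neighbor join computes: for each test point, the k closest training points. It is not the library's implementation. It ignores blocks, the quadtree, and the cross hint, and the names BruteForceKnn, euclidean, and knnJoin are hypothetical.

```scala
// Brute-force k-nearest neighbor join on in-memory collections.
// Assumption: points are plain Vector[Double] values of equal dimension.
object BruteForceKnn {
  type Point = Vector[Double]

  // Euclidean distance between two points.
  def euclidean(a: Point, b: Point): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  // Pairs each test point with its k closest training points, nearest first.
  def knnJoin(
      training: Seq[Point],
      testing: Seq[Point],
      k: Int,
      distance: (Point, Point) => Double = euclidean): Seq[(Point, Seq[Point])] =
    testing.map { t =>
      t -> training.sortBy(p => distance(p, t)).take(k)
    }
}
```

This compares every test point with every training point, which is the cross product that the SizeHint parameter is meant to optimize and that a quadtree can prune.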