public class KNN<T> extends java.lang.Object implements SoftClassifier<T>
The best choice of k depends upon the data; generally, larger values of k reduce the effect of noise on the classification, but make boundaries between classes less distinct. A good k can be selected by various heuristic techniques, e.g. cross-validation. In binary problems, it is helpful to choose k to be an odd number as this avoids tied votes.
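The cross-validation heuristic mentioned above can be sketched in a few lines. The following is a minimal, self-contained illustration that scores each candidate k by leave-one-out accuracy with a brute-force Euclidean search. The class `KnnModelSelectionSketch` and its methods are hypothetical helpers written for this example; they are not part of the `KNN` class documented here.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper for illustration; not part of the documented KNN class.
final class KnnModelSelectionSketch {

    /** Euclidean distance between two equal-length vectors. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Majority vote among the k nearest training points, ignoring index skip (-1 to keep all). */
    static int predict(double[][] x, int[] y, int numClasses, int k, double[] query, int skip) {
        Integer[] order = new Integer[x.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        // Brute-force search: sort all training indices by distance to the query.
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> euclidean(x[i], query)));

        int[] votes = new int[numClasses];
        int used = 0;
        for (int idx : order) {
            if (idx == skip) continue;   // leave the held-out point out of the vote
            votes[y[idx]]++;
            if (++used == k) break;
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (votes[c] > votes[best]) best = c;   // ties go to the lower label
        }
        return best;
    }

    /** Fraction of training points classified correctly when each is held out in turn. */
    static double looAccuracy(double[][] x, int[] y, int numClasses, int k) {
        int correct = 0;
        for (int i = 0; i < x.length; i++) {
            if (predict(x, y, numClasses, k, x[i], i) == y[i]) correct++;
        }
        return (double) correct / x.length;
    }

    /** Returns the candidate k with the highest leave-one-out accuracy (earliest wins ties). */
    static int selectK(double[][] x, int[] y, int numClasses, int[] candidates) {
        int bestK = candidates[0];
        double bestAcc = -1.0;
        for (int k : candidates) {
            double acc = looAccuracy(x, y, numClasses, k);
            if (acc > bestAcc) {
                bestAcc = acc;
                bestK = k;
            }
        }
        return bestK;
    }
}
```

Passing only odd candidates, for example `new int[] {1, 3, 5}`, follows the advice about avoiding tied votes in binary problems. Leave-one-out is used here only because it needs no random splitting; k-fold cross-validation would serve equally well on larger data sets.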
A drawback of basic majority voting is that the more frequent classes tend to dominate the prediction for a new object: simply because they are numerous, their instances are more likely to appear among the k nearest neighbors. One way to overcome this problem is to weight each neighbor's vote by its distance from the test point, so that closer neighbors count more.
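As a rough illustration of distance weighting, the sketch below compares a plain majority vote with a vote in which each neighbor contributes 1/d, where d is its distance to the query. The inverse-distance weight, the small constant guarding against division by zero, and the class `WeightedKnnSketch` are all assumptions made for this example; this page does not state which weighting scheme, if any, the `KNN` class applies.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical helper for illustration; not part of the documented KNN class.
final class WeightedKnnSketch {

    /** Euclidean distance between two equal-length vectors. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Classifies query from its k nearest neighbors.
     * When weighted is false every neighbor casts one vote;
     * when it is true each neighbor's vote is 1 / (distance + epsilon).
     */
    static int predict(double[][] x, int[] y, int numClasses, int k,
                       double[] query, boolean weighted) {
        Integer[] order = new Integer[x.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> euclidean(x[i], query)));

        double[] score = new double[numClasses];
        int n = Math.min(k, order.length);
        for (int j = 0; j < n; j++) {
            int idx = order[j];
            // Assumed weighting scheme: inverse distance, with 1e-12 to avoid dividing by zero.
            double w = weighted ? 1.0 / (euclidean(x[idx], query) + 1e-12) : 1.0;
            score[y[idx]] += w;
        }
        int best = 0;
        for (int c = 1; c < numClasses; c++) {
            if (score[c] > score[best]) best = c;   // ties go to the lower label
        }
        return best;
    }
}
```

With a frequent class clustered far from the query and a rare class close to it, the unweighted vote can be carried by the frequent class while the weighted vote favors the nearby rare class.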
Often, the classification accuracy of k-NN can be improved significantly if the distance metric is learned with specialized algorithms such as Large Margin Nearest Neighbor or Neighborhood Components Analysis.
Nearest neighbor rules compute the decision boundary only implicitly. It is also possible to compute the decision boundary explicitly, and to do so efficiently, so that the computational complexity is a function of the boundary complexity.
The nearest neighbor algorithm has strong consistency results. As the amount of data approaches infinity, the 1-NN rule is guaranteed to yield an error rate no worse than twice the Bayes error rate (the minimum achievable error rate given the distribution of the data). k-NN is guaranteed to approach the Bayes error rate itself, provided k increases with the number of data points at a suitable rate.
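For reference, these guarantees are usually written as follows. The notation is introduced here and does not appear elsewhere on this page: R* is the Bayes error rate, R_NN the asymptotic error rate of the 1-NN rule, M the number of classes, n the number of training points, and k_n the number of neighbors used at sample size n. This is the textbook form of the results (Cover and Hart for the bound, Stone for consistency), not something stated by the class itself.

```latex
% Asymptotic 1-NN bound for an M-class problem
R^{*} \;\le\; R_{\mathrm{NN}} \;\le\; R^{*}\left(2 - \frac{M}{M-1}\,R^{*}\right) \;\le\; 2R^{*}

% Consistency of k-NN: k must grow, but more slowly than n
k_n \to \infty \quad\text{and}\quad \frac{k_n}{n} \to 0
\quad\Longrightarrow\quad R_{k_n\text{-NN}} \to R^{*} \quad (n \to \infty)
```

In the binary case (M = 2) the middle term reduces to 2R*(1 - R*), which is where the "twice the Bayes error rate" figure comes from.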
Constructor and Description |
---|
KNN(KNNSearch<T,T> knn, int[] y, int k): Constructor. |
Modifier and Type | Method and Description |
---|---|
static KNN<double[]> | fit(double[][] x, int[] y): Learn the 1-NN classifier. |
static KNN<double[]> | fit(double[][] x, int[] y, int k): Learn the k-NN classifier. |
static <T> KNN<T> | fit(T[] x, int[] y, smile.math.distance.Distance<T> distance): Learn the 1-NN classifier. |
static <T> KNN<T> | fit(T[] x, int[] y, smile.math.distance.Distance<T> distance, int k): Learn the k-NN classifier. |
int | predict(T x): Predicts the class label of an instance. |
int | predict(T x, double[] posteriori): Predicts the class label of an instance and also calculates the a posteriori probabilities. |
Methods inherited from class java.lang.Object: clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from implemented interfaces: applyAsDouble, applyAsInt, f, predict
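The generic `fit(T[] x, int[] y, Distance<T> distance)` overloads accept any object type together with a distance measure. The idea can be sketched without the library as follows. Here `java.util.function.ToDoubleBiFunction` stands in for `smile.math.distance.Distance<T>`, and both `GenericNearestNeighborSketch` and the Hamming distance on strings are hypothetical choices made for this example.

```java
import java.util.function.ToDoubleBiFunction;

// Hypothetical helper for illustration; not part of the documented KNN class.
final class GenericNearestNeighborSketch {

    /** 1-NN over arbitrary objects: returns the label of the closest training sample. */
    static <T> int predict1NN(T[] x, int[] y, ToDoubleBiFunction<T, T> distance, T query) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int i = 0; i < x.length; i++) {
            double d = distance.applyAsDouble(x[i], query);
            if (d < bestDist) {          // the first sample wins ties
                bestDist = d;
                best = i;
            }
        }
        return y[best];
    }

    /** Hamming distance on strings; any length difference counts as extra mismatches. */
    static double hamming(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int diff = Math.abs(a.length() - b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) diff++;
        }
        return diff;
    }
}
```

A call such as `predict1NN(samples, labels, GenericNearestNeighborSketch::hamming, "ACGG")` classifies a string directly, with no need to embed it in a vector space first.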
public static <T> KNN<T> fit(T[] x, int[] y, smile.math.distance.Distance<T> distance)
Parameters:
x - training samples.
y - training labels.
distance - the distance measure for finding nearest neighbors.

public static <T> KNN<T> fit(T[] x, int[] y, smile.math.distance.Distance<T> distance, int k)
Parameters:
k - the number of neighbors.
x - training samples.
y - training labels.
distance - the distance measure for finding nearest neighbors.

public static KNN<double[]> fit(double[][] x, int[] y)
Parameters:
x - training samples.
y - training labels.

public static KNN<double[]> fit(double[][] x, int[] y, int k)
Parameters:
k - the number of neighbors for classification.
x - training samples.
y - training labels.

public int predict(T x)
Specified by: predict in interface Classifier<T>
Parameters:
x - the instance to be classified.

public int predict(T x, double[] posteriori)
Specified by: predict in interface SoftClassifier<T>
Parameters:
x - an instance to be classified.
posteriori - the array to store a posteriori probabilities on output.
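The two-argument `predict` writes its probabilities into a caller-supplied array. The sketch below mimics that calling convention with a brute-force classifier. It assumes that each posterior is simply the fraction of the k nearest neighbors carrying that label, which is a common convention but is not confirmed by this page; `KnnPosteriorSketch` is a hypothetical class, not the library implementation.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in for illustration; not the library implementation.
final class KnnPosteriorSketch {
    private final double[][] x;
    private final int[] y;
    private final int k;

    KnnPosteriorSketch(double[][] x, int[] y, int k) {
        this.x = x;
        this.y = y;
        this.k = k;
    }

    private static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the predicted label and fills posteriori (one slot per class)
     * with the share of the k nearest neighbors voting for each class.
     */
    int predict(double[] query, double[] posteriori) {
        Integer[] order = new Integer[x.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> euclidean(x[i], query)));

        int n = Math.min(k, order.length);
        int[] votes = new int[posteriori.length];
        for (int j = 0; j < n; j++) {
            votes[y[order[j]]]++;
        }
        int best = 0;
        for (int c = 0; c < votes.length; c++) {
            posteriori[c] = (double) votes[c] / n;   // assumed: vote fraction as posterior
            if (votes[c] > votes[best]) best = c;
        }
        return best;
    }
}
```

The caller allocates the array once, with one slot per class, and reads it after the call, for example `double[] p = new double[2]; int label = model.predict(q, p);`.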