Find the k nearest neighbors from a dataset for every other object in the same dataset.
Lsh implementation as described in 'Randomized Algorithms and NLP: Using Locality Sensitive Hash Function for High Speed Noun Clustering' by Ravichandran et al.
Brute force O(n2) method to compute exact nearest neighbours. As this is a very expensive computation O(n2) an additional sample parameter may be passed such that neighbours are just computed for a random fraction.
Represents a RDD from grouping items of its parent RDD in fixed size blocks by passing a sliding window over them.
NOTE: both classes are copied from mllib and slightly modified since these classes are mllib private!
interface defining similarity measurement between 2 vectors
implementation of VectorDisctance that computes cosine similarity between two vectors