com.soundcloud

lsh

package lsh

Linear Supertypes
AnyRef, Any

Type Members

  1. trait Joiner extends AnyRef

  2. class Lsh extends Joiner with Serializable

    LSH implementation as described in 'Randomized Algorithms and NLP: Using Locality Sensitive Hash Functions for High Speed Noun Clustering' by Ravichandran et al.

  3. class NearestNeighbours extends Joiner with Serializable

    Brute-force O(n^2) method to compute exact nearest neighbours. Because this computation is very expensive, an additional sample parameter may be passed so that neighbours are computed only for a random fraction.

  4. class QueryHamming extends QueryJoiner with Serializable

    Implementation based on approximated cosine distances.

  5. trait QueryJoiner extends AnyRef

  6. class QueryLsh extends QueryJoiner with Serializable

    Standard Lsh implementation.

  7. class QueryNearestNeighbours extends QueryJoiner with Serializable

    Brute force O(size(query) * size(catalog)) method to compute exact nearest neighbours for rows in the query matrix.

  8. final case class Signature(index: Long, vector: Vector, bitSet: BitSet) extends Ordered[Signature] with Product with Serializable

    An id with its hash encoding and original vector.

  9. class SlidingRDD[T] extends RDD[Seq[T]]

    Represents an RDD formed by grouping items of its parent RDD into fixed-size blocks by passing a sliding window over them.

  10. class SlidingRDDPartition[T] extends Partition with Serializable

    NOTE: both classes (SlidingRDD and SlidingRDDPartition) are copied from MLlib and slightly modified, since the originals are private to MLlib.

  11. final case class SparseSignature(index: Long, bitSet: BitSet) extends Ordered[SparseSignature] with Product with Serializable

    An id with its hash encoding.

  12. trait VectorDistance extends Serializable

    Interface defining a similarity measure between two vectors.
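
The sliding-window grouping and ordered signatures listed above can be illustrated with a small local sketch that uses plain Scala collections instead of Spark. Everything here is hypothetical: `Sig` stands in for SparseSignature, and `compareBits` and `candidatePairs` are illustrative names, not library members. Pairing the head of each window with the rest of the window is an assumption about how sorted signatures might be turned into candidate pairs, not a description of the library's code.

```scala
import scala.collection.immutable.BitSet

// Hypothetical local stand-in for SparseSignature: an id plus its hash bits.
final case class Sig(index: Long, bits: BitSet)

object SlidingSketch {
  // Order two bit sets by their first differing bit:
  // the one that has that bit set sorts after the one that does not.
  def compareBits(a: BitSet, b: BitSet): Int = {
    val diff = a ^ b // positions at which the two bit sets differ
    if (diff.isEmpty) 0
    else if (a.contains(diff.head)) 1
    else -1
  }

  // Sort the signatures, slide a fixed-size window over the sorted sequence,
  // and pair the first element of each window with every other element in it.
  def candidatePairs(sigs: Seq[Sig], window: Int): List[(Long, Long)] =
    sigs
      .sortWith((x, y) => compareBits(x.bits, y.bits) < 0)
      .sliding(window)
      .flatMap(w => w.tail.map(other => (w.head.index, other.index)))
      .toList
      .distinct
}
```

With a window of 2, each signature is paired only with its immediate successor in sorted order; a larger window trades more candidate pairs for a better chance of catching true neighbours.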

Value Members

  1. object Cosine extends VectorDistance

    Implementation of VectorDistance that computes the cosine similarity between two vectors.

  2. object Main

  3. def bitSetComparator(a: BitSet, b: BitSet): Int

    Compares two bit sets according to the first differing bit.

  4. def bitSetIsEqual(vec1: BitSet, vec2: BitSet): Boolean

    Compares two bit sets for equality.

  5. def bitSetToString(bs: BitSet): String

    Returns a string representation of a BitSet

  6. def distinct(matrix: RDD[MatrixEntry]): RDD[MatrixEntry]

    Take distinct matrix entry values based on the indices only. The actual values are discarded.

  7. def hamming(vec1: BitSet, vec2: BitSet): Int

    Returns the Hamming distance between two bit vectors.

  8. def hammingToCosine(hammingDistance: Int, d: Double): Double

    Approximates the cosine distance of two bit sets using their Hamming distance.

  9. def localRandomMatrix(d: Int, numFeatures: Int): Matrix

    Returns a local k by d matrix with random Gaussian entries (mean 0.0, standard deviation 1.0).

    This is a k by d matrix as it is multiplied by the input matrix

  10. def matrixToBitSet(inputMatrix: IndexedRowMatrix, localRandomMatrix: Matrix): RDD[Signature]

    Converts a given input matrix to a bit set representation using random hyperplanes

  11. def matrixToBitSetSparse(inputMatrix: IndexedRowMatrix, localRandomMatrix: Matrix): RDD[SparseSignature]

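    Converts a given input matrix to a bit set representation using random hyperplanes. Unlike matrixToBitSet, the resulting SparseSignature holds only the id and hash encoding, not the original vector.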

  12. def vectorToBitSet(vector: Vector): BitSet

    Converts a vector to a bit set by replacing all values of x with sign(x)
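
The hashing helpers above can be sketched locally on plain arrays, without Spark. This is a minimal illustration under stated assumptions, not the package's implementation: `HashSketch`, `randomHyperplanes`, `project` and `cosine` are hypothetical names; vectors are `Array[Double]` rather than MLlib `Vector`; and the estimator `cos(Pi * hammingDistance / d)` is the usual random-hyperplane approximation, assumed here rather than confirmed from the source.

```scala
import scala.collection.immutable.BitSet
import scala.util.Random

object HashSketch {
  // k hyperplane normals of dimension n with standard Gaussian entries.
  def randomHyperplanes(k: Int, n: Int, seed: Long): Array[Array[Double]] = {
    val rng = new Random(seed)
    Array.fill(k, n)(rng.nextGaussian())
  }

  // Dot product of the vector with each hyperplane normal.
  def project(hyperplanes: Array[Array[Double]], v: Array[Double]): Array[Double] =
    hyperplanes.map(row => row.zip(v).map { case (r, x) => r * x }.sum)

  // Set bit i when the i-th value is positive (its sign is +1).
  def vectorToBitSet(values: Array[Double]): BitSet =
    BitSet(values.indices.filter(i => values(i) > 0.0): _*)

  // Hamming distance: number of positions at which the two bit sets differ.
  def hamming(a: BitSet, b: BitSet): Int = (a ^ b).size

  // Assumed estimator: the fraction of differing bits approximates angle / Pi.
  def hammingToCosine(hammingDistance: Int, d: Double): Double =
    math.cos(math.Pi * hammingDistance / d)

  // Exact cosine similarity, for comparison with the estimate.
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    dot / (math.sqrt(a.map(x => x * x).sum) * math.sqrt(b.map(x => x * x).sum))
  }
}
```

A usage sketch: hash two vectors with the same hyperplanes, then compare `hammingToCosine(hamming(a, b), d)` against `cosine` on the original vectors, where `d` is the number of hyperplanes. The estimate generally tightens as `d` grows, at the cost of longer signatures.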
