org.clulab.embeddings

SanitizedWordEmbeddingMap

Related Docs: object SanitizedWordEmbeddingMap | package embeddings


class SanitizedWordEmbeddingMap extends AnyRef

Implements similarity metrics using the embedding matrix.

IMPORTANT: In our implementation, words are lowercased but NOT lemmatized or stemmed (see sanitizeWord).

Note: matrixConstructor is lazy, which is meant to save memory if we're caching features.

User: mihais, dfried, gus
Date: 11/25/13
Last Modified: Fix compiler issue: import scala.io.Source.

Annotations
@deprecated
Deprecated (since processors 8.3.0): ExplicitWordEmbeddingMap should replace the functionality in this class.

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new SanitizedWordEmbeddingMap(is: InputStream, wordsToUse: Option[Set[String]], caseInsensitiveWordsToUse: Boolean)

    Alternate constructor that loads the embeddings from an input stream, optionally restricting the vocabulary to the words in wordsToUse.

  2. new SanitizedWordEmbeddingMap(src: Source, wordsToUse: Option[Set[String]], caseInsensitiveWordsToUse: Boolean)

    Alternate constructor that loads the embeddings from a Source, optionally restricting the vocabulary to the words in wordsToUse.

  3. new SanitizedWordEmbeddingMap(mf: String, wordsToUse: Option[Set[String]] = None, caseInsensitiveWordsToUse: Boolean = false)

    Alternate constructor that loads the embeddings from the file mf, optionally restricting the vocabulary to the words in wordsToUse.
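
    The vocabulary constraint shared by the three loading constructors can be sketched as follows. This is a minimal sketch, not the library's loader: it assumes the word2vec text layout (a header line, then one word followed by its values per line), and the object LoadSketch and its reading of caseInsensitiveWordsToUse are assumptions made for illustration.

    ```scala
    import scala.io.Source

    // Hypothetical helper, not part of org.clulab.embeddings.
    object LoadSketch {
      // Assumed layout: header line "vocabSize dims", then "word v1 ... vN" per line.
      def load(src: Source,
               wordsToUse: Option[Set[String]],
               caseInsensitiveWordsToUse: Boolean): Map[String, Array[Double]] = {
        // Assumption: case-insensitive matching lowercases both sides.
        def fold(w: String): String = if (caseInsensitiveWordsToUse) w.toLowerCase else w
        val wanted = wordsToUse.map(_.map(fold))
        src.getLines()
          .drop(1)                                   // skip the header line
          .map(_.trim)
          .filter(_.nonEmpty)
          .map(_.split("\\s+"))
          .filter(bits => wanted.forall(_.contains(fold(bits.head)))) // None keeps all
          .map(bits => bits.head -> bits.tail.map(_.toDouble))
          .toMap
      }
    }
    ```

    Passing None for wordsToUse keeps every word found in the source.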

  4. new SanitizedWordEmbeddingMap(matrixConstructor: ⇒ Map[String, Array[Double]])

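    Primary constructor. matrixConstructor is a by-name parameter (⇒), so the map from words to vectors is not built when the argument is passed, only when it is evaluated.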
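
    The effect of a by-name constructor argument can be illustrated with a small stand-in class. This is only a sketch: LazyEmbeddings and LazyEmbeddingsDemo are hypothetical names, and caching the result in a lazy val is an assumption of the sketch rather than a statement about how this class stores its matrix.

    ```scala
    // Hypothetical stand-in, not the library class.
    class LazyEmbeddings(matrixConstructor: => Map[String, Array[Double]]) {
      // The by-name argument is evaluated at most once, on first access.
      lazy val matrix: Map[String, Array[Double]] = matrixConstructor
      lazy val dimensions: Int = matrix.values.head.length
    }

    object LazyEmbeddingsDemo {
      def main(args: Array[String]): Unit = {
        var loads = 0
        def load(): Map[String, Array[Double]] = {
          loads += 1
          Map("cat" -> Array(1.0, 0.0), "dog" -> Array(0.0, 1.0))
        }
        val e = new LazyEmbeddings(load())
        println(loads)        // 0: nothing has been loaded yet
        println(e.dimensions) // 2: first access triggers the load
        println(loads)        // 1
      }
    }
    ```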

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def avgSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

    Finds the average embedding similarity between any two words in these two texts. IMPORTANT: t1 and t2 must contain words, not lemmas!

  6. def avgSimilarityReturnTop(t1: Iterable[String], t2: Iterable[String]): (Double, Array[(Double, String, String)])

  7. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  8. lazy val dimensions: Int

  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def getEmbedding(w: String): Option[Array[Double]]

    Returns the embedding for w. If the word doesn't exist in the lexicon, tries to use the UNK embedding instead.
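
    The fallback described here amounts to a two-step lookup. In the minimal sketch below, UnkFallbackSketch is a hypothetical object and the key "<unk>" is an assumed name; the token this class actually uses for UNK is not stated on this page.

    ```scala
    // Hypothetical stand-in for the lookup; the UNK key is an assumed name.
    object UnkFallbackSketch {
      val Unk = "<unk>"

      // Known word first, then the UNK entry, else None.
      def getEmbedding(matrix: Map[String, Array[Double]], w: String): Option[Array[Double]] =
        matrix.get(w).orElse(matrix.get(Unk))
    }
    ```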

  14. def getWordVector(word: String): Option[Array[Double]]

    Fetches the embedding vector for a given word (not lemma).

    word

    The word

    returns

    The array of embedding weights, wrapped in an Option

  15. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  16. def interpolate(wordsAndWeights: Iterable[(String, Double)]): Array[Double]

    For a sequence of (word, weight) pairs, interpolates the vectors corresponding to the words by their respective weights and normalizes the resulting vector.
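
    A weighted sum followed by normalization can be sketched as below. InterpolateSketch is a hypothetical object; the sketch assumes L2 normalization and assumes that words missing from the matrix are skipped, neither of which is confirmed by this page.

    ```scala
    // Hypothetical sketch of weighted interpolation, not the library code.
    object InterpolateSketch {
      // Scales v to unit L2 length; an all-zero vector is returned unchanged.
      def norm(v: Array[Double]): Array[Double] = {
        val len = math.sqrt(v.map(x => x * x).sum)
        if (len == 0.0) v else v.map(_ / len)
      }

      def interpolate(matrix: Map[String, Array[Double]],
                      wordsAndWeights: Iterable[(String, Double)]): Array[Double] = {
        val dims = matrix.values.head.length
        val total = new Array[Double](dims)
        // Assumption: words without a vector contribute nothing.
        for ((w, weight) <- wordsAndWeights; vec <- matrix.get(w); i <- 0 until dims)
          total(i) += weight * vec(i)
        norm(total)
      }
    }
    ```

    With vectors (1, 0) for "cat" and (0, 1) for "dog", the weights 3.0 and 4.0 give (3, 4) before normalization and (0.6, 0.8) after it.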

  17. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  18. def logMultiplicativeSanitizedTextSimilarity(t1: Iterable[String], t2: Iterable[String], method: Symbol = Symbol("linear"), normalize: Boolean = false): Double

  19. def logMultiplicativeTextSimilarity(t1: Iterable[String], t2: Iterable[String], method: Symbol = Symbol("linear"), normalize: Boolean = false): Double

  20. def makeCompositeVector(t: Iterable[String]): Array[Double]

  21. val matrix: Map[String, Array[Double]]

  22. def maxSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

    Finds the maximum embedding similarity between any two words in these two texts. IMPORTANT: t1 and t2 must contain words, not lemmas!

  23. def minSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

  24. def mostSimilarWords(word: String, howMany: Int, filterPredicate: Option[(String) ⇒ Boolean] = None): List[(String, Double)]

  25. def mostSimilarWords(words: Set[String], howMany: Int): List[(String, Double)]

    Finds the words most similar to this set of inputs. IMPORTANT: words here must already be normalized using sanitizeWord()!

  26. def mostSimilarWords(v: Array[Double], howMany: Int, filterPredicate: Option[(String) ⇒ Boolean]): List[(String, Double)]

    filterPredicate: if passed, only words that match the predicate are returned.
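
    A ranking with an optional filter can be sketched as follows. MostSimilarSketch is a hypothetical object, and scoring candidates by dot product against v (equivalent to cosine only when all vectors are unit length) is an assumption of the sketch.

    ```scala
    // Hypothetical sketch of ranking the vocabulary against a query vector.
    object MostSimilarSketch {
      def dot(a: Array[Double], b: Array[Double]): Double =
        a.zip(b).map { case (x, y) => x * y }.sum

      def mostSimilarWords(matrix: Map[String, Array[Double]],
                           v: Array[Double],
                           howMany: Int,
                           filterPredicate: Option[String => Boolean]): List[(String, Double)] = {
        // No predicate means every word is a candidate.
        val keep: String => Boolean = filterPredicate.getOrElse((_: String) => true)
        matrix.toList
          .collect { case (w, vec) if keep(w) => (w, dot(v, vec)) }
          .sortBy { case (_, score) => -score }   // highest score first
          .take(howMany)
      }
    }
    ```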

  27. def multiplicativeSanitizedTextSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

    Similar to sanitizedTextSimilarity, but using the multiplicative heuristic of Levy and Goldberg (2014). IMPORTANT: words here must already be normalized using sanitizeWord()!

    returns

    Similarity value
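
    One plausible reading of a multiplicative text similarity is a product of pairwise cosines, each shifted from [-1, 1] to [0, 1] with (cos + 1) / 2, the shift Levy and Goldberg apply before multiplying. The sketch below follows that reading; MultiplicativeSketch is a hypothetical object, and whether this class combines word pairs in exactly this way is an assumption.

    ```scala
    // Hypothetical sketch; assumes unit-length vectors so dot product = cosine.
    object MultiplicativeSketch {
      def dot(a: Array[Double], b: Array[Double]): Double =
        a.zip(b).map { case (x, y) => x * y }.sum

      def multiplicativeSimilarity(matrix: Map[String, Array[Double]],
                                   t1: Iterable[String],
                                   t2: Iterable[String]): Double = {
        val shifted = for {
          w1 <- t1.toSeq
          w2 <- t2.toSeq
          v1 <- matrix.get(w1)   // pairs with an unknown word are skipped
          v2 <- matrix.get(w2)
        } yield (dot(v1, v2) + 1.0) / 2.0
        shifted.product          // an empty product is 1.0
      }
    }
    ```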

  28. def multiplicativeTextSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

    Similar to textSimilarity, but using the multiplicative heuristic of Levy and Goldberg (2014). IMPORTANT: t1 and t2 must contain words, not lemmas!

  29. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  30. final def notify(): Unit

    Definition Classes
    AnyRef
  31. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  32. def sanitizedAvgSimilarity(t1: Iterable[String], t2: Iterable[String]): (Double, ArrayBuffer[(Double, String, String)])

    Finds the average embedding similarity between any two words in these two texts. IMPORTANT: words here must already be normalized using sanitizeWord()!

    Changelog: (Peter/June 4/2014) Now also returns the list of pairwise scores, for optional answer justification.

  33. def sanitizedMaxSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

    Finds the maximum embedding similarity between any two words in these two texts. IMPORTANT: words here must already be normalized using sanitizeWord()!

  34. def sanitizedMinSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

    Finds the minimum embedding similarity between any two words in these two texts. IMPORTANT: words here must already be normalized using sanitizeWord()!
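
    The average, maximum, and minimum variants all reduce the same set of pairwise word scores. A minimal sketch of that idea follows; PairwiseSketch is a hypothetical object, and both the dot-product scoring and the skipping of out-of-vocabulary words are assumptions.

    ```scala
    // Hypothetical sketch of pairwise word-to-word scoring between two texts.
    object PairwiseSketch {
      def dot(a: Array[Double], b: Array[Double]): Double =
        a.zip(b).map { case (x, y) => x * y }.sum

      // One score per (w1, w2) pair whose words both have vectors.
      def pairwiseScores(matrix: Map[String, Array[Double]],
                         t1: Iterable[String],
                         t2: Iterable[String]): Seq[Double] =
        for {
          w1 <- t1.toSeq
          w2 <- t2.toSeq
          v1 <- matrix.get(w1)
          v2 <- matrix.get(w2)
        } yield dot(v1, v2)

      // Returns (min, avg, max), or None when no pair could be scored.
      def summary(matrix: Map[String, Array[Double]],
                  t1: Iterable[String],
                  t2: Iterable[String]): Option[(Double, Double, Double)] = {
        val s = pairwiseScores(matrix, t1, t2)
        if (s.isEmpty) None else Some((s.min, s.sum / s.size, s.max))
      }
    }
    ```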

  35. def sanitizedTextSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

    Computes the cosine similarity between two texts, according to the embedding matrix. IMPORTANT: words here must already be normalized using sanitizeWord()!
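
    A common way to compare two texts with word embeddings is to build one composite vector per text and take the cosine between them. The sketch below assumes the composite is the normalized sum of the word vectors; TextSimilaritySketch is a hypothetical object, and this page does not confirm that makeCompositeVector works this way.

    ```scala
    // Hypothetical sketch: text similarity via normalized sums of word vectors.
    object TextSimilaritySketch {
      def dot(a: Array[Double], b: Array[Double]): Double =
        a.zip(b).map { case (x, y) => x * y }.sum

      def norm(v: Array[Double]): Array[Double] = {
        val len = math.sqrt(dot(v, v))
        if (len == 0.0) v else v.map(_ / len)
      }

      // Sums the vectors of all known words in t, then scales to unit length.
      def makeCompositeVector(matrix: Map[String, Array[Double]],
                              t: Iterable[String]): Array[Double] = {
        val dims = matrix.values.head.length
        val total = new Array[Double](dims)
        for (w <- t; vec <- matrix.get(w); i <- 0 until dims) total(i) += vec(i)
        norm(total)
      }

      // Both composites are unit length, so their dot product is the cosine.
      def textSimilarity(matrix: Map[String, Array[Double]],
                         t1: Iterable[String],
                         t2: Iterable[String]): Double =
        dot(makeCompositeVector(matrix, t1), makeCompositeVector(matrix, t2))
    }
    ```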

  36. def saveMatrix(mf: String): Unit

  37. def similarity(w1: String, w2: String): Double

    Computes the similarity between two given words. IMPORTANT: words here must already be normalized using sanitizeWord()!

    w1

    The first word

    w2

    The second word

    returns

    The cosine similarity of the two corresponding vectors
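
    For reference, the cosine similarity of two vectors is their dot product divided by the product of their lengths. A self-contained sketch follows; CosineSketch is a hypothetical object, and returning 0.0 for a zero-length vector is a choice made for the sketch.

    ```scala
    // Hypothetical helper illustrating cosine similarity.
    object CosineSketch {
      def cosine(a: Array[Double], b: Array[Double]): Double = {
        require(a.length == b.length, "vectors must have the same dimension")
        val dot = a.zip(b).map { case (x, y) => x * y }.sum
        val lenA = math.sqrt(a.map(x => x * x).sum)
        val lenB = math.sqrt(b.map(x => x * x).sum)
        // Sketch choice: a zero vector is treated as similar to nothing.
        if (lenA == 0.0 || lenB == 0.0) 0.0 else dot / (lenA * lenB)
      }
    }
    ```

    If the stored vectors are already unit length, the division is unnecessary and the dot product alone gives the same value.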

  38. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  39. def textSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

    Computes the cosine similarity between two texts, according to the embedding matrix. IMPORTANT: t1 and t2 must contain words, not lemmas!

  40. def toString(): String

    Definition Classes
    AnyRef → Any
  41. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  42. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  43. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
