SanitizedWordEmbeddingMap

Instance Constructors

new SanitizedWordEmbeddingMap(is: InputStream, wordsToUse: Option[Set[String]], caseInsensitiveWordsToUse: Boolean)

Alternate constructor that loads the matrix from an InputStream, optionally restricting the vocabulary to the words in wordsToUse (matched case-insensitively if caseInsensitiveWordsToUse is true).
new SanitizedWordEmbeddingMap(src: Source, wordsToUse: Option[Set[String]], caseInsensitiveWordsToUse: Boolean)

Alternate constructor that loads the matrix from a Source, optionally restricting the vocabulary to the words in wordsToUse (matched case-insensitively if caseInsensitiveWordsToUse is true).
new SanitizedWordEmbeddingMap(mf: String, wordsToUse: Option[Set[String]] = None, caseInsensitiveWordsToUse: Boolean = false)

Alternate constructor that loads the matrix from the file mf, optionally restricting the vocabulary to the words in wordsToUse (matched case-insensitively if caseInsensitiveWordsToUse is true).
new SanitizedWordEmbeddingMap(matrixConstructor: ⇒ Map[String, Array[Double]])
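The wordsToUse and caseInsensitiveWordsToUse parameters describe a load-time vocabulary filter. The sketch below illustrates that idea under stated assumptions: a word2vec-style text layout (one header line, then one `word v1 v2 ...` line per word) and a hypothetical helper `loadFiltered`. It is not this class's actual loader, and the real file format may differ.

```scala
import scala.io.Source

// Hypothetical helper illustrating a vocabulary filter at load time (not this class's loader)
object LoadSketch {
  def loadFiltered(
      src: Source,
      wordsToUse: Option[Set[String]] = None,
      caseInsensitiveWordsToUse: Boolean = false): Map[String, Array[Double]] = {
    // Lower-case the filter set once when matching case-insensitively
    val filter: Option[Set[String]] = wordsToUse.map { ws =>
      if (caseInsensitiveWordsToUse) ws.map(_.toLowerCase) else ws
    }
    // Keep a word if there is no filter, or if it (lower-cased when requested) is in the filter
    def keep(word: String): Boolean = filter.forall { ws =>
      ws.contains(if (caseInsensitiveWordsToUse) word.toLowerCase else word)
    }
    src.getLines()
      .drop(1) // skip the assumed "<vocabSize> <dimensions>" header line
      .map(_.trim.split("\\s+"))
      .filter(bits => bits.length > 1 && keep(bits.head))
      .map(bits => bits.head -> bits.tail.map(_.toDouble))
      .toMap
  }
}
```

With `caseInsensitiveWordsToUse = true`, a filter of `Set("cat")` keeps the entry `Cat` under its original spelling; with the default `false`, it does not.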

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def asInstanceOf[T0]: T0

Definition Classes
Any
def avgSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

Finds the average embedding similarity over all pairs of words, one from each text. IMPORTANT: the inputs must be words, not lemmas!
def avgSimilarityReturnTop(t1: Iterable[String], t2: Iterable[String]): (Double, Array[(Double, String, String)])
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
lazy val dimensions: Int
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def getEmbedding(w: String): Option[Array[Double]]

Returns the embedding for w; if the word does not exist in the lexicon, tries to fall back to the UNK vector.
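A minimal sketch of the UNK fallback, assuming the unknown-word vector is stored in the same map under its own key. The key `*UNK*` and the object `UnkSketch` are hypothetical placeholders, not names taken from this class.

```scala
object UnkSketch {
  // Hypothetical key for the unknown-word vector (assumption, not this class's constant)
  val UnkKey = "*UNK*"

  // Look the word up; if it is missing, fall back to the UNK vector (None if that is missing too)
  def getEmbedding(matrix: Map[String, Array[Double]], w: String): Option[Array[Double]] =
    matrix.get(w).orElse(matrix.get(UnkKey))
}
```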
def getWordVector(word: String): Option[Array[Double]]

Fetches the embedding vector for a given word (not a lemma).
word
The word
returns
the array of embedding weights, wrapped in an Option
def hashCode(): Int

Definition Classes
AnyRef → Any
def interpolate(wordsAndWeights: Iterable[(String, Double)]): Array[Double]

For a sequence of (word, weight) pairs, interpolates the vectors of the words by their respective weights and normalizes the resulting vector.
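A minimal sketch of weighted interpolation followed by L2 normalization. The object name, the explicit `dimensions` argument, and the choice to skip words missing from the matrix are assumptions for illustration.

```scala
object InterpolateSketch {
  // Weighted sum of the vectors of known words, scaled to unit L2 length.
  // Assumptions: unknown words are skipped; every stored vector has `dimensions` entries.
  def interpolate(matrix: Map[String, Array[Double]],
                  wordsAndWeights: Iterable[(String, Double)],
                  dimensions: Int): Array[Double] = {
    val result = new Array[Double](dimensions)
    for ((word, weight) <- wordsAndWeights; v <- matrix.get(word); i <- 0 until dimensions)
      result(i) += weight * v(i)
    // Normalize; leave an all-zero vector unchanged to avoid dividing by zero
    val norm = math.sqrt(result.map(x => x * x).sum)
    if (norm > 0) result.map(_ / norm) else result
  }
}
```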
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
def logMultiplicativeSanitizedTextSimilarity(t1: Iterable[String], t2: Iterable[String], method: Symbol = Symbol("linear"), normalize: Boolean = false): Double
def logMultiplicativeTextSimilarity(t1: Iterable[String], t2: Iterable[String], method: Symbol = Symbol("linear"), normalize: Boolean = false): Double
def makeCompositeVector(t: Iterable[String]): Array[Double]
val matrix: Map[String, Array[Double]]
def maxSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

Finds the maximum embedding similarity over all pairs of words, one from each text. IMPORTANT: t1 and t2 must contain words, not lemmas!
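A minimal sketch of the pairwise maximum, using cosine similarity over every (w1, w2) pair with w1 from t1 and w2 from t2. Skipping words without vectors and returning 0.0 when no pair can be scored are assumptions of this sketch; a minimum variant would replace `max` with `min`.

```scala
object MaxSimSketch {
  // Cosine similarity of two equal-length vectors (0.0 if either is all zeros)
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.indices.map(i => a(i) * b(i)).sum
    val na = math.sqrt(a.map(x => x * x).sum)
    val nb = math.sqrt(b.map(x => x * x).sum)
    if (na == 0.0 || nb == 0.0) 0.0 else dot / (na * nb)
  }

  // Highest similarity over all scorable (w1, w2) pairs; 0.0 if none (assumption)
  def maxSimilarity(matrix: Map[String, Array[Double]],
                    t1: Iterable[String], t2: Iterable[String]): Double = {
    val scores = for {
      w1 <- t1.toList
      v1 <- matrix.get(w1).toList
      w2 <- t2.toList
      v2 <- matrix.get(w2).toList
    } yield cosine(v1, v2)
    if (scores.isEmpty) 0.0 else scores.max
  }
}
```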
def minSimilarity(t1: Iterable[String], t2: Iterable[String]): Double
def mostSimilarWords(word: String, howMany: Int, filterPredicate: Option[(String) ⇒ Boolean] = None): List[(String, Double)]
def mostSimilarWords(words: Set[String], howMany: Int): List[(String, Double)]

Finds the words most similar to this set of input words. IMPORTANT: words here must already be normalized using Word2vec.sanitizeWord()!
def mostSimilarWords(v: Array[Double], howMany: Int, filterPredicate: Option[(String) ⇒ Boolean]): List[(String, Double)]

If filterPredicate is given, only words that satisfy it are returned.
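A minimal sketch of the vector-based overload: score every vocabulary word against v by cosine similarity, drop words rejected by the optional predicate, and keep the top howMany. The object name and the explicit matrix argument are illustrative assumptions.

```scala
object MostSimilarSketch {
  // Cosine similarity of two equal-length vectors (0.0 if either is all zeros)
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.indices.map(i => a(i) * b(i)).sum
    val na = math.sqrt(a.map(x => x * x).sum)
    val nb = math.sqrt(b.map(x => x * x).sum)
    if (na == 0.0 || nb == 0.0) 0.0 else dot / (na * nb)
  }

  // Rank words by similarity to v (highest first), honoring the optional filter
  def mostSimilarWords(matrix: Map[String, Array[Double]], v: Array[Double], howMany: Int,
                       filterPredicate: Option[String => Boolean]): List[(String, Double)] =
    matrix.toList
      .filter { case (w, _) => filterPredicate.forall(p => p(w)) }
      .map { case (w, wv) => (w, cosine(v, wv)) }
      .sortBy { case (_, score) => -score }
      .take(howMany)
}
```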
def multiplicativeSanitizedTextSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

Similar to sanitizedTextSimilarity, but using the multiplicative heuristic of Levy and Goldberg (2014). IMPORTANT: words here must already be normalized using sanitizeWord()!
returns
The similarity value
def multiplicativeTextSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

Similar to textSimilarity, but using the multiplicative heuristic of Levy and Goldberg (2014). IMPORTANT: t1 and t2 must contain words, not lemmas!
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def sanitizedAvgSimilarity(t1: Iterable[String], t2: Iterable[String]): (Double, ArrayBuffer[(Double, String, String)])

Finds the average embedding similarity over all pairs of words, one from each text. IMPORTANT: words here must already be normalized using sanitizeWord()! Changelog: (Peter/June 4/2014) Now also returns the list of pairwise scores, for optional answer justification.
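A minimal sketch of an average that also returns the (score, w1, w2) triples, mirroring the return type above so callers can see which pairs drove the score. Cosine scoring, skipping words without vectors, and returning 0.0 for an empty pair set are assumptions of this sketch.

```scala
import scala.collection.mutable.ArrayBuffer

object AvgSimSketch {
  // Cosine similarity of two equal-length vectors (0.0 if either is all zeros)
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.indices.map(i => a(i) * b(i)).sum
    val na = math.sqrt(a.map(x => x * x).sum)
    val nb = math.sqrt(b.map(x => x * x).sum)
    if (na == 0.0 || nb == 0.0) 0.0 else dot / (na * nb)
  }

  // Average over all scorable pairs, plus the individual (score, w1, w2) triples
  def avgSimilarity(matrix: Map[String, Array[Double]],
                    t1: Iterable[String], t2: Iterable[String]): (Double, ArrayBuffer[(Double, String, String)]) = {
    val pairs = ArrayBuffer.empty[(Double, String, String)]
    for (w1 <- t1; v1 <- matrix.get(w1); w2 <- t2; v2 <- matrix.get(w2))
      pairs += ((cosine(v1, v2), w1, w2))
    val avg = if (pairs.isEmpty) 0.0 else pairs.map(_._1).sum / pairs.size
    (avg, pairs)
  }
}
```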
def sanitizedMaxSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

Finds the maximum embedding similarity over all pairs of words, one from each text. IMPORTANT: words here must already be normalized using sanitizeWord()!
def sanitizedMinSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

Finds the minimum embedding similarity over all pairs of words, one from each text. IMPORTANT: words here must already be normalized using Word2vec.sanitizeWord()!
def sanitizedTextSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

Computes the cosine similarity between two texts according to the embedding matrix. IMPORTANT: words here must already be normalized using Word2vec.sanitizeWord()!
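One common way to compute text-level cosine similarity is sketched here, assuming each text is represented by the normalized sum of its word vectors (a composite vector). This illustrates the technique only; it is not a confirmed description of this class's internals, and the object and parameter names are hypothetical.

```scala
object TextSimSketch {
  // Sum the vectors of the known words and scale the result to unit length
  def compositeVector(matrix: Map[String, Array[Double]], t: Iterable[String], dims: Int): Array[Double] = {
    val acc = new Array[Double](dims)
    for (w <- t; v <- matrix.get(w); i <- 0 until dims) acc(i) += v(i)
    val norm = math.sqrt(acc.map(x => x * x).sum)
    if (norm > 0) acc.map(_ / norm) else acc
  }

  // Cosine similarity of two texts = dot product of their unit-length composite vectors
  def textSimilarity(matrix: Map[String, Array[Double]],
                     t1: Iterable[String], t2: Iterable[String], dims: Int): Double = {
    val c1 = compositeVector(matrix, t1, dims)
    val c2 = compositeVector(matrix, t2, dims)
    c1.indices.map(i => c1(i) * c2(i)).sum
  }
}
```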
def saveMatrix(mf: String): Unit
def similarity(w1: String, w2: String): Double

Computes the similarity between two given words. IMPORTANT: words here must already be normalized using Word2vec.sanitizeWord()!
w1
The first word
w2
The second word
returns
The cosine similarity of the two corresponding vectors
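A minimal sketch of word-to-word cosine similarity, cos(v1, v2) = dot(v1, v2) / (|v1| |v2|). The object name is hypothetical, and returning 0.0 when either word is missing is this sketch's own choice.

```scala
object WordSimSketch {
  // Cosine similarity of two equal-length vectors (0.0 if either is all zeros)
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.indices.map(i => a(i) * b(i)).sum
    val na = math.sqrt(a.map(x => x * x).sum)
    val nb = math.sqrt(b.map(x => x * x).sum)
    if (na == 0.0 || nb == 0.0) 0.0 else dot / (na * nb)
  }

  // Similarity of two words; 0.0 if either has no vector (sketch's choice)
  def similarity(matrix: Map[String, Array[Double]], w1: String, w2: String): Double =
    (matrix.get(w1), matrix.get(w2)) match {
      case (Some(v1), Some(v2)) => cosine(v1, v2)
      case _                    => 0.0
    }
}
```

For example, vectors (3, 4) and (4, 3) give 24 / (5 * 5) = 0.96.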
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def textSimilarity(t1: Iterable[String], t2: Iterable[String]): Double

Computes the cosine similarity between two texts according to the embedding matrix. IMPORTANT: t1 and t2 must contain words, not lemmas!
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Docs: object SanitizedWordEmbeddingMap | package embeddings

class SanitizedWordEmbeddingMap extends AnyRef
