alternate constructor to allow loading from a stream, possibly with a set of words to constrain the vocab
alternate constructor to allow loading from a source, possibly with a set of words to constrain the vocab
alternate constructor to allow loading from a file, possibly with a set of words to constrain the vocab
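The three loading paths above (stream, source, file) can share one Source-based loader, with the stream and file constructors wrapping it. This is a hypothetical sketch, not the library's actual API: `loadMatrix`, `wordsToUse`, and the one-word-per-line text format are assumptions.

```scala
import scala.io.Source

// Hypothetical sketch of a Source-based loader; wordsToUse, when given,
// constrains the vocabulary to the listed words.
object EmbeddingLoader {
  // Assumed text format: each line is a word followed by its weights.
  def loadMatrix(src: Source,
                 wordsToUse: Option[Set[String]] = None): Map[String, Array[Double]] = {
    val pairs = for {
      line <- src.getLines()
      fields = line.trim.split("\\s+")
      if fields.length > 1
      word = fields.head.toLowerCase // mirror the lowercasing done by sanitizeWord
      if wordsToUse.forall(_.contains(word))
    } yield word -> fields.tail.map(_.toDouble)
    pairs.toMap
  }
}
```

A stream-based constructor could wrap its `InputStream` with `Source.fromInputStream`, and a file-based one with `Source.fromFile`.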
Finds the average embedding similarity between any two words in these two texts. IMPORTANT: t1, t2 must be arrays of words, not lemmas!
If the word doesn't exist in the lexicon, try to use UNK
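The UNK fallback can be sketched as below; the matrix shape and the `unkToken` default are assumptions, not the library's actual constants.

```scala
// Sketch of the UNK fallback: look the word up, and if it is out of
// vocabulary, fall back to the (assumed) unknown-word token.
def getWordVector(word: String,
                  matrix: Map[String, Array[Double]],
                  unkToken: String = "*unknown*"): Option[Array[Double]] =
  matrix.get(word).orElse(matrix.get(unkToken))
```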
Fetches the embeddings vector for a given word (not lemma)
Fetches the embeddings vector for a given word (not lemma)
The word
The array of embedding weights
for a sequence of (word, weight) pairs, interpolate the vectors corresponding to the words by their respective weights, and normalize the resulting vector
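That interpolation can be sketched as follows, assuming dense `Array[Double]` vectors of a known dimensionality; the zero-norm guard is our addition.

```scala
// Weighted sum of the word vectors, followed by L2 normalization.
def interpolate(pairs: Seq[(String, Double)],
                matrix: Map[String, Array[Double]],
                dims: Int): Array[Double] = {
  val acc = new Array[Double](dims)
  for ((word, weight) <- pairs; vec <- matrix.get(word); i <- vec.indices)
    acc(i) += weight * vec(i)
  val norm = math.sqrt(acc.map(x => x * x).sum)
  if (norm > 0) acc.map(_ / norm) else acc
}
```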
Finds the maximum embedding similarity between any two words in these two texts. IMPORTANT: t1, t2 must be arrays of words, not lemmas!
Finds the words most similar to this set of inputs. IMPORTANT: words here must already be normalized using Word2vec.sanitizeWord()!
filterPredicate: if passed, only returns words that match the predicate
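A brute-force version of this lookup could look like the sketch below. All names here are illustrative, and scoring by dot product assumes pre-normalized vectors (which the real implementation may or may not guarantee).

```scala
// Score every vocabulary word against the input vectors and keep the top
// howMany, skipping the inputs themselves and anything the predicate rejects.
def mostSimilar(inputs: Seq[String],
                matrix: Map[String, Array[Double]],
                howMany: Int,
                filterPredicate: Option[String => Boolean] = None): Seq[(String, Double)] = {
  def dot(a: Array[Double], b: Array[Double]): Double =
    a.zip(b).map { case (x, y) => x * y }.sum
  val inputVecs = inputs.flatMap(matrix.get)
  if (inputVecs.isEmpty) Seq.empty
  else matrix.toSeq
    .filter { case (w, _) => !inputs.contains(w) && filterPredicate.forall(_(w)) }
    .map { case (w, v) => w -> inputVecs.map(dot(_, v)).sum }
    .sortBy(-_._2)
    .take(howMany)
}
```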
Similar to sanitizedTextSimilarity, but using the multiplicative heuristic of Levy and Goldberg (2014). IMPORTANT: words here must already be normalized using sanitizeWord()!
Similar to sanitizedTextSimilarity, but using the multiplicative heuristic of Levy and Goldberg (2014). IMPORTANT: words here must already be normalized using sanitizeWord()!
Similarity value
Similar to textSimilarity, but using the multiplicative heuristic of Levy and Goldberg (2014). IMPORTANT: t1, t2 must be arrays of words, not lemmas!
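A minimal sketch of a multiplicative combination in the spirit of Levy and Goldberg (2014): pairwise cosine similarities are multiplied rather than averaged. Shifting each similarity from [-1, 1] into [0, 1] before multiplying is our assumption, used to keep negative similarities from flipping the sign; the real implementation may differ.

```scala
// Multiply (shifted) pairwise cosine similarities between all word pairs.
def multiplicativeTextSimilarity(t1: Array[String], t2: Array[String],
                                 matrix: Map[String, Array[Double]]): Double = {
  def cosine(a: Array[Double], b: Array[Double]): Double = {
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val na  = math.sqrt(a.map(x => x * x).sum)
    val nb  = math.sqrt(b.map(x => x * x).sum)
    if (na == 0 || nb == 0) 0.0 else dot / (na * nb)
  }
  val sims = for {
    w1 <- t1.toSeq; v1 <- matrix.get(w1).toSeq
    w2 <- t2.toSeq; v2 <- matrix.get(w2).toSeq
  } yield (cosine(v1, v2) + 1.0) / 2.0 // shift into [0, 1]
  sims.product
}
```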
Finds the average embedding similarity between any two words in these two texts. IMPORTANT: words here must already be normalized using sanitizeWord()! Changelog: (Peter/June 4/2014) Now returns a list of pairwise scores, for optional answer justification.
Finds the maximum embedding similarity between any two words in these two texts. IMPORTANT: words here must already be normalized using sanitizeWord()!
Finds the minimum embedding similarity between any two words in these two texts. IMPORTANT: words here must already be normalized using Word2vec.sanitizeWord()!
Computes the cosine similarity between two texts, according to the embedding matrix. IMPORTANT: words here must already be normalized using Word2vec.sanitizeWord()!
Computes the similarity between two given words. IMPORTANT: words here must already be normalized using Word2vec.sanitizeWord()!
Computes the similarity between two given words. IMPORTANT: words here must already be normalized using Word2vec.sanitizeWord()!
The first word
The second word
The cosine similarity of the two corresponding vectors
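The two-word similarity above can be sketched as plain cosine over the two vectors; returning 0.0 for out-of-vocabulary words is our guess at the contract, not confirmed behavior.

```scala
// Cosine similarity of the vectors for w1 and w2; 0.0 if either is missing.
def wordSimilarity(w1: String, w2: String,
                   matrix: Map[String, Array[Double]]): Double =
  (matrix.get(w1), matrix.get(w2)) match {
    case (Some(a), Some(b)) =>
      val dot = a.zip(b).map { case (x, y) => x * y }.sum
      val na  = math.sqrt(a.map(x => x * x).sum)
      val nb  = math.sqrt(b.map(x => x * x).sum)
      if (na == 0 || nb == 0) 0.0 else dot / (na * nb)
    case _ => 0.0
  }
```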
Computes the cosine similarity between two texts, according to the embedding matrix. IMPORTANT: t1, t2 must be arrays of words, not lemmas!
Implements similarity metrics using the embedding matrix. IMPORTANT: In our implementation, words are lowercased but NOT lemmatized or stemmed (see sanitizeWord). Note: matrixConstructor is lazy, meant to save memory if we're caching features. User: mihais, dfried, gus Date: 11/25/13 Last Modified: Fix compiler issue: import scala.io.Source.
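Per the note above, normalization lowercases without lemmatizing or stemming. The sketch below reflects that; dropping characters that are neither letters nor digits is an assumption about sanitizeWord, not confirmed behavior.

```scala
// Lowercase the word; stripping non-alphanumeric characters is assumed.
def sanitizeWord(w: String): String =
  w.toLowerCase.filter(_.isLetterOrDigit)
```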