org.clulab.embeddings

CompactWordEmbeddingMap

Related Docs: object CompactWordEmbeddingMap | package embeddings

class CompactWordEmbeddingMap extends WordEmbeddingMap

This class and its companion object have been backported from Eidos, where it is (or was) an optional replacement for WordEmbeddingMap used for performance reasons. It loads data faster from disk and stores it more compactly in memory. It does not, however, include all the operations of processors' Word2Vec. For instance, logMultiplicativeTextSimilarity is not included, but could probably be added. Other methods like getWordVector, which in Word2Vec returns an Array[Double], would be inefficient to include because per-word arrays of doubles (or floats) are no longer part of the design. For documentation beyond that immediately below, both the companion object and the related test case (org.clulab.embeddings.TestCompactWord2Vec) may be helpful.

The class is typically instantiated by the apply method of the companion object, which takes as arguments a filename and then two booleans: "resource", which specifies whether the named file exists as a resource or is instead stored on the broader filesystem, and "cached", which specifies whether the data consists of Java-serialized objects (see the save method) or is instead in the standard vector text format. The apply method arranges for the file to be read in the appropriate way and converted into a map whose keys are the words and whose values are row numbers in an implied two-dimensional matrix of all the vector values, which is also supplied to the constructor. So, rather than each word being mapped to an independent, mini array as in Word2Vec, each word is mapped to an integer row number of a single, larger matrix/array.
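The layout described above can be illustrated with a small, self-contained sketch. This is not the library's code: the object name CompactLayoutSketch and its build and get helpers are hypothetical, and the input is assumed to be whitespace-separated lines of the form "word v1 v2 ... vn" with no header line.

```scala
// Hypothetical sketch of the word -> row-number layout; not the clulab implementation.
object CompactLayoutSketch {

  // Parses lines of the assumed form "word v1 v2 ... vn" and returns
  // (word -> row number, one flat row-major array of all values, columns per row).
  def build(lines: Seq[String]): (Map[String, Int], Array[Float], Int) = {
    require(lines.nonEmpty, "At least one vector is needed")
    val tokens = lines.toIndexedSeq.map(_.trim.split("\\s+"))
    val columns = tokens.head.length - 1
    val array = new Array[Float](tokens.length * columns)
    val map = tokens.zipWithIndex.map { case (fields, row) =>
      require(fields.length == columns + 1, s"Row $row has the wrong width")
      // Each word's values occupy one contiguous slice of the shared array.
      for (col <- 0 until columns)
        array(row * columns + col) = fields(col + 1).toFloat
      fields(0) -> row
    }.toMap
    (map, array, columns)
  }

  // Looks up the word's row number and copies that row out of the shared array.
  def get(map: Map[String, Int], array: Array[Float], columns: Int,
      word: String): Option[IndexedSeq[Float]] =
    map.get(word).map { row =>
      array.slice(row * columns, (row + 1) * columns).toIndexedSeq
    }
}
```

With this arrangement there are only two large objects to allocate (one map and one primitive array) instead of one small array per word, which is the kind of saving the description above refers to.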

To take advantage of the faster load times, the vector data file needs to be converted from text format into a binary format (Java-serialized objects) for loadBin. The test case includes an example. In some preprocessing phase, call CompactWord2Vec(filename, resource = false, cached = false) on the file containing the vectors in text format, such as glove.840B.300d.txt. "resource" is usually false because the file can be very large, too large to include as a resource. On the resulting return value, call save(compactFilename). Thereafter, for normal, speedy processing, use CompactWord2Vec(compactFilename, resource = false, cached = true).
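The convert-once, load-many-times idea can be sketched with plain Java serialization. The object CompactCacheSketch and the on-disk layout it uses (the words in row order, followed by the flat value array) are assumptions made for illustration; the format actually written by save may differ.

```scala
import java.io.{BufferedInputStream, BufferedOutputStream, FileInputStream,
  FileOutputStream, ObjectInputStream, ObjectOutputStream}

// Hypothetical sketch of caching embeddings as Java-serialized objects.
object CompactCacheSketch {

  // Writes the words (in row order) and the flat value array as two serialized objects.
  def save(filename: String, words: Array[String], array: Array[Float]): Unit = {
    val out = new ObjectOutputStream(new BufferedOutputStream(new FileOutputStream(filename)))
    try {
      out.writeObject(words)
      out.writeObject(array)
    }
    finally out.close()
  }

  // Reads the two objects back and rebuilds the word -> row-number map.
  def load(filename: String): (Map[String, Int], Array[Float]) = {
    val in = new ObjectInputStream(new BufferedInputStream(new FileInputStream(filename)))
    try {
      val words = in.readObject().asInstanceOf[Array[String]]
      val array = in.readObject().asInstanceOf[Array[Float]]
      (words.zipWithIndex.toMap, array)
    }
    finally in.close()
  }
}
```

Reading a serialized primitive array avoids parsing millions of decimal strings, which is where most of the time goes when loading the text format.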

Linear Supertypes
WordEmbeddingMap, AnyRef, Any

Instance Constructors

  1. new CompactWordEmbeddingMap(buildType: BuildType)

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  4. def add(dest: Array[Float], srcRow: Int): Unit
    Attributes
    protected
  5. def addWeighted(dest: Array[Float], srcRow: Int, weight: Float): Unit
    Attributes
    protected
  6. val array: Array[Float]
    Attributes
    protected
  7. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  8. def avgSimilarity(texts1: Iterable[String], texts2: Iterable[String]): Float
  9. val buildType: BuildType
    Attributes
    protected
  10. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  11. val columns: Int
  12. def compare(left: Option[IndexedSeq[Float]], right: Option[IndexedSeq[Float]]): Boolean
  13. def compare(left: ImplMapType, right: ImplMapType): Boolean
  14. def compare(lefts: IndexedSeq[Float], rights: IndexedSeq[Float]): Boolean
  15. val dim: Int

    The dimension of an embedding vector

    Definition Classes
    CompactWordEmbeddingMap → WordEmbeddingMap
  16. def dotProduct(row1: Int, row2: Int): Float
  17. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  18. def equals(other: Any): Boolean
    Definition Classes
    CompactWordEmbeddingMap → AnyRef → Any
  19. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  20. def get(word: String): Option[IndexedSeq[Float]]

    Retrieves the embedding for this word, if it exists in the map

    Definition Classes
    CompactWordEmbeddingMap → WordEmbeddingMap
  21. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
  22. def getOrElseUnknown(word: String): IndexedSeq[Float]

    Retrieves the embedding for this word; if the word doesn't exist in the map, uses the embedding of the Unknown token instead

    Definition Classes
    CompactWordEmbeddingMap → WordEmbeddingMap
  23. def hashCode(): Int
    Definition Classes
    CompactWordEmbeddingMap → AnyRef → Any
  24. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  25. def isOutOfVocabulary(word: String): Boolean
  26. def keys: Set[String]

    Returns all keys present in the map, excluding the key for the unknown token

    Definition Classes
    CompactWordEmbeddingMap → WordEmbeddingMap
  27. def knownKeys: Iterable[String]
  28. def makeCompositeVector(text: Iterable[String]): Array[Float]

    Computes the embedding of a text, as an unweighted average of all words

    Definition Classes
    CompactWordEmbeddingMap → WordEmbeddingMap
  29. def makeCompositeVectorWeighted(text: Iterable[String], weights: Iterable[Float]): Array[Float]
  30. val map: ImplMapType
    Attributes
    protected
  31. def mkTextFromMap(): String
    Attributes
    protected
  32. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  33. final def notify(): Unit
    Definition Classes
    AnyRef
  34. final def notifyAll(): Unit
    Definition Classes
    AnyRef
  35. val rows: Int
  36. def save(filename: String): Unit

    Save this object in binary format.

    Definition Classes
    CompactWordEmbeddingMap → WordEmbeddingMap
  37. def saveKryo(filename: String): Unit
  38. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  39. def toString(): String
    Definition Classes
    AnyRef → Any
  40. val unkEmbeddingOpt: Option[IndexedSeq[Float]]
  41. def unknownEmbedding: IndexedSeq[Float]

    The embedding corresponding to the unknown token

    Definition Classes
    CompactWordEmbeddingMap → WordEmbeddingMap
  42. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  43. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  44. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
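
Several of the members listed above (dotProduct, add, makeCompositeVector) operate on row numbers rather than on per-word arrays. The following sketch shows how such operations could work over a flat, row-major array. The object RowOpsSketch and its explicit array, columns, and map parameters are hypothetical. It also assumes that out-of-vocabulary words are skipped and that the composite is a plain arithmetic mean; the library's own handling of unknown words and any normalization may differ.

```scala
// Hypothetical sketch of row-based operations; not the clulab implementation.
object RowOpsSketch {

  // Dot product of two rows of a flat, row-major array with `columns` values per row.
  def dotProduct(array: Array[Float], columns: Int, row1: Int, row2: Int): Float = {
    var sum = 0f
    var i = 0
    while (i < columns) {
      sum += array(row1 * columns + i) * array(row2 * columns + i)
      i += 1
    }
    sum
  }

  // Adds one row of the shared array into dest, in place; dest.length is the row width.
  def add(dest: Array[Float], array: Array[Float], srcRow: Int): Unit = {
    val columns = dest.length
    var i = 0
    while (i < columns) {
      dest(i) += array(srcRow * columns + i)
      i += 1
    }
  }

  // Arithmetic mean of the rows of all known words in the text.
  // Assumption: unknown words are skipped; an all-unknown text yields the zero vector.
  def makeCompositeVector(map: Map[String, Int], array: Array[Float], columns: Int,
      text: Iterable[String]): Array[Float] = {
    val total = new Array[Float](columns)
    val rows = text.flatMap(word => map.get(word)).toVector
    rows.foreach(row => add(total, array, row))
    if (rows.nonEmpty)
      for (i <- total.indices)
        total(i) /= rows.size
    total
  }
}
```

Because every operation indexes directly into the shared array, no intermediate per-word vector is allocated except the single result array.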
