Interface WordEmbedding

All Known Implementing Classes:
TrainableWordEmbedding

public interface WordEmbedding
An interface for managing 1-D NDArray representations of words.

A word embedding maps each word to an NDArray that attempts to capture the key ideas in the word. Each value in the array can represent a different aspect of meaning, such as young-old or object-living.

Word embeddings can be used in models in two ways. First, they can be used purely as a preprocessing step that converts text into input for the model, which most models that take text as input require. In this case, the embedding is not trained. For this use case, use embedWord(NDManager, String).

In the second option, the embedding is trained with standard deep learning techniques to better fit the current dataset. This case needs two methods. First, call preprocessWordToEmbed(String) within your dataset. Then, make the first step in your model a call to embedWord(NDManager, long).
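The two-step flow described above can be sketched without the library. The class below is a hypothetical stand-in, not an implementation of this interface: ToyWordEmbedding, its constructor, and the use of float[] rows in place of 1-D NDArrays are assumptions made for illustration, and throwing on an unknown word is likewise an assumed policy rather than documented behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for illustration only: float[] rows play the role of
// 1-D NDArrays, and none of this is the library's own implementation.
class ToyWordEmbedding {
    private final Map<String, Long> indexOf = new HashMap<>();
    private final float[][] table;

    ToyWordEmbedding(String[] vocabulary, float[][] table) {
        for (int i = 0; i < vocabulary.length; i++) {
            indexOf.put(vocabulary[i], (long) i);
        }
        this.table = table;
    }

    boolean vocabularyContains(String word) {
        return indexOf.containsKey(word);
    }

    // Step 1 (dataset side): turn the word into an index.
    long preprocessWordToEmbed(String word) {
        Long index = indexOf.get(word);
        if (index == null) {
            // Assumed policy for this sketch; the real behavior may differ.
            throw new IllegalArgumentException("Unknown word: " + word);
        }
        return index;
    }

    // Step 2 (model side): look up the vector for a preprocessed index.
    float[] embedWord(long index) {
        return table[(int) index].clone();
    }

    // One-shot variant for the preprocessing-only use case.
    float[] embedWord(String word) {
        return embedWord(preprocessWordToEmbed(word));
    }

    public static void main(String[] args) {
        ToyWordEmbedding embedding = new ToyWordEmbedding(
                new String[] {"cat", "dog"},
                new float[][] {{0.9f, 0.1f}, {0.8f, 0.3f}});
        long index = embedding.preprocessWordToEmbed("dog");
        System.out.println(Arrays.toString(embedding.embedWord(index)));
    }
}
```

In this sketch both paths return the same vector. Splitting the lookup in two lets a dataset store compact indices while the lookup table lives inside the model, where training can update it.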

  • Method Details

    • vocabularyContains

      boolean vocabularyContains(String word)
      Returns whether an embedding exists for a word.
      Parameters:
      word - the word to check
      Returns:
      true if an embedding exists
    • preprocessWordToEmbed

      long preprocessWordToEmbed(String word)
      Pre-processes the word to embed into an index to pass into the model.

      Make sure to call embedWord(NDManager, long) after this.

      Parameters:
      word - the word to embed
      Returns:
      the index of the word, ready to embed
    • embedWord

      default NDArray embedWord(NDManager manager, String word) throws EmbeddingException
      Embeds a word.
      Parameters:
      manager - the manager for the embedding array
      word - the word to embed
      Returns:
      the embedded word
      Throws:
      EmbeddingException - if there is an error while trying to embed
    • embedWord

      default NDArray embedWord(NDManager manager, long index) throws EmbeddingException
      Embeds a word after it has been preprocessed using preprocessWordToEmbed(String).
      Parameters:
      manager - the manager for the embedding array
      index - the index of the word to embed
      Returns:
      the embedded word
      Throws:
      EmbeddingException - if there is an error while trying to embed
    • embedWord

      NDArray embedWord(NDArray index) throws EmbeddingException
      Embeds a word after it has been preprocessed using preprocessWordToEmbed(String).
      Parameters:
      index - an NDArray holding the index of the word to embed
      Returns:
      the embedded word
      Throws:
      EmbeddingException - if there is an error while trying to embed
    • unembedWord

      String unembedWord(NDArray word)
      Returns the closest matching word for the given embedding.
      Parameters:
      word - the word embedding to find the matching word for
      Returns:
      a word similar to the passed-in embedding
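
One way to read "closest matching word" is a nearest-neighbor search over the embedding table. The sketch below assumes squared Euclidean distance and plain float[] vectors. The interface does not state which metric an implementation uses, and NearestWordLookup is a hypothetical helper, not part of the library.

```java
// Hypothetical helper for illustration: nearest-neighbor lookup over a small
// embedding table. The distance metric (squared Euclidean) is an assumption.
class NearestWordLookup {

    static String unembedWord(String[] vocabulary, float[][] table, float[] query) {
        if (vocabulary.length == 0) {
            throw new IllegalArgumentException("Vocabulary is empty");
        }
        int best = 0;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < table.length; i++) {
            double distance = 0;
            for (int j = 0; j < query.length; j++) {
                double diff = table[i][j] - query[j];
                distance += diff * diff;
            }
            // Keep the row with the smallest distance seen so far.
            if (distance < bestDistance) {
                bestDistance = distance;
                best = i;
            }
        }
        return vocabulary[best];
    }

    public static void main(String[] args) {
        String[] vocabulary = {"cat", "dog", "car"};
        float[][] table = {{0.9f, 0.1f}, {0.8f, 0.3f}, {0.0f, 1.0f}};
        System.out.println(unembedWord(vocabulary, table, new float[] {0.1f, 0.9f}));
    }
}
```

Because the query rarely equals a stored vector exactly, the result is the most similar word rather than a guaranteed inverse of embedWord.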