Interface WordEmbedding

  • All Known Implementing Classes:
    TrainableWordEmbedding

    public interface WordEmbedding
    An interface to manage 1-D NDArray representations of words.

    A word embedding maps each word to an NDArray that attempts to capture the key ideas in the word. Each value in the array can represent a different aspect of meaning, such as young-old or object-living.

    These word embeddings can be used in two different ways in models. First, they can be used purely to preprocess the input to the model, a step that most models taking text as input require. In this case, the embedding is not trained. For this use case, use embedWord(NDManager, String).

    Second, the embedding can be trained with standard deep learning techniques so that it better fits the current dataset. This case needs two methods. First, call preprocessWordToEmbed(String) within your dataset. Then, make the call to embedWord(NDManager, long) the first step in your model.
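
    The two usage paths above can be sketched with a toy, self-contained stand-in. This is not the library's implementation: ToyWordEmbedding is a hypothetical class, float[] takes the place of NDArray, no NDManager is involved, and the choice to have the String overload delegate to the index overload is an assumption made for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in: float[] replaces NDArray and there is no NDManager. */
final class ToyWordEmbedding {
    private final Map<String, Long> indexOf = new HashMap<>();
    private final float[][] table;

    ToyWordEmbedding(String[] words, float[][] vectors) {
        if (words.length != vectors.length) {
            throw new IllegalArgumentException("need exactly one vector per word");
        }
        table = new float[vectors.length][];
        for (int i = 0; i < words.length; i++) {
            indexOf.put(words[i], (long) i);
            table[i] = vectors[i].clone(); // defensive copy of each row
        }
    }

    boolean vocabularyContains(String word) {
        return indexOf.containsKey(word);
    }

    /** Trained path, step 1 (dataset side): turn the word into a long index. */
    long preprocessWordToEmbed(String word) {
        Long index = indexOf.get(word);
        if (index == null) {
            throw new IllegalArgumentException("unknown word: " + word);
        }
        return index;
    }

    /** Trained path, step 2 (first step of the model): look up the row for the index. */
    float[] embedWord(long index) {
        return table[(int) index].clone();
    }

    /** Preprocessing-only path: go from the word to its vector in one call. */
    float[] embedWord(String word) {
        return embedWord(preprocessWordToEmbed(word));
    }

    public static void main(String[] args) {
        ToyWordEmbedding embedding = new ToyWordEmbedding(
                new String[] {"cat", "dog"},
                new float[][] {{1f, 0f}, {0f, 1f}});

        // Preprocessing-only use: a single lookup by word.
        System.out.println(Arrays.toString(embedding.embedWord("cat")));

        // Trained use: the dataset stores the index, the model embeds it later.
        long index = embedding.preprocessWordToEmbed("dog");
        System.out.println(Arrays.toString(embedding.embedWord(index)));
    }
}
```

    In the trained case, storing the long index in the dataset keeps the lookup inside the model, which is where a trainable table would receive its updates.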

    • Method Detail

      • vocabularyContains

        boolean vocabularyContains(java.lang.String word)
        Returns whether an embedding exists for a word.
        Parameters:
        word - the word to check
        Returns:
        true if an embedding exists
      • preprocessWordToEmbed

        long preprocessWordToEmbed(java.lang.String word)
        Pre-processes the word to embed into a long value that can be passed into the model.

        Make sure to call embedWord(NDManager, long) after this.

        Parameters:
        word - the word to embed
        Returns:
        the long value representing the word, ready to embed
      • embedWord

        default NDArray embedWord(NDManager manager,
                                  java.lang.String word)
                           throws EmbeddingException
        Embeds a word.
        Parameters:
        manager - the manager for the embedding array
        word - the word to embed
        Returns:
        the embedded word
        Throws:
        EmbeddingException - if there is an error while trying to embed
      • unembedWord

        java.lang.String unembedWord(NDArray word)
        Returns the closest matching word for the given embedding.
        Parameters:
        word - the word embedding to find the matching word for
        Returns:
        a word similar to the passed-in embedding
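
        One way to picture "closest matching word" is a nearest-neighbor search over the stored vectors. The sketch below is an assumption, not the library's method: NearestWordLookup is a hypothetical class, float[] stands in for NDArray, and squared Euclidean distance is chosen only for illustration, since the source does not say which similarity measure an implementation uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical nearest-neighbor lookup; float[] replaces NDArray. */
final class NearestWordLookup {
    private final Map<String, float[]> vectors = new LinkedHashMap<>();

    void put(String word, float[] vector) {
        vectors.put(word, vector.clone());
    }

    /** Returns the stored word whose vector is nearest, or null if nothing is stored. */
    String unembedWord(float[] embedding) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, float[]> entry : vectors.entrySet()) {
            double distance = squaredDistance(entry.getValue(), embedding);
            if (distance < bestDistance) {
                bestDistance = distance;
                best = entry.getKey();
            }
        }
        return best;
    }

    // Assumed metric: squared Euclidean distance between equal-length vectors.
    private static double squaredDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return sum;
    }

    public static void main(String[] args) {
        NearestWordLookup lookup = new NearestWordLookup();
        lookup.put("cat", new float[] {1f, 0f});
        lookup.put("dog", new float[] {0f, 1f});
        // The query vector is much nearer to the "cat" vector than to "dog".
        System.out.println(lookup.unembedWord(new float[] {0.9f, 0.2f}));
    }
}
```

        A linear scan like this costs time proportional to the vocabulary size per query, so it only illustrates the idea.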