Interface TextEmbedding

All Known Implementing Classes:
ModelZooTextEmbedding, SimpleTextEmbedding, TrainableTextEmbedding

public interface TextEmbedding
An interface for managing 1-D NDArray representations of multiple words.

A text embedding differs from a WordEmbedding in that it does not have to be applied to each word independently.

A text embedding maps text to an NDArray that attempts to represent the key ideas in the words. Each value along the embedding dimension can represent a different aspect of meaning, such as young-old or object-living.

These text embeddings can be used in models in two different ways. First, they can be used purely to preprocess the model's input, a step that most models taking text as input require. In this case, the embedding is not trained. For this use case, use embedText(ai.djl.ndarray.NDManager, java.util.List<java.lang.String>).

In the second option, the embedding can be trained using standard deep learning techniques to better fit the current dataset. This case requires two methods. First, call preprocessTextToEmbed(List) within your dataset. Then, as the first step in your model, call embedText(NDManager, long[]).
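
The two-step flow described above can be illustrated with a small, self-contained sketch. It does not use DJL: `ToyTextEmbedding` is a hypothetical stand-in that replaces NDArray with plain `float[][]`, drops the NDManager argument, and fills its lookup table with placeholder values instead of trained weights. It only shows the division of work between the dataset side (text to indices) and the model side (indices to vectors).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this class is not part of DJL.
class ToyTextEmbedding {
    private final Map<String, Long> vocabulary = new HashMap<>();
    // One row per vocabulary index; row 0 is reserved for unknown words.
    private final float[][] table;

    ToyTextEmbedding(List<String> words, int dimension) {
        table = new float[words.size() + 1][dimension];
        for (int i = 0; i < words.size(); i++) {
            vocabulary.put(words.get(i), (long) (i + 1));
            // Placeholder values; a trainable embedding would learn these.
            Arrays.fill(table[i + 1], (float) (i + 1));
        }
    }

    // Step 1 (dataset side): convert tokens to indices.
    long[] preprocessTextToEmbed(List<String> text) {
        long[] indices = new long[text.size()];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = vocabulary.getOrDefault(text.get(i), 0L);
        }
        return indices;
    }

    // Step 2 (model side): look up one vector per index.
    float[][] embedText(long[] textIndices) {
        float[][] embedded = new float[textIndices.length][];
        for (int i = 0; i < textIndices.length; i++) {
            embedded[i] = table[(int) textIndices[i]].clone();
        }
        return embedded;
    }

    public static void main(String[] args) {
        ToyTextEmbedding embedding =
                new ToyTextEmbedding(Arrays.asList("hello", "world"), 3);
        long[] indices =
                embedding.preprocessTextToEmbed(Arrays.asList("world", "unseen", "hello"));
        System.out.println(Arrays.toString(indices)); // prints [2, 0, 1]
        float[][] vectors = embedding.embedText(indices);
        System.out.println(vectors.length + " vectors of size " + vectors[0].length);
    }
}
```

Keeping the index conversion in the dataset means the model receives only numeric input, so the lookup table can be updated during training like any other parameter.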

  • Method Details

    • preprocessTextToEmbed

      long[] preprocessTextToEmbed(List<String> text)
      Preprocesses the text to embed into an array to pass into the model.

      Make sure to call embedText(NDManager, long[]) after this.

      Parameters:
      text - the text to embed
      Returns:
      the indices of text that is ready to embed
    • embedText

      default NDArray embedText(NDManager manager, List<String> text) throws EmbeddingException
      Embeds a text.
      Parameters:
      manager - the manager for the embedding array
      text - the text to embed
      Returns:
      the embedded text
      Throws:
      EmbeddingException - if there is an error while trying to embed
    • embedText

      default NDArray embedText(NDManager manager, long[] textIndices) throws EmbeddingException
      Embeds the text after it has been preprocessed using preprocessTextToEmbed(List).
      Parameters:
      manager - the manager to create the embedding array
      textIndices - the indices of text to embed
      Returns:
      the embedded text
      Throws:
      EmbeddingException - if there is an error while trying to embed
    • embedText

      NDArray embedText(NDArray textIndices) throws EmbeddingException
      Embeds the text after it has been preprocessed using preprocessTextToEmbed(List).
      Parameters:
      textIndices - the indices of text to embed
      Returns:
      the embedded text
      Throws:
      EmbeddingException - if there is an error while trying to embed
    • unembedText

      List<String> unembedText(NDArray textEmbedding) throws EmbeddingException
      Returns the closest matching text for a given embedding.
      Parameters:
      textEmbedding - the text embedding to find the matching text for
      Returns:
      text similar to the passed in embedding
      Throws:
      EmbeddingException - if the input cannot be unembedded
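
One way to picture "closest matching text" for unembedText is a nearest-neighbor search over known embeddings. The sketch below is an assumption about how such a lookup could work, not a description of any DJL implementation: `NearestTextLookup` is a hypothetical helper, it uses squared Euclidean distance as the similarity measure, works on plain `float[]` rather than NDArray, and returns a single string rather than a List<String>.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; this class is not part of DJL.
class NearestTextLookup {

    // Returns the key whose vector has the smallest squared Euclidean
    // distance to the query vector.
    static String closest(Map<String, float[]> known, float[] query) {
        String best = null;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, float[]> entry : known.entrySet()) {
            float[] candidate = entry.getValue();
            if (candidate.length != query.length) {
                throw new IllegalArgumentException(
                        "dimension mismatch for " + entry.getKey());
            }
            double distance = 0;
            for (int i = 0; i < query.length; i++) {
                double diff = candidate[i] - query[i];
                distance += diff * diff;
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = entry.getKey();
            }
        }
        if (best == null) {
            // Plays the role of the "cannot be unembedded" failure case.
            throw new IllegalArgumentException("no known embeddings to match against");
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, float[]> known = new LinkedHashMap<>();
        known.put("cat", new float[] {1f, 0f});
        known.put("car", new float[] {0f, 1f});
        System.out.println(closest(known, new float[] {0.9f, 0.2f})); // prints cat
    }
}
```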