Class TextData


  • public class TextData
    extends java.lang.Object
    TextData is a utility for managing textual data within a Dataset.

    See TextDataset for an example.

    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  TextData.Configuration
      The configuration for creating a TextData value in a Dataset.
    • Method Summary

      Modifier and Type Method Description
      static TextData.Configuration getDefaultConfiguration()
      Returns a reasonable default TextData.Configuration to pass to the constructor when no custom settings are needed.
      ai.djl.ndarray.NDArray getEmbedding​(ai.djl.ndarray.NDManager manager, long index)
      Gets the text embedding for the given index of the text input.
      java.util.List<java.lang.String> getProcessedText​(long index)
      Gets the textual input after preprocessing.
      java.lang.String getRawText​(long index)
      Gets the raw textual input.
      int getSize()
      Returns the size of the data.
      ai.djl.modality.nlp.embedding.TextEmbedding getTextEmbedding()
      Gets the TextEmbedding used to embed the data.
      ai.djl.modality.nlp.Vocabulary getVocabulary()
      Gets the DefaultVocabulary built while preprocessing the text data.
      void preprocess​(ai.djl.ndarray.NDManager manager, java.util.List<java.lang.String> newTextData)
      Preprocesses the text data from the dataset into NDArrays.
      void setEmbeddingSize​(int embeddingSize)
      Sets the embedding size.
      void setTextEmbedding​(ai.djl.modality.nlp.embedding.TextEmbedding textEmbedding)
      Sets the TextEmbedding used to embed the data.
      void setTextProcessors​(java.util.List<ai.djl.modality.nlp.preprocess.TextProcessor> textProcessors)
      Sets the text processors.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • preprocess

        public void preprocess​(ai.djl.ndarray.NDManager manager,
                               java.util.List<java.lang.String> newTextData)
                        throws ai.djl.modality.nlp.embedding.EmbeddingException
        Preprocesses the text data from the dataset into NDArrays.
        Parameters:
        manager - the NDManager used to create the NDArrays
        newTextData - the data from the dataset
        Throws:
        ai.djl.modality.nlp.embedding.EmbeddingException - if there is an error while embedding input
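
To make the roles of the raw text, the processed tokens, and the vocabulary concrete, the following is a minimal sketch of what a preprocessing pass of this kind might do. It is plain Java and does not use DJL: the class `TextDataSketch`, its lower-casing and whitespace-splitting step (standing in for the configured text processors), and its `Map`-based vocabulary are all assumptions made for illustration, and the embedding step is left out entirely.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for illustration only; this is not DJL's TextData. */
class TextDataSketch {
    private List<String> rawText = new ArrayList<>();
    private final List<List<String>> processedText = new ArrayList<>();
    // Token -> index, assigned in order of first appearance.
    private final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    /** Stores the raw rows, tokenizes each one, and builds the vocabulary. */
    void preprocess(List<String> newTextData) {
        rawText = new ArrayList<>(newTextData);
        processedText.clear();
        vocabulary.clear();
        for (String row : newTextData) {
            // Assumed "text processors": lower-case, then split on whitespace.
            String normalized = row.toLowerCase(Locale.ENGLISH).trim();
            List<String> tokens = normalized.isEmpty()
                    ? new ArrayList<>()
                    : new ArrayList<>(Arrays.asList(normalized.split("\\s+")));
            for (String token : tokens) {
                vocabulary.putIfAbsent(token, vocabulary.size());
            }
            processedText.add(tokens);
        }
    }

    String getRawText(long index) {
        return rawText.get(Math.toIntExact(index));
    }

    List<String> getProcessedText(long index) {
        return processedText.get(Math.toIntExact(index));
    }

    int getSize() {
        return rawText.size();
    }

    Map<String, Integer> getVocabulary() {
        return vocabulary;
    }

    public static void main(String[] args) {
        TextDataSketch data = new TextDataSketch();
        data.preprocess(Arrays.asList("Hello World", "hello again"));
        System.out.println(data.getRawText(0));       // the row exactly as supplied
        System.out.println(data.getProcessedText(0)); // the row after the stand-in processors
        System.out.println(data.getVocabulary());     // every distinct token seen so far
    }
}
```

The sketch keeps the raw rows untouched so that an accessor such as `getRawText` can return the original string, while `getProcessedText` returns the token list produced by the processors. The real class additionally embeds each row, which is why its `preprocess` takes an NDManager and can throw an EmbeddingException.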
      • setTextProcessors

        public void setTextProcessors​(java.util.List<ai.djl.modality.nlp.preprocess.TextProcessor> textProcessors)
        Sets the text processors.
        Parameters:
        textProcessors - the new textProcessors
      • setTextEmbedding

        public void setTextEmbedding​(ai.djl.modality.nlp.embedding.TextEmbedding textEmbedding)
        Sets the TextEmbedding used to embed the data.
        Parameters:
        textEmbedding - the textEmbedding
      • getTextEmbedding

        public ai.djl.modality.nlp.embedding.TextEmbedding getTextEmbedding()
        Gets the TextEmbedding used to embed the data.
        Returns:
        the TextEmbedding
      • setEmbeddingSize

        public void setEmbeddingSize​(int embeddingSize)
        Sets the embedding size.
        Parameters:
        embeddingSize - the embedding size
      • getVocabulary

        public ai.djl.modality.nlp.Vocabulary getVocabulary()
        Gets the DefaultVocabulary built while preprocessing the text data.
        Returns:
        the DefaultVocabulary
      • getEmbedding

        public ai.djl.ndarray.NDArray getEmbedding​(ai.djl.ndarray.NDManager manager,
                                                   long index)
        Gets the text embedding for the given index of the text input.
        Parameters:
        manager - the manager for the embedding array
        index - the index of the text input
        Returns:
        the NDArray containing the text embedding
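
This page does not say what the returned array holds; that depends on the configured TextEmbedding. One common arrangement, assumed here purely for illustration, is to represent each row as an array of vocabulary indices that a model later turns into vectors. The sketch below is plain Java and does not use DJL: `TokenIndexSketch`, `toIndices`, and the `<unk>` marker are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of DJL. */
class TokenIndexSketch {
    // Assumed marker for tokens that are missing from the vocabulary.
    static final String UNKNOWN = "<unk>";

    /**
     * Maps each token to its vocabulary index, falling back to the index
     * of the unknown token. The vocabulary must contain UNKNOWN.
     */
    static long[] toIndices(Map<String, Long> vocabulary, List<String> tokens) {
        long unknownIndex = vocabulary.get(UNKNOWN);
        long[] indices = new long[tokens.size()];
        for (int i = 0; i < indices.length; i++) {
            indices[i] = vocabulary.getOrDefault(tokens.get(i), unknownIndex);
        }
        return indices;
    }

    public static void main(String[] args) {
        Map<String, Long> vocabulary = new HashMap<>();
        vocabulary.put(UNKNOWN, 0L);
        vocabulary.put("hello", 1L);
        vocabulary.put("world", 2L);
        long[] indices = toIndices(vocabulary, Arrays.asList("hello", "there", "world"));
        System.out.println(Arrays.toString(indices)); // prints [1, 0, 2]
    }
}
```

In a setup like this, the `long[]` would be wrapped in an array created by the supplied manager, which is consistent with `getEmbedding` taking an NDManager alongside the row index.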
      • getRawText

        public java.lang.String getRawText​(long index)
        Gets the raw textual input.
        Parameters:
        index - the index of the text input
        Returns:
        the raw text
      • getProcessedText

        public java.util.List<java.lang.String> getProcessedText​(long index)
        Gets the textual input after preprocessing.
        Parameters:
        index - the index of the text input
        Returns:
        the list of processed tokens
      • getSize

        public int getSize()
        Returns the size of the data.
        Returns:
        the size of the data