Package ai.djl.basicdataset.utils
Class TextData
java.lang.Object
    ai.djl.basicdataset.utils.TextData

public class TextData extends java.lang.Object

Nested Class Summary
static class TextData.Configuration
    The configuration for creating a TextData value in a Dataset.

Constructor Summary
TextData(TextData.Configuration config)
    Constructs a new TextData.

Method Summary
static TextData.Configuration getDefaultConfiguration()
    Returns a good default TextData.Configuration to use for the constructor with defaults.
ai.djl.ndarray.NDArray getEmbedding(ai.djl.ndarray.NDManager manager, long index)
    Gets the text embedding for the given index of the text input.
java.util.List<java.lang.String> getProcessedText(long index)
    Gets the textual input after preprocessing.
java.lang.String getRawText(long index)
    Gets the raw textual input.
int getSize()
    Returns the size of the data.
ai.djl.modality.nlp.embedding.TextEmbedding getTextEmbedding()
    Gets the TextEmbedding used to embed the data.
ai.djl.modality.nlp.Vocabulary getVocabulary()
    Gets the DefaultVocabulary built while preprocessing the text data.
void preprocess(ai.djl.ndarray.NDManager manager, java.util.List<java.lang.String> newTextData)
    Preprocesses the text data into NDArray form using the data provided from the dataset.
void setEmbeddingSize(int embeddingSize)
    Sets the embedding size.
void setTextEmbedding(ai.djl.modality.nlp.embedding.TextEmbedding textEmbedding)
    Sets the TextEmbedding used to embed the data.
void setTextProcessors(java.util.List<ai.djl.modality.nlp.preprocess.TextProcessor> textProcessors)
    Sets the text processors.

Constructor Detail

TextData
public TextData(TextData.Configuration config)
Constructs a new TextData.
Parameters:
config - the configuration for the TextData

Method Detail

getDefaultConfiguration
public static TextData.Configuration getDefaultConfiguration()
Returns a good default TextData.Configuration to use for the constructor with defaults.
Returns:
a good default TextData.Configuration to use for the constructor with defaults

preprocess
public void preprocess(ai.djl.ndarray.NDManager manager, java.util.List<java.lang.String> newTextData) throws ai.djl.modality.nlp.embedding.EmbeddingException
Preprocesses the text data into NDArray form using the data provided from the dataset.
Parameters:
manager - the NDManager used during preprocessing
newTextData - the data from the dataset
Throws:
ai.djl.modality.nlp.embedding.EmbeddingException - if there is an error while embedding input
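
The bookkeeping that preprocess performs can be illustrated with a plain-Java stand-in. This is only a sketch and not DJL code: the class name TextPreprocessSketch is hypothetical, the lower-casing and whitespace tokenization are assumed processing steps, and a Map replaces the Vocabulary type. It shows how one preprocessing call can feed index-based accessors such as getRawText, getProcessedText and getSize.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for text preprocessing bookkeeping; not the DJL implementation. */
final class TextPreprocessSketch {
    private final List<String> rawText = new ArrayList<>();
    private final List<List<String>> processedText = new ArrayList<>();
    // Token -> index, in first-seen order (assumed stand-in for a vocabulary).
    private final Map<String, Integer> vocabulary = new LinkedHashMap<>();

    /** Lower-cases each entry, splits it on whitespace, and records every new token. */
    void preprocess(List<String> newTextData) {
        for (String sentence : newTextData) {
            List<String> tokens = new ArrayList<>();
            for (String token : sentence.toLowerCase(Locale.ENGLISH).trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    tokens.add(token);
                    vocabulary.putIfAbsent(token, vocabulary.size());
                }
            }
            rawText.add(sentence);
            processedText.add(tokens);
        }
    }

    String getRawText(long index) {
        return rawText.get(Math.toIntExact(index));
    }

    List<String> getProcessedText(long index) {
        return processedText.get(Math.toIntExact(index));
    }

    int getSize() {
        return rawText.size();
    }

    Map<String, Integer> getVocabulary() {
        return vocabulary;
    }

    public static void main(String[] args) {
        TextPreprocessSketch data = new TextPreprocessSketch();
        data.preprocess(Arrays.asList("Hello World", "hello again"));
        for (long i = 0; i < data.getSize(); i++) {
            System.out.println(data.getRawText(i) + " -> " + data.getProcessedText(i));
        }
    }
}
```

With the real class, the corresponding flow would presumably be to build a TextData from getDefaultConfiguration(), call preprocess once with the dataset's text, and then read entries back by index.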

setTextProcessors
public void setTextProcessors(java.util.List<ai.djl.modality.nlp.preprocess.TextProcessor> textProcessors)
Sets the text processors.
Parameters:
textProcessors - the new textProcessors

setTextEmbedding
public void setTextEmbedding(ai.djl.modality.nlp.embedding.TextEmbedding textEmbedding)
Sets the TextEmbedding used to embed the data.
Parameters:
textEmbedding - the TextEmbedding

getTextEmbedding
public ai.djl.modality.nlp.embedding.TextEmbedding getTextEmbedding()
Gets the TextEmbedding used to embed the data.
Returns:
the TextEmbedding

setEmbeddingSize
public void setEmbeddingSize(int embeddingSize)
Sets the embedding size.
Parameters:
embeddingSize - the embedding size

getVocabulary
public ai.djl.modality.nlp.Vocabulary getVocabulary()
Gets the DefaultVocabulary built while preprocessing the text data.
Returns:
the DefaultVocabulary

getEmbedding
public ai.djl.ndarray.NDArray getEmbedding(ai.djl.ndarray.NDManager manager, long index)
Gets the text embedding for the given index of the text input.
Parameters:
manager - the manager for the embedding array
index - the index of the text input
Returns:
the NDArray containing the text embedding
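
To make the relationship between the processed tokens, the embedding size and the returned array more concrete, here is a minimal plain-Java sketch of an embedding table lookup. It is an illustration under stated assumptions and not the DJL implementation: EmbeddingLookupSketch is a hypothetical class, float arrays stand in for NDArray, the one-vector-per-token layout is assumed, the table values are placeholders rather than trained weights, and mapping unknown tokens to zeros is an arbitrary choice.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical embedding-table lookup; float arrays stand in for NDArray. */
final class EmbeddingLookupSketch {
    private final Map<String, Integer> tokenToRow = new HashMap<>();
    private final float[][] table;
    private final int embeddingSize;

    EmbeddingLookupSketch(List<String> vocabulary, int embeddingSize) {
        this.embeddingSize = embeddingSize;
        this.table = new float[vocabulary.size()][embeddingSize];
        for (int row = 0; row < vocabulary.size(); row++) {
            tokenToRow.put(vocabulary.get(row), row);
            // Deterministic placeholder values; a trained embedding would learn these.
            Arrays.fill(table[row], row + 1);
        }
    }

    /** Returns one vector of length embeddingSize per token; unknown tokens stay all zeros. */
    float[][] embed(List<String> tokens) {
        float[][] result = new float[tokens.size()][embeddingSize];
        for (int i = 0; i < tokens.size(); i++) {
            Integer row = tokenToRow.get(tokens.get(i));
            if (row != null) {
                result[i] = table[row].clone();
            }
        }
        return result;
    }

    public static void main(String[] args) {
        EmbeddingLookupSketch lookup =
                new EmbeddingLookupSketch(Arrays.asList("hello", "world"), 3);
        float[][] vectors = lookup.embed(Arrays.asList("world", "unseen"));
        System.out.println(Arrays.deepToString(vectors));
    }
}
```

In this sketch the embedding size fixes the width of every row, which is why it has to be settled before any lookup takes place.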

getRawText
public java.lang.String getRawText(long index)
Gets the raw textual input.
Parameters:
index - the index of the text input
Returns:
the raw text

getProcessedText
public java.util.List<java.lang.String> getProcessedText(long index)
Gets the textual input after preprocessing.
Parameters:
index - the index of the text input
Returns:
the list of processed tokens

getSize
public int getSize()
Returns the size of the data.
Returns:
the size of the data