Package ai.djl.basicdataset.nlp
Class GoEmotions
- java.lang.Object
  - ai.djl.training.dataset.RandomAccessDataset
    - ai.djl.basicdataset.nlp.TextDataset
      - ai.djl.basicdataset.nlp.GoEmotions
All Implemented Interfaces:
ai.djl.training.dataset.Dataset
public class GoEmotions extends TextDataset
GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations for 27 emotion categories or Neutral. This version of the data is filtered by rater agreement on top of the raw data and contains a train/test/validation split. The emotion categories are: admiration, amusement, anger, annoyance, approval, caring, confusion, curiosity, desire, disappointment, disapproval, disgust, embarrassment, excitement, fear, gratitude, grief, joy, love, nervousness, optimism, pride, realization, relief, remorse, sadness, surprise.
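As an illustration of how the categories listed above can be addressed by numeric label, the following is a minimal sketch that maps label indices to category names. It assumes that indices follow the order of the list above, with Neutral appended as the last entry; this page does not confirm that ordering. The class EmotionLabels and its methods are hypothetical helpers and are not part of DJL.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of DJL.
public class EmotionLabels {

    // The 27 categories in the order listed in the class description, plus Neutral.
    // Assumption: label index i in the data corresponds to position i in this array.
    static final String[] NAMES = {
        "admiration", "amusement", "anger", "annoyance", "approval", "caring",
        "confusion", "curiosity", "desire", "disappointment", "disapproval",
        "disgust", "embarrassment", "excitement", "fear", "gratitude", "grief",
        "joy", "love", "nervousness", "optimism", "pride", "realization",
        "relief", "remorse", "sadness", "surprise", "neutral"
    };

    // Returns the category name for a single label index.
    static String nameOf(int index) {
        if (index < 0 || index >= NAMES.length) {
            throw new IllegalArgumentException("label index out of range: " + index);
        }
        return NAMES[index];
    }

    // A comment may carry several labels, so decode a whole set at once.
    static List<String> namesOf(int... indices) {
        List<String> names = new ArrayList<>();
        for (int index : indices) {
            names.add(nameOf(index));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(namesOf(0, 17));
    }
}
```

Because a single comment can be annotated with more than one emotion, the sketch decodes a list of indices instead of a single value.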
Nested Class Summary
Nested Classes
Modifier and Type    Class                 Description
static class         GoEmotions.Builder    A builder to construct a GoEmotions.
Nested classes/interfaces inherited from class ai.djl.basicdataset.nlp.TextDataset
TextDataset.Sample
Field Summary
-
Fields inherited from class ai.djl.basicdataset.nlp.TextDataset
manager, mrl, prepared, samples, sourceTextData, targetTextData, usage
Method Summary
Modifier and Type                 Method                                               Description
protected long                    availableSize()                                      Returns the number of records available to be read in this Dataset.
static GoEmotions.Builder         builder()                                            Creates a builder to build a GoEmotions.
ai.djl.training.dataset.Record    get(ai.djl.ndarray.NDManager manager, long index)    Gets the Record for the given index from the dataset.
void                              prepare(ai.djl.util.Progress progress)               Prepares the dataset for use with tracked progress.
Methods inherited from class ai.djl.basicdataset.nlp.TextDataset
getProcessedText, getRawText, getSamples, getTextEmbedding, getVocabulary, preprocess
-
Methods inherited from class ai.djl.training.dataset.RandomAccessDataset
getData, getData, getData, getData, newSubDataset, newSubDataset, randomSplit, size, subDataset, subDataset, subDataset, subDataset, toArray
Method Detail
-
prepare
public void prepare(ai.djl.util.Progress progress) throws java.io.IOException, ai.djl.modality.nlp.embedding.EmbeddingException
Prepares the dataset for use with tracked progress. In this method the TSV file will be parsed. All datasets will be preprocessed.
Parameters:
progress - the progress tracker
Throws:
java.io.IOException - for various exceptions depending on the dataset
ai.djl.modality.nlp.embedding.EmbeddingException
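To make the TSV parsing step more concrete, here is a minimal sketch of reading one line of such a file with the Java standard library only. It assumes each line has three tab-separated columns (comment text, comma-separated label indices, comment id); that layout is an assumption and is not stated on this page. The class GoEmotionsTsvLine is hypothetical and does not reflect how prepare is actually implemented in DJL.

```java
import java.util.Arrays;

// Hypothetical holder for one parsed TSV line, not part of DJL.
public class GoEmotionsTsvLine {
    final String text;       // the comment text
    final int[] labels;      // one or more emotion label indices
    final String commentId;  // identifier of the comment

    GoEmotionsTsvLine(String text, int[] labels, String commentId) {
        this.text = text;
        this.labels = labels;
        this.commentId = commentId;
    }

    // Assumed column layout: text <TAB> comma-separated label indices <TAB> comment id.
    static GoEmotionsTsvLine parse(String line) {
        // Limit -1 keeps trailing empty columns so the column count check is reliable.
        String[] cols = line.split("\t", -1);
        if (cols.length != 3) {
            throw new IllegalArgumentException(
                    "expected 3 tab-separated columns, got " + cols.length);
        }
        int[] labels = Arrays.stream(cols[1].split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
        return new GoEmotionsTsvLine(cols[0], labels, cols[2]);
    }

    public static void main(String[] args) {
        GoEmotionsTsvLine row = parse("Thanks a lot!\t15,27\tabc123");
        System.out.println(row.text + " -> " + Arrays.toString(row.labels));
    }
}
```

A malformed line raises an IllegalArgumentException (or a NumberFormatException for a non-numeric label), so a caller can decide whether to skip the line or abort.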
-
get
public ai.djl.training.dataset.Record get(ai.djl.ndarray.NDManager manager, long index) throws java.io.IOException
Gets the Record for the given index from the dataset.
Specified by:
get in class ai.djl.training.dataset.RandomAccessDataset
Parameters:
manager - the manager used to create the arrays
index - the index of the requested data item
Returns:
a Record that contains the data and label of the requested data item. The data NDList contains three NDArrays representing the embedded title, context and question, which are named accordingly. The label NDList contains multiple NDArrays corresponding to each embedded answer.
Throws:
java.io.IOException
-
availableSize
protected long availableSize()
Returns the number of records available to be read in this Dataset. In this implementation, the actual number of available records is the size of questionInfoList.
Specified by:
availableSize in class ai.djl.training.dataset.RandomAccessDataset
Returns:
the number of records available to be read in this Dataset
-
builder
public static GoEmotions.Builder builder()
Creates a builder to build a GoEmotions.
Returns:
a new builder