Package ai.djl.basicdataset.nlp
Contains a library of built-in datasets for Application.NLP.
Class Summary
The AmazonReview dataset contains a set of reviews and their sentiment ratings for Application.NLP.SENTIMENT_ANALYSIS.
A builder to construct an AmazonReview.
A text classification dataset containing questions from cooking.stackexchange.com and their associated tags on the site.
A builder to construct a CookingStackExchange.
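Assuming the cooking questions use a fastText-style text layout, in which each line starts with one or more tags prefixed by "__label__" and is followed by the question text, a line can be split into its tags and its question as in the sketch below. The class CookingLine is hypothetical and is not how CookingStackExchange itself reads the data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical parser for one tagged question line.
// ASSUMPTION: tags are the leading tokens prefixed with "__label__";
// everything after the last tag is the question text. Not DJL's implementation.
final class CookingLine {
    static final String LABEL_PREFIX = "__label__";

    final List<String> tags = new ArrayList<>();
    final String question;

    CookingLine(String rawLine) {
        String[] tokens = rawLine.trim().split("\\s+");
        int i = 0;
        // Consume the leading tag tokens, stripping the prefix from each one.
        while (i < tokens.length && tokens[i].startsWith(LABEL_PREFIX)) {
            tags.add(tokens[i].substring(LABEL_PREFIX.length()));
            i++;
        }
        // The remaining tokens, re-joined with single spaces, form the question.
        question = String.join(" ", Arrays.copyOfRange(tokens, i, tokens.length));
    }

    public static void main(String[] args) {
        CookingLine line = new CookingLine(
                "__label__sauce __label__cheese How much does potato starch affect a cheese sauce recipe?");
        System.out.println(line.tags); // [sauce, cheese]
        System.out.println(line.question);
    }
}
```

Because a question can carry several tags, the result is a list of labels per sample, which makes this a multi-label text classification setup.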
GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations for 27 emotion categories or Neutral.
A builder to construct a GoEmotions.
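With 27 emotion categories plus Neutral there are 28 possible labels, and a single comment may carry more than one. A minimal sketch of turning such labels into a multi-hot target vector follows; it assumes the labels arrive as comma-separated integer ids (for example "2,17"), which is an illustrative encoding, and the class MultiHot is hypothetical.

```java
import java.util.Arrays;

// Sketch: encode a GoEmotions-style label field as a multi-hot vector.
// ASSUMPTION: labels are comma-separated integer ids in [0, 28).
final class MultiHot {
    // 27 emotion categories plus Neutral, per the dataset description.
    static final int NUM_CLASSES = 28;

    static float[] encode(String labelField) {
        float[] vector = new float[NUM_CLASSES];
        for (String id : labelField.split(",")) {
            int index = Integer.parseInt(id.trim());
            if (index < 0 || index >= NUM_CLASSES) {
                throw new IllegalArgumentException("label id out of range: " + index);
            }
            vector[index] = 1f; // several positions may be set for one comment
        }
        return vector;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("2,17")));
    }
}
```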
The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation.
A builder to construct a PennTreebankText.
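When PTB text is used for language modeling, the words must first be mapped to integer ids. The sketch below assumes whitespace-tokenized text with one sentence per line and appends an "<eos>" marker after each line; both are common conventions for PTB language modeling, assumed here rather than taken from PennTreebankText. The class PtbVocab is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: map whitespace-tokenized text to integer ids.
// ASSUMPTIONS: one sentence per line; "<eos>" appended after each line.
final class PtbVocab {
    static final String EOS = "<eos>";
    // Insertion order is preserved, so ids follow first appearance.
    final Map<String, Integer> tokenToId = new LinkedHashMap<>();

    int idOf(String token) {
        Integer id = tokenToId.get(token);
        if (id == null) {
            id = tokenToId.size(); // next unused id
            tokenToId.put(token, id);
        }
        return id;
    }

    List<Integer> encode(List<String> lines) {
        List<Integer> ids = new ArrayList<>();
        for (String line : lines) {
            for (String token : line.trim().split("\\s+")) {
                if (!token.isEmpty()) {
                    ids.add(idOf(token));
                }
            }
            ids.add(idOf(EOS)); // mark the sentence boundary
        }
        return ids;
    }

    public static void main(String[] args) {
        PtbVocab vocab = new PtbVocab();
        List<Integer> ids = vocab.encode(Arrays.asList("the cat <unk>", "the dog"));
        System.out.println(ids); // [0, 1, 2, 3, 0, 4, 3]
    }
}
```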
The StanfordMovieReview dataset contains a set of movie reviews and their sentiment ratings for Application.NLP.SENTIMENT_ANALYSIS.
A builder for a StanfordMovieReview.
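One way a review can carry both a sentiment label and a rating is through its file location. The sketch below assumes the layout of the public aclImdb movie review archive, where reviews sit in "pos" or "neg" directories and each file is named "<id>_<rating>.txt"; whether StanfordMovieReview reads the data this way is an assumption, and the class ReviewPath is hypothetical.

```java
// Sketch: recover a sentiment label and rating from an IMDB-style review path.
// ASSUMPTIONS: parent directory is "pos" or "neg"; file name is "<id>_<rating>.txt".
final class ReviewPath {
    final boolean positive;
    final int rating;

    ReviewPath(String path) {
        String[] parts = path.replace('\\', '/').split("/");
        if (parts.length < 2) {
            throw new IllegalArgumentException("expected <label-dir>/<file>: " + path);
        }
        String dir = parts[parts.length - 2];
        String file = parts[parts.length - 1];

        // The parent directory carries the binary sentiment label.
        if (dir.equals("pos")) {
            positive = true;
        } else if (dir.equals("neg")) {
            positive = false;
        } else {
            throw new IllegalArgumentException("unexpected label directory: " + dir);
        }

        // The rating sits between '_' and the file extension.
        int underscore = file.indexOf('_');
        int dot = file.lastIndexOf('.');
        if (underscore < 0 || dot <= underscore + 1) {
            throw new IllegalArgumentException("unexpected file name: " + file);
        }
        rating = Integer.parseInt(file.substring(underscore + 1, dot));
    }

    public static void main(String[] args) {
        ReviewPath r = new ReviewPath("aclImdb/train/pos/0_9.txt");
        System.out.println(r.positive + " " + r.rating); // true 9
    }
}
```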
The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text (a span) from the corresponding reading passage, or the question may be unanswerable.
A builder for a StanfordQuestionAnsweringDataset.
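Since every answer is a span of its passage, an answer given as text plus a character offset can be checked and converted to a [start, end) span. The sketch below assumes that (text, offset) representation and uses an empty answer text to stand for an unanswerable question; both encodings are illustrative, and the class AnswerSpan is hypothetical.

```java
import java.util.Optional;

// Sketch: convert a SQuAD-style answer (text + character offset) into a
// [start, end) character span of the passage.
// ASSUMPTION: a null or empty answer text marks an unanswerable question.
final class AnswerSpan {
    static Optional<int[]> charSpan(String passage, String answerText, int answerStart) {
        if (answerText == null || answerText.isEmpty()) {
            return Optional.empty(); // no span: the question is unanswerable
        }
        // startsWith(prefix, offset) returns false for out-of-range offsets.
        if (!passage.startsWith(answerText, answerStart)) {
            throw new IllegalArgumentException(
                    "answer text does not occur at offset " + answerStart);
        }
        return Optional.of(new int[] {answerStart, answerStart + answerText.length()});
    }

    public static void main(String[] args) {
        String passage = "The Eiffel Tower is in Paris.";
        int[] span = charSpan(passage, "Paris", passage.indexOf("Paris")).get();
        System.out.println(passage.substring(span[0], span[1])); // Paris
    }
}
```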
TatoebaEnglishFrenchDataset is an English-French machine translation dataset from The Tatoeba Project (http://www.manythings.org/anki/).
A builder for a TatoebaEnglishFrenchDataset.
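A translation dataset like this one reduces to (source, target) sentence pairs. The sketch below assumes each line of the downloaded file holds an English sentence and its French translation separated by a tab, possibly followed by extra columns that are ignored; that layout is an assumption, and the class SentencePair is hypothetical.

```java
// Sketch: split one line into an English-French sentence pair.
// ASSUMPTION: fields are tab-separated as English<TAB>French, optionally
// followed by extra columns that are ignored here.
final class SentencePair {
    final String english;
    final String french;

    private SentencePair(String english, String french) {
        this.english = english;
        this.french = french;
    }

    static SentencePair parse(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 2) {
            throw new IllegalArgumentException("expected at least two tab-separated fields: " + line);
        }
        return new SentencePair(fields[0].trim(), fields[1].trim());
    }

    public static void main(String[] args) {
        SentencePair pair = parse("Hi.\tSalut !\textra column");
        System.out.println(pair.english + " => " + pair.french);
    }
}
```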
TextDataset is an abstract dataset that can serve as the base for natural language processing datasets in which either the source or the target is text-based data.
TextDataset.Builder<T extends TextDataset.Builder<T>>: an abstract builder that helps build a TextDataset.
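The bound T extends TextDataset.Builder<T> is the recursive ("self-typed") generic builder pattern: setters defined on the abstract builder return T, so a concrete builder can keep chaining its own options after inherited ones. The sketch below illustrates only the pattern; every class, field, and method in it is hypothetical and not part of DJL's API.

```java
// Sketch of the self-typed generic builder pattern.
// All names are HYPOTHETICAL and only illustrate the technique.
final class SelfTypedBuilderDemo {

    abstract static class BaseDataset {
        final int batchSize;
        final String source;

        protected BaseDataset(Builder<?> builder) {
            this.batchSize = builder.batchSize;
            this.source = builder.source;
        }

        abstract static class Builder<T extends Builder<T>> {
            int batchSize = 1;
            String source = "";

            // Each concrete builder returns itself, typed as its own class.
            protected abstract T self();

            T setBatchSize(int batchSize) {
                this.batchSize = batchSize;
                return self();
            }

            T setSource(String source) {
                this.source = source;
                return self();
            }
        }
    }

    static final class ReviewDataset extends BaseDataset {
        final boolean lowercase;

        private ReviewDataset(Builder builder) {
            super(builder);
            this.lowercase = builder.lowercase;
        }

        static Builder builder() {
            return new Builder();
        }

        // Here "Builder" names ReviewDataset.Builder, which hides the inherited one.
        static final class Builder extends BaseDataset.Builder<Builder> {
            boolean lowercase;

            @Override
            protected Builder self() {
                return this;
            }

            Builder optLowercase(boolean lowercase) {
                this.lowercase = lowercase;
                return this;
            }

            ReviewDataset build() {
                return new ReviewDataset(this);
            }
        }
    }

    public static void main(String[] args) {
        // Inherited setters return ReviewDataset.Builder, so subclass-specific
        // options can still be chained after them without a cast.
        ReviewDataset ds = ReviewDataset.builder()
                .setBatchSize(32)
                .setSource("reviews")
                .optLowercase(true)
                .build();
        System.out.println(ds.batchSize + " " + ds.source + " " + ds.lowercase);
    }
}
```

Without the type parameter, setBatchSize would return the base builder type and the call to optLowercase would not compile without a cast.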
A class that stores TextDataset sample information.
A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13.
A builder for a UniversalDependenciesEnglishEWT.
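Universal Dependencies treebanks are commonly distributed in the CoNLL-U text format: one token per line with ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), comment lines starting with "#", and blank lines between sentences. The sketch below parses a token line under that layout and keeps a few columns. It follows the public format description, not DJL's reader, and the class ConlluToken is hypothetical.

```java
// Sketch: parse one token line of a CoNLL-U file.
// Column order follows the public CoNLL-U layout; only some columns are kept.
final class ConlluToken {
    final String id;
    final String form;
    final String upos;
    final String head;   // kept as text: multiword-token lines use "_" here
    final String deprel;

    private ConlluToken(String id, String form, String upos, String head, String deprel) {
        this.id = id;
        this.form = form;
        this.upos = upos;
        this.head = head;
        this.deprel = deprel;
    }

    // Comment lines start with '#'; blank lines separate sentences.
    static boolean isTokenLine(String line) {
        return !line.trim().isEmpty() && !line.startsWith("#");
    }

    static ConlluToken parse(String line) {
        String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
        if (cols.length != 10) {
            throw new IllegalArgumentException("expected 10 columns, got " + cols.length);
        }
        return new ConlluToken(cols[0], cols[1], cols[3], cols[6], cols[7]);
    }

    public static void main(String[] args) {
        ConlluToken t = parse("1\tWhat\twhat\tPRON\tWP\tPronType=Int\t0\troot\t0:root\t_");
        System.out.println(t.form + " " + t.upos + " head=" + t.head + " " + t.deprel);
    }
}
```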
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia.
A builder to construct a WikiText2.
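For language modeling, a tokenized corpus such as WikiText-2 is typically cut into input windows whose targets are the same tokens shifted one position to the right, so the model learns to predict the next token. The sketch below assumes the text has already been mapped to integer ids; the sequence length and the non-overlapping stride are illustrative choices, and the class LmWindows is hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: cut a token-id stream into next-token-prediction pairs.
// Each target window is its input window shifted one position right.
final class LmWindows {
    static List<int[][]> windows(int[] ids, int seqLen) {
        if (seqLen <= 0) {
            throw new IllegalArgumentException("seqLen must be positive");
        }
        List<int[][]> pairs = new ArrayList<>();
        // Each pair needs seqLen + 1 tokens: the inputs plus one look-ahead target.
        for (int start = 0; start + seqLen < ids.length; start += seqLen) {
            int[] input = Arrays.copyOfRange(ids, start, start + seqLen);
            int[] target = Arrays.copyOfRange(ids, start + 1, start + seqLen + 1);
            pairs.add(new int[][] {input, target});
        }
        return pairs;
    }

    public static void main(String[] args) {
        int[] ids = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        for (int[][] pair : windows(ids, 3)) {
            System.out.println(Arrays.toString(pair[0]) + " -> " + Arrays.toString(pair[1]));
        }
    }
}
```

Tokens left over at the end (fewer than seqLen + 1) are dropped in this sketch; padding them is an alternative design.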