Package ai.djl.basicdataset.nlp
Contains a library of built-in datasets for Application.NLP.
Class Summary

AmazonReview
    The AmazonReview dataset contains an Application.NLP.SENTIMENT_ANALYSIS set of reviews and their sentiment ratings.
AmazonReview.Builder
    A builder to construct an AmazonReview.
CookingStackExchange
    A text classification dataset that contains questions from cooking.stackexchange.com and their associated tags on the site.
CookingStackExchange.Builder
    A builder to construct a CookingStackExchange.
GoEmotions
    GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations for 27 emotion categories or Neutral.
GoEmotions.Builder
    A builder to construct a GoEmotions.
PennTreebankText
    The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation.
PennTreebankText.Builder
    A builder to construct a PennTreebankText.
StanfordMovieReview
    The StanfordMovieReview dataset contains an Application.NLP.SENTIMENT_ANALYSIS set of movie reviews and their sentiment ratings.
StanfordMovieReview.Builder
    A builder for a StanfordMovieReview.
StanfordQuestionAnsweringDataset
    Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.
StanfordQuestionAnsweringDataset.Builder
    A builder for a StanfordQuestionAnsweringDataset.
TatoebaEnglishFrenchDataset
    TatoebaEnglishFrenchDataset is an English-French machine translation dataset from The Tatoeba Project (http://www.manythings.org/anki/).
TatoebaEnglishFrenchDataset.Builder
    A builder for a TatoebaEnglishFrenchDataset.
TextDataset
    TextDataset is an abstract dataset for natural language processing tasks where the source, the target, or both are text-based data.
TextDataset.Builder<T extends TextDataset.Builder<T>>
    Abstract Builder that helps build a TextDataset.
TextDataset.Sample
    A class that stores TextDataset sample information.
UniversalDependenciesEnglishEWT
    A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13.
UniversalDependenciesEnglishEWT.Builder
    A builder for a UniversalDependenciesEnglishEWT.
WikiText2
    The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia.
WikiText2.Builder
    A builder to construct a WikiText2.
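The signature TextDataset.Builder<T extends TextDataset.Builder<T>> listed above is a self-typed (recursively bounded) generic. The following is a minimal, standalone sketch of why that shape is useful: setters declared on the abstract builder return the concrete builder type, so a call chain can mix inherited and subclass-specific options. It does not use DJL. Every class, field, and method name in it (SketchTextDataset, SketchReviewDataset, setBatchSize, optLimit, optLowerCase, self) is hypothetical and chosen only for illustration; the real DJL builders may be structured differently.

```java
// Illustrative only: hypothetical names, not the DJL implementation.
final class SelfTypedBuilderSketch {

    /** Stand-in for an abstract text dataset configured through a builder. */
    abstract static class SketchTextDataset {
        final int batchSize;
        final long limit;

        SketchTextDataset(Builder<?> builder) {
            this.batchSize = builder.batchSize;
            this.limit = builder.limit;
        }

        /** Self-typed builder: T is bound to the concrete builder subclass. */
        abstract static class Builder<T extends Builder<T>> {
            int batchSize = 1;
            long limit = Long.MAX_VALUE;

            /** Returns T rather than Builder, so chaining keeps the subtype. */
            T setBatchSize(int batchSize) {
                this.batchSize = batchSize;
                return self();
            }

            T optLimit(long limit) {
                this.limit = limit;
                return self();
            }

            /** Each concrete builder returns itself here. */
            abstract T self();
        }
    }

    /** Stand-in for a concrete dataset with one extra option of its own. */
    static final class SketchReviewDataset extends SketchTextDataset {
        final boolean lowerCase;

        private SketchReviewDataset(Builder builder) {
            super(builder);
            this.lowerCase = builder.lowerCase;
        }

        static Builder builder() {
            return new Builder();
        }

        /** Concrete builder: passes itself as the type argument T. */
        static final class Builder extends SketchTextDataset.Builder<Builder> {
            boolean lowerCase;

            Builder optLowerCase(boolean lowerCase) {
                this.lowerCase = lowerCase;
                return this;
            }

            @Override
            Builder self() {
                return this;
            }

            SketchReviewDataset build() {
                return new SketchReviewDataset(this);
            }
        }
    }

    public static void main(String[] args) {
        // Inherited setters come first; the subclass-only option still compiles
        // afterwards because setBatchSize and optLimit return the concrete type.
        SketchReviewDataset dataset = SketchReviewDataset.builder()
                .setBatchSize(32)
                .optLimit(100)
                .optLowerCase(true)
                .build();
        System.out.println(dataset.batchSize + " " + dataset.limit + " " + dataset.lowerCase);
    }
}
```

If the base setters returned the abstract builder type instead of T, the call to optLowerCase after optLimit would not compile without a cast, which is the problem the recursive bound avoids.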