ai.djl.basicdataset.nlp (Deep Java Library 0.26.0)

package ai.djl.basicdataset.nlp

Contains a library of built-in datasets for Application.NLP.

Related Packages

Package

Description

ai.djl.basicdataset

Contains a library of built-in datasets.

ai.djl.basicdataset.cv

Contains a library of built-in datasets for Application.CV.

ai.djl.basicdataset.tabular

Contains a library of built-in datasets for Application.Tabular.

ai.djl.basicdataset.utils

Contains utilities used within the basic datasets.

Classes

Class

Description

AmazonReview

The AmazonReview dataset contains an Application.NLP.SENTIMENT_ANALYSIS set of reviews and their sentiment ratings.

AmazonReview.Builder

A builder to construct an AmazonReview.

CookingStackExchange

A text classification dataset containing questions from cooking.stackexchange.com and their associated tags on the site.

CookingStackExchange.Builder

A builder to construct a CookingStackExchange.

GoEmotions

GoEmotions is a corpus of 58k carefully curated comments extracted from Reddit, with human annotations for 27 emotion categories or Neutral.

GoEmotions.Builder

A builder to construct a GoEmotions.

PennTreebankText

The Penn Treebank (PTB) project selected 2,499 stories from a three-year Wall Street Journal (WSJ) collection of 98,732 stories for syntactic annotation.

PennTreebankText.Builder

A builder to construct a PennTreebankText.

StanfordMovieReview

The StanfordMovieReview dataset contains an Application.NLP.SENTIMENT_ANALYSIS set of movie reviews and their sentiment ratings.

StanfordMovieReview.Builder

A builder for a StanfordMovieReview.

StanfordQuestionAnsweringDataset

Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset, consisting of questions posed by crowdworkers on a set of Wikipedia articles, where the answer to every question is a segment of text, or span, from the corresponding reading passage, or the question might be unanswerable.

StanfordQuestionAnsweringDataset.Builder

A builder for a StanfordQuestionAnsweringDataset.

TatoebaEnglishFrenchDataset

TatoebaEnglishFrenchDataset is an English-French machine translation dataset from The Tatoeba Project (http://www.manythings.org/anki/).

TatoebaEnglishFrenchDataset.Builder

A builder for a TatoebaEnglishFrenchDataset.

TextDataset

TextDataset is an abstract dataset for natural language processing tasks where either the source or the target is text-based data.

TextDataset.Builder<T extends TextDataset.Builder<T>>

Abstract Builder that helps build a TextDataset.

TextDataset.Sample

A class that stores TextDataset sample information.

UniversalDependenciesEnglishEWT

A Gold Standard Universal Dependencies Corpus for English, built over the source material of the English Web Treebank LDC2012T13.

UniversalDependenciesEnglishEWT.Builder

A builder for a UniversalDependenciesEnglishEWT.

WikiText2

The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia.

WikiText2.Builder

A builder to construct a WikiText2.
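
The signature TextDataset.Builder<T extends TextDataset.Builder<T>> listed above is a self-typed (recursively bounded) generic. The following is a minimal sketch of why that shape is useful for builders: options declared once on an abstract base can return the concrete builder type, so subclass-specific options can still be chained after them. All class and method names here (BaseBuilder, ToyBuilder, ToyDataset, optLimit, optSplit) are hypothetical stand-ins chosen for illustration. They are not the DJL classes, and the sketch makes no claim about how DJL implements its builders.

```java
// Illustrative only: these toy classes are hypothetical and are NOT the DJL implementation.
public class SelfTypedBuilderSketch {

    /** Hypothetical abstract builder with the shape Builder<T extends Builder<T>>. */
    abstract static class BaseBuilder<T extends BaseBuilder<T>> {
        long limit = Long.MAX_VALUE;

        /** Each subclass returns itself so chained calls keep the concrete type. */
        protected abstract T self();

        /** A shared option, defined once in the base class. */
        public T optLimit(long limit) {
            this.limit = limit;
            return self();
        }
    }

    /** Hypothetical concrete builder that adds an option of its own. */
    static final class ToyBuilder extends BaseBuilder<ToyBuilder> {
        String split = "train";

        @Override
        protected ToyBuilder self() {
            return this;
        }

        public ToyBuilder optSplit(String split) {
            this.split = split;
            return this;
        }

        public ToyDataset build() {
            return new ToyDataset(limit, split);
        }
    }

    /** Hypothetical immutable object produced by the builder. */
    static final class ToyDataset {
        final long limit;
        final String split;

        ToyDataset(long limit, String split) {
            this.limit = limit;
            this.split = split;
        }
    }

    public static void main(String[] args) {
        // optLimit is declared on the base class but still returns ToyBuilder,
        // so the subclass-only optSplit can follow it in the same chain.
        ToyDataset dataset = new ToyBuilder().optLimit(100).optSplit("test").build();
        System.out.println(dataset.split + " " + dataset.limit);
    }
}
```

Without the type parameter, optLimit would have to return the base type, and the call to optSplit in the chain would not compile without a cast.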
