Package ai.djl.basicdataset.nlp
Class WikiText2
- java.lang.Object
  - ai.djl.basicdataset.nlp.WikiText2
All Implemented Interfaces:
ai.djl.training.dataset.Dataset, ai.djl.training.dataset.RawDataset<java.nio.file.Path>
public class WikiText2 extends java.lang.Object implements ai.djl.training.dataset.RawDataset<java.nio.file.Path>
The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia.
Nested Class Summary
- static class WikiText2.Builder
  A builder to construct a WikiText2.
Method Summary
- static WikiText2.Builder builder()
  Creates a builder to build a WikiText2.
- java.nio.file.Path getData()
  Gets data from the WikiText2 dataset.
- java.lang.Iterable<ai.djl.training.dataset.Batch> getData(ai.djl.ndarray.NDManager manager)
  Fetches an iterator that can iterate through the Dataset.
- void prepare(ai.djl.util.Progress progress)
  Prepares the dataset for use with tracked progress.
Method Detail
builder
public static WikiText2.Builder builder()
Creates a builder to build a WikiText2.
- Returns:
  - a new WikiText2.Builder object
prepare
public void prepare(ai.djl.util.Progress progress) throws java.io.IOException
Prepares the dataset for use with tracked progress.
- Specified by:
  - prepare in interface ai.djl.training.dataset.Dataset
- Parameters:
  - progress - the progress tracker
- Throws:
  - java.io.IOException - for various exceptions depending on the dataset
getData
public java.lang.Iterable<ai.djl.training.dataset.Batch> getData(ai.djl.ndarray.NDManager manager) throws java.io.IOException, ai.djl.translate.TranslateException
Fetches an iterator that can iterate through the Dataset. This method is not implemented for the WikiText2 dataset because WikiText2 is a raw dataset that is not suitable for batch iteration; calling it returns null. To access the data, use getData(), which returns the Path of the dataset file.
- Specified by:
  - getData in interface ai.djl.training.dataset.Dataset
- Parameters:
  - manager - the NDManager that manages the NDArrays created for each batch
- Returns:
  - an Iterable of Batch that contains batches of data from the dataset (always null for WikiText2)
- Throws:
  - java.io.IOException
  - ai.djl.translate.TranslateException
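Because getData(NDManager) returns null for WikiText2, generic training code that iterates the result directly would fail with a NullPointerException. The JDK-only sketch below shows one way to fail fast with a clearer message. The helper name requireBatches is hypothetical and not part of DJL. The element type is left generic so the sketch compiles without DJL on the classpath; in real code it would be Batch.

```java
import java.util.List;

public final class BatchGuard {
    // Hypothetical helper (not part of DJL): rejects a null Iterable, such as the
    // value returned by WikiText2.getData(NDManager), before it reaches a for-each loop.
    static <T> Iterable<T> requireBatches(Iterable<T> batches, String datasetName) {
        if (batches == null) {
            throw new IllegalStateException(
                    datasetName + " does not support batch iteration; use getData() for the raw file Path");
        }
        return batches; // non-null input is passed through unchanged
    }

    public static void main(String[] args) {
        // A non-null Iterable passes through and can be iterated normally.
        for (String batch : requireBatches(List.of("b0", "b1"), "demo")) {
            System.out.println(batch);
        }
        // A null Iterable (what WikiText2 returns) fails fast with a descriptive message.
        try {
            requireBatches(null, "WikiText2");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Throwing instead of substituting an empty Iterable is a deliberate choice in this sketch. An empty sequence would let a training loop silently run zero steps.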
getData
public java.nio.file.Path getData() throws java.io.IOException
Gets data from the WikiText2 dataset. This method returns the whole dataset as the Path of a single file rather than as batches.
- Specified by:
  - getData in interface ai.djl.training.dataset.RawDataset<java.nio.file.Path>
- Returns:
  - a Path object locating the WikiText2 dataset file
- Throws:
  - java.io.IOException
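getData() returns a Path rather than tensors, so any further processing of the text is up to the caller. The JDK-only sketch below streams such a file and counts whitespace-separated tokens, which is a typical first step toward building a vocabulary. It assumes the file is UTF-8 text with space-delimited tokens and blank separator lines, which is the layout of the published WikiText token files. The class and method names are hypothetical and not part of DJL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class WikiTextTokens {
    // Hypothetical helper: counts whitespace-separated tokens in a tokenized text
    // file such as the one located by WikiText2.getData(). Assumes UTF-8 encoding.
    static Map<String, Integer> countTokens(Path file) throws IOException {
        Map<String, Integer> counts = new HashMap<>();
        // Stream line by line instead of loading the whole file into memory.
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip blank separator lines
                }
                for (String token : trimmed.split("\\s+")) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Pass the file Path from WikiText2.getData() as args[0]; without
        // arguments, a tiny temporary file is used for demonstration.
        boolean temporary = args.length == 0;
        Path file;
        if (temporary) {
            file = Files.createTempFile("wikitext", ".tokens");
            Files.write(file, List.of(" = Heading = ", "", " the cat <unk> the dog "),
                    StandardCharsets.UTF_8);
        } else {
            file = Paths.get(args[0]);
        }
        Map<String, Integer> counts = countTokens(file);
        System.out.println("distinct tokens: " + counts.size());
        System.out.println("count of \"the\": " + counts.get("the"));
        if (temporary) {
            Files.delete(file);
        }
    }
}
```

Heading markers such as "=" are counted as ordinary tokens here. Filtering them out, or mapping rare tokens to <unk>, is a separate design decision left to the caller.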