Class WikiText2

  • All Implemented Interfaces:
    ai.djl.training.dataset.Dataset, ai.djl.training.dataset.RawDataset<java.nio.file.Path>

    public class WikiText2
    extends java.lang.Object
    implements ai.djl.training.dataset.RawDataset<java.nio.file.Path>
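    The WikiText language modeling dataset is a collection of over 100 million tokens extracted from the set of verified Good and Featured articles on Wikipedia. WikiText2 is the smaller WikiText-2 variant of that collection, with roughly 2 million training tokens.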
    • Nested Class Summary

      Nested Classes 
      Modifier and Type Class Description
      static class  WikiText2.Builder
      A builder to construct a WikiText2.
      • Nested classes/interfaces inherited from interface ai.djl.training.dataset.Dataset

        ai.djl.training.dataset.Dataset.Usage
    • Method Summary

      All Methods Static Methods Instance Methods Concrete Methods 
      Modifier and Type Method Description
      static WikiText2.Builder builder()
      Creates a builder to build a WikiText2.
      java.nio.file.Path getData()
      Returns the path to the WikiText2 dataset file.
      java.lang.Iterable<ai.djl.training.dataset.Batch> getData(ai.djl.ndarray.NDManager manager)
      Fetches an iterator that can iterate through the Dataset (not implemented for WikiText2; returns null).
      void prepare(ai.djl.util.Progress progress)
      Prepares the dataset for use with tracked progress.
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
      • Methods inherited from interface ai.djl.training.dataset.Dataset

        getData, matchingTranslatorOptions, prepare
    • Method Detail

      • prepare

        public void prepare(ai.djl.util.Progress progress)
                     throws java.io.IOException
        Prepares the dataset for use with tracked progress.
        Specified by:
        prepare in interface ai.djl.training.dataset.Dataset
        Parameters:
        progress - the progress tracker
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
      • getData

        public java.lang.Iterable<ai.djl.training.dataset.Batch> getData(ai.djl.ndarray.NDManager manager)
                                                                  throws java.io.IOException,
                                                                         ai.djl.translate.TranslateException
        Fetches an iterator that can iterate through the Dataset. This method is not implemented for the WikiText2 dataset, which is not suitable for iteration; calling it simply returns null. Use getData() to obtain the dataset file instead.
        Specified by:
        getData in interface ai.djl.training.dataset.Dataset
        Parameters:
        manager - the NDManager that batch arrays would be created with
        Returns:
        null for this dataset (the interface contract is an Iterable of Batch that contains batches of data from the dataset)
        Throws:
        java.io.IOException
        ai.djl.translate.TranslateException
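
Because this method returns null for WikiText2, code written against the general Dataset interface may want to guard the result before looping over it. The following is a minimal sketch using only the Java standard library; the class NullSafeIteration and its helpers orEmpty and countElements are hypothetical names introduced for illustration and are not part of DJL.

```java
import java.util.Collections;
import java.util.List;

public class NullSafeIteration {

    /** Returns the given iterable, or an empty one when it is null. */
    public static <T> Iterable<T> orEmpty(Iterable<T> iterable) {
        return iterable != null ? iterable : Collections.<T>emptyList();
    }

    /** Counts the elements, treating a null iterable as empty. */
    public static <T> int countElements(Iterable<T> iterable) {
        int count = 0;
        for (T ignored : orEmpty(iterable)) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Stand-in for the null that getData(NDManager) returns for WikiText2.
        Iterable<String> batches = null;
        System.out.println(countElements(batches));           // prints 0
        System.out.println(countElements(List.of("a", "b"))); // prints 2
    }
}
```

The same guard could wrap the result of getData(manager) so that a for-each loop does not throw a NullPointerException.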
      • getData

        public java.nio.file.Path getData()
                                   throws java.io.IOException
        Returns the WikiText2 dataset as a whole, in the form of a path to the dataset file, rather than as batches.
        Specified by:
        getData in interface ai.djl.training.dataset.RawDataset<java.nio.file.Path>
        Returns:
        a Path object locating the WikiText2 dataset file
        Throws:
        java.io.IOException
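
Since getData() hands back a plain java.nio.file.Path, the file can be processed with the standard java.nio.file API. The sketch below counts whitespace-separated tokens in such a file. It assumes the file is UTF-8 text with tokens separated by whitespace, which this page does not state; WikiTextFileStats and countTokens are hypothetical names, and the build() call shown in the comment is an assumed method of WikiText2.Builder.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

public class WikiTextFileStats {

    /** Counts whitespace-separated tokens in a UTF-8 text file. */
    public static long countTokens(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines
                    .map(String::trim)
                    .filter(line -> !line.isEmpty())
                    .mapToLong(line -> line.split("\\s+").length)
                    .sum();
        }
    }

    public static void main(String[] args) throws IOException {
        // With DJL on the classpath, the path would come from the dataset, e.g.:
        //   WikiText2 dataset = WikiText2.builder().build(); // build() is assumed
        //   dataset.prepare();
        //   Path file = dataset.getData();
        // Here a temporary file stands in for the dataset file.
        Path file = Files.createTempFile("wikitext-sample", ".txt");
        Files.writeString(file,
                " = Valkyria Chronicles III = \n\n The game began development in 2010 .\n");
        System.out.println(countTokens(file)); // prints 12
        Files.delete(file);
    }
}
```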