Class RandomAccessDataset

  • All Implemented Interfaces:
    Dataset
    Direct Known Subclasses:
    ArrayDataset

    public abstract class RandomAccessDataset
    extends java.lang.Object
    implements Dataset
    RandomAccessDataset represents a dataset that supports random-access reads, i.e. it can access a specific data item given its index.

    Almost all datasets in DJL extend, either directly or indirectly, RandomAccessDataset.

    See Also:
    The guide to implementing a custom dataset
    • Field Detail

      • sampler

        protected Sampler sampler
      • dataBatchifier

        protected Batchifier dataBatchifier
      • labelBatchifier

        protected Batchifier labelBatchifier
      • targetPipeline

        protected Pipeline targetPipeline
      • prefetchNumber

        protected int prefetchNumber
      • limit

        protected long limit
      • device

        protected Device device
    • Constructor Detail

      • RandomAccessDataset

        public RandomAccessDataset​(RandomAccessDataset.BaseBuilder<?> builder)
        Creates a new instance of RandomAccessDataset with the given necessary configurations.
        Parameters:
        builder - a builder with the necessary configurations
    • Method Detail

      • get

        public abstract Record get​(NDManager manager,
                                   long index)
                            throws java.io.IOException
        Gets the Record for the given index from the dataset.
        Parameters:
        manager - the manager used to create the arrays
        index - the index of the requested data item
        Returns:
        a Record that contains the data and label of the requested data item
        Throws:
        java.io.IOException - if an I/O error occurs
      • getData

        public java.lang.Iterable<Batch> getData​(NDManager manager)
                                          throws java.io.IOException,
                                                 TranslateException
        Fetches an iterator that can iterate through the Dataset.
        Specified by:
        getData in interface Dataset
        Parameters:
        manager - the manager used to create the arrays
        Returns:
        an Iterable of Batch that contains batches of data from the dataset
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
      • getData

        public java.lang.Iterable<Batch> getData​(NDManager manager,
                                                 java.util.concurrent.ExecutorService executorService)
                                          throws java.io.IOException,
                                                 TranslateException
        Fetches an iterator that can iterate through the Dataset with multiple threads.
        Specified by:
        getData in interface Dataset
        Parameters:
        manager - the manager used to create the arrays
        executorService - the executorService to use for multi-threading
        Returns:
        an Iterable of Batch that contains batches of data from the dataset
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
      • getData

        public java.lang.Iterable<Batch> getData​(NDManager manager,
                                                 Sampler sampler)
                                          throws java.io.IOException,
                                                 TranslateException
        Fetches an iterator that can iterate through the Dataset with a custom sampler.
        Parameters:
        manager - the manager used to create the arrays
        sampler - the sampler to use to iterate through the dataset
        Returns:
        an Iterable of Batch that contains batches of data from the dataset
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
      • getData

        public java.lang.Iterable<Batch> getData​(NDManager manager,
                                                 Sampler sampler,
                                                 java.util.concurrent.ExecutorService executorService)
                                          throws java.io.IOException,
                                                 TranslateException
        Fetches an iterator that can iterate through the Dataset with a custom sampler, using multiple threads.
        Parameters:
        manager - the manager used to create the arrays
        sampler - the sampler to use to iterate through the dataset
        executorService - the executorService to use for multi-threading
        Returns:
        an Iterable of Batch that contains batches of data from the dataset
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
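        The getData variants above all rest on the same idea: a sampler decides which indices go into each batch, and the records are then read by index. The following is a minimal plain-Java sketch of that idea, not DJL code. The class BatchIndexSketch, its methods, and the use of a String array in place of a dataset are all hypothetical, and the sequential, fixed-size batching is an assumed sampling strategy.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of index-driven batching; not part of DJL. */
public class BatchIndexSketch {

    /** Groups the indices 0..size-1 into consecutive batches of at most batchSize. */
    public static List<long[]> sequentialBatches(long size, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<long[]> batches = new ArrayList<>();
        for (long start = 0; start < size; start += batchSize) {
            int length = (int) Math.min(batchSize, size - start);
            long[] batch = new long[length];
            for (int i = 0; i < length; i++) {
                batch[i] = start + i;
            }
            batches.add(batch);
        }
        return batches;
    }

    /** Reads every batch by looking up each sampled index in the backing array. */
    public static List<List<String>> readBatches(String[] records, int batchSize) {
        List<List<String>> result = new ArrayList<>();
        for (long[] batch : sequentialBatches(records.length, batchSize)) {
            List<String> items = new ArrayList<>();
            for (long index : batch) {
                // Random access: any index can be read directly, in any order.
                items.add(records[(int) index]);
            }
            result.add(items);
        }
        return result;
    }
}
```

        Because each record is addressed by index, swapping in a different sampler only changes which indices land in each batch; the lookup step stays the same.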
      • size

        public long size()
        Returns the size of this Dataset.
        Returns:
        the size of this Dataset
      • availableSize

        protected abstract long availableSize()
        Returns the number of records available to be read in this Dataset.
        Returns:
        the number of records available to be read in this Dataset
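        This page does not state how size() and availableSize() relate. One plausible reading, given the limit field, is that size() reports the available record count capped by limit. The sketch below encodes only that assumption; SizeLimitSketch is a hypothetical class and the relationship is not confirmed by this page.

```java
/** Hypothetical illustration of capping a dataset size by a limit; not part of DJL. */
public class SizeLimitSketch {

    /**
     * Assumed relationship: the reported size is the number of available
     * records, but never more than the configured limit.
     */
    public static long size(long availableSize, long limit) {
        return Math.min(limit, availableSize);
    }
}
```

        Under this assumption, a limit of Long.MAX_VALUE leaves the size unchanged, while a small limit truncates the dataset, for example for a quick test run.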
      • randomSplit

        public RandomAccessDataset[] randomSplit​(int... ratio)
                                          throws java.io.IOException,
                                                 TranslateException
        Splits the dataset into multiple portions.
        Parameters:
        ratio - the ratio of each sub-dataset
        Returns:
        an array of the sub-datasets
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
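        To make the ratio arguments concrete, here is a plain-Java sketch of one way such a split could work. It is not taken from the DJL source. It assumes the ratios are relative weights (so 8, 2 means 80% and 20%), that the indices are shuffled before being partitioned, and that the last portion absorbs any rounding remainder. The class RandomSplitSketch and the seed parameter are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical illustration of a ratio-based random split; not part of DJL. */
public class RandomSplitSketch {

    /** Shuffles the indices 0..size-1 and partitions them according to the given weights. */
    public static int[][] randomSplit(int size, long seed, int... ratio) {
        if (ratio.length < 2) {
            throw new IllegalArgumentException("at least two portions are required");
        }
        long sum = 0;
        for (int r : ratio) {
            if (r <= 0) {
                throw new IllegalArgumentException("ratios must be positive");
            }
            sum += r;
        }

        // Fisher-Yates shuffle of the index array.
        int[] indices = new int[size];
        for (int i = 0; i < size; i++) {
            indices[i] = i;
        }
        Random random = new Random(seed);
        for (int i = size - 1; i > 0; i--) {
            int j = random.nextInt(i + 1);
            int tmp = indices[i];
            indices[i] = indices[j];
            indices[j] = tmp;
        }

        // Cut the shuffled indices into consecutive ranges; the last range takes the remainder.
        int[][] parts = new int[ratio.length][];
        int from = 0;
        for (int p = 0; p < ratio.length; p++) {
            int to = (p == ratio.length - 1)
                    ? size
                    : from + (int) ((long) size * ratio[p] / sum);
            parts[p] = Arrays.copyOfRange(indices, from, to);
            from = to;
        }
        return parts;
    }
}
```

        With size 10 and ratios 8, 2, this sketch yields portions of 8 and 2 indices; every index appears in exactly one portion.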
      • subDataset

        public RandomAccessDataset subDataset​(int fromIndex,
                                              int toIndex)
        Returns a view of the portion of this dataset between the specified fromIndex, inclusive, and toIndex, exclusive.
        Parameters:
        fromIndex - low endpoint (inclusive) of the subDataset
        toIndex - high endpoint (exclusive) of the subDataset
        Returns:
        a view of the specified range within this dataset
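        The half-open range and the word "view" can be illustrated with a small plain-Java sketch. RangeViewSketch is a hypothetical class, and a long array stands in for the parent dataset; the sketch only shows the index arithmetic such a view implies and is not DJL's implementation.

```java
/** Hypothetical illustration of a half-open range view; not part of DJL. */
public class RangeViewSketch {

    private final long[] base;   // stands in for the parent dataset; shared, not copied
    private final int fromIndex; // inclusive
    private final int toIndex;   // exclusive

    public RangeViewSketch(long[] base, int fromIndex, int toIndex) {
        if (fromIndex < 0 || toIndex > base.length || fromIndex > toIndex) {
            throw new IndexOutOfBoundsException(
                    "invalid range [" + fromIndex + ", " + toIndex + ")");
        }
        this.base = base;
        this.fromIndex = fromIndex;
        this.toIndex = toIndex;
    }

    /** The view holds toIndex - fromIndex items because toIndex is excluded. */
    public long size() {
        return toIndex - fromIndex;
    }

    /** Index 0 of the view maps to fromIndex of the parent. */
    public long get(long index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return base[fromIndex + (int) index];
    }
}
```

        Since the view shares the parent's storage instead of copying it, a change to the parent is visible through the view. The same inclusive and exclusive convention is used by java.util.List.subList.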
      • toArray

        public ai.djl.util.Pair<java.lang.Number[][],​java.lang.Number[][]> toArray()
                                                                                  throws java.io.IOException,
                                                                                         TranslateException
        Returns the dataset contents as a Java array.

        Each Number[] is a flattened dataset record and the Number[][] is the array of all records.

        Returns:
        the dataset contents as a Java array
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
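        The "flattened record" shape described above can be sketched in plain Java. FlattenRecordSketch is a hypothetical class, a float[][] stands in for one multi-dimensional record, and row-major order is an assumption; none of this is taken from the DJL source.

```java
/** Hypothetical illustration of flattening records into Number arrays; not part of DJL. */
public class FlattenRecordSketch {

    /** Flattens one two-dimensional record into a single Number[] in row-major order. */
    public static Number[] flatten(float[][] record) {
        int total = 0;
        for (float[] row : record) {
            total += row.length;
        }
        Number[] flat = new Number[total];
        int k = 0;
        for (float[] row : record) {
            for (float value : row) {
                flat[k++] = Float.valueOf(value);
            }
        }
        return flat;
    }

    /** Builds the Number[][] of all records: one flattened Number[] per record. */
    public static Number[][] toArray(float[][][] records) {
        Number[][] all = new Number[records.length][];
        for (int i = 0; i < records.length; i++) {
            all[i] = flatten(records[i]);
        }
        return all;
    }
}
```

        In this sketch, two 2x2 records produce a Number[2][4] array: the outer index selects the record and the inner index selects an element of that flattened record.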