Class RandomAccessDataset

  • All Implemented Interfaces:
    Dataset
    Direct Known Subclasses:
    ArrayDataset

    public abstract class RandomAccessDataset
    extends java.lang.Object
    implements Dataset
    RandomAccessDataset represents a dataset that supports random-access reads, i.e. it can access a specific data item given its index.

    Almost all datasets in DJL extend, either directly or indirectly, RandomAccessDataset.

    See Also:
    The guide to implementing a custom dataset
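The random-access contract can be illustrated with a minimal plain-Java sketch that does not depend on DJL. The names `SimpleRandomAccess` and `ListDataset` are hypothetical. The real `get` takes an `NDManager` and returns a `Record`, whereas here a record is simply a generic value `T`. The sketch shows that a subclass (such as `ArrayDataset` in DJL) mainly has to state how many records exist and how to fetch one by index.

```java
import java.util.List;

/** Hypothetical stand-in for the random-access contract (not the DJL API). */
abstract class SimpleRandomAccess<T> {
    /** Returns the record at the given index; any index in [0, availableSize()) is valid. */
    abstract T get(long index);

    /** Returns how many records can be read. */
    abstract long availableSize();
}

/** A hypothetical in-memory implementation backed by a List. */
class ListDataset<T> extends SimpleRandomAccess<T> {
    private final List<T> items;

    ListDataset(List<T> items) {
        this.items = items;
    }

    @Override
    T get(long index) {
        // Random access: jump straight to the requested position, no scanning.
        return items.get(Math.toIntExact(index));
    }

    @Override
    long availableSize() {
        return items.size();
    }
}
```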
    • Field Detail

      • sampler

        protected Sampler sampler
      • dataBatchifier

        protected Batchifier dataBatchifier
      • labelBatchifier

        protected Batchifier labelBatchifier
      • targetPipeline

        protected Pipeline targetPipeline
      • prefetchNumber

        protected int prefetchNumber
      • limit

        protected long limit
      • device

        protected Device device
    • Constructor Detail

      • RandomAccessDataset

        public RandomAccessDataset(RandomAccessDataset.BaseBuilder<?> builder)
        Creates a new instance of RandomAccessDataset with the configurations held by the given builder.
        Parameters:
        builder - a builder with the necessary configurations
    • Method Detail

      • get

        public abstract Record get(NDManager manager,
                                   long index)
                            throws java.io.IOException
        Gets the Record for the given index from the dataset.
        Parameters:
        manager - the manager used to create the arrays
        index - the index of the requested data item
        Returns:
        a Record that contains the data and label of the requested data item
        Throws:
        java.io.IOException - if an I/O error occurs
      • getData

        public java.lang.Iterable<Batch> getData(NDManager manager)
                                          throws java.io.IOException,
                                                 TranslateException
        Fetches an iterator that can iterate through the Dataset.
        Specified by:
        getData in interface Dataset
        Parameters:
        manager - the manager used to create the arrays
        Returns:
        an Iterable of Batch that contains batches of data from the dataset
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
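Conceptually, iterating over a random-access dataset means obtaining batches of indices and calling `get` for each index. The following sketch assumes a sequential, non-shuffled sampler that keeps a final partial batch. Both of these are assumptions, because the actual batching comes from the configured `Sampler`. `SequentialBatches.indices` is a hypothetical helper, not part of DJL.

```java
import java.util.ArrayList;
import java.util.List;

final class SequentialBatches {
    private SequentialBatches() {}

    /** Splits the indices 0..size-1 into consecutive batches of at most batchSize indices. */
    static List<List<Long>> indices(long size, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (long start = 0; start < size; start += batchSize) {
            long end = Math.min(start + batchSize, size);
            List<Long> batch = new ArrayList<>();
            for (long i = start; i < end; i++) {
                batch.add(i);
            }
            // The last batch may be smaller than batchSize (no "drop last" in this sketch).
            batches.add(batch);
        }
        return batches;
    }
}
```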
      • getData

        public java.lang.Iterable<Batch> getData(NDManager manager,
                                                 java.util.concurrent.ExecutorService executorService)
                                          throws java.io.IOException,
                                                 TranslateException
        Fetches an iterator that can iterate through the Dataset with multiple threads.
        Specified by:
        getData in interface Dataset
        Parameters:
        manager - the manager used to create the arrays
        executorService - the executorService to use for multi-threading
        Returns:
        an Iterable of Batch that contains batches of data from the dataset
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
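The multi-threaded overload accepts an `ExecutorService`. One way to use such an executor is to submit one fetch task per index in a batch and collect the results in batch order. This is sketched here as an assumption, not as DJL's actual implementation. `ParallelFetch.fetch` is hypothetical, and the `LongFunction` stands in for a call to `get`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.LongFunction;

final class ParallelFetch {
    private ParallelFetch() {}

    /** Fetches every index of a batch on the executor, preserving the batch order. */
    static <T> List<T> fetch(List<Long> batch, LongFunction<T> getter, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        List<Callable<T>> tasks = new ArrayList<>();
        for (long index : batch) {
            tasks.add(() -> getter.apply(index));
        }
        // invokeAll blocks until all tasks finish and returns futures in task order.
        List<Future<T>> futures = executor.invokeAll(tasks);
        List<T> results = new ArrayList<>(futures.size());
        for (Future<T> future : futures) {
            results.add(future.get());
        }
        return results;
    }
}
```

Because `invokeAll` returns futures in submission order, the assembled batch keeps the sampler's ordering even though the individual reads may complete in any order.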
      • getData

        public java.lang.Iterable<Batch> getData(NDManager manager,
                                                 Sampler sampler)
                                          throws java.io.IOException,
                                                 TranslateException
        Fetches an iterator that can iterate through the Dataset with a custom sampler.
        Parameters:
        manager - the manager to create the arrays
        sampler - the sampler to use to iterate through the dataset
        Returns:
        an Iterable of Batch that contains batches of data from the dataset
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
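A custom sampler decides the order in which indices are visited. The following sketch is for illustration only and shows a hypothetical shuffled sampler. It takes a seeded random permutation of all indices (a Fisher-Yates shuffle via `Collections.shuffle`) and cuts that permutation into batches. It is not DJL's `Sampler` interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class ShuffledBatches {
    private ShuffledBatches() {}

    /** Returns a random permutation of 0..size-1 cut into batches of at most batchSize. */
    static List<List<Long>> indices(int size, int batchSize, long seed) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Long> order = new ArrayList<>(size);
        for (long i = 0; i < size; i++) {
            order.add(i);
        }
        // Shuffle with a seeded Random so the same seed reproduces the same order.
        Collections.shuffle(order, new Random(seed));
        List<List<Long>> batches = new ArrayList<>();
        for (int start = 0; start < size; start += batchSize) {
            batches.add(new ArrayList<>(order.subList(start, Math.min(start + batchSize, size))));
        }
        return batches;
    }
}
```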
      • getData

        public java.lang.Iterable<Batch> getData(NDManager manager,
                                                 Sampler sampler,
                                                 java.util.concurrent.ExecutorService executorService)
                                          throws java.io.IOException,
                                                 TranslateException
        Fetches an iterator that can iterate through the Dataset with a custom sampler, using multiple threads.
        Parameters:
        manager - the manager to create the arrays
        sampler - the sampler to use to iterate through the dataset
        executorService - the executorService to use for multi-threading
        Returns:
        an Iterable of Batch that contains batches of data from the dataset
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
      • size

        public long size()
        Returns the size of this Dataset.
        Returns:
        the size of this Dataset
      • availableSize

        protected abstract long availableSize()
        Returns the number of records available to be read in this Dataset.
        Returns:
        the number of records available to be read in this Dataset
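The protected `limit` field suggests that the size reported by `size()` can be smaller than `availableSize()`. The following minimal sketch makes two assumptions about the implementation: that `size()` returns the smaller of `limit` and `availableSize()`, and that an unset limit behaves like `Long.MAX_VALUE`. `SizeRule` is hypothetical.

```java
final class SizeRule {
    private SizeRule() {}

    /** Reported size: the records available to read, capped by the configured limit. */
    static long size(long availableSize, long limit) {
        return Math.min(limit, availableSize);
    }
}
```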
      • randomSplit

        public RandomAccessDataset[] randomSplit(int... ratio)
                                          throws java.io.IOException,
                                                 TranslateException
        Randomly splits the dataset into multiple portions.
        Parameters:
        ratio - the relative size of each sub-dataset
        Returns:
        an array of the sub-datasets
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
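The ratios passed to `randomSplit` are most naturally read as relative weights. One plausible way to turn them into portion sizes is to normalize by the sum of the ratios, round each portion down, and give the remainder to the last portion. The records themselves would be shuffled before being assigned to portions. This rule is an assumption rather than DJL's documented behaviour, and `SplitSizes.compute` is hypothetical.

```java
import java.util.Arrays;

final class SplitSizes {
    private SplitSizes() {}

    /**
     * Computes portion sizes for a dataset of the given size. Every portion except the
     * last gets floor(ratio[i] / sum * size) records; the last takes whatever remains.
     */
    static int[] compute(int size, int... ratio) {
        if (ratio.length < 2) {
            throw new IllegalArgumentException("At least two portions are required");
        }
        double sum = Arrays.stream(ratio).sum();
        int[] sizes = new int[ratio.length];
        int from = 0;
        for (int i = 0; i < ratio.length - 1; i++) {
            int to = from + (int) (ratio[i] / sum * size);
            sizes[i] = to - from;
            from = to;
        }
        sizes[ratio.length - 1] = size - from;
        return sizes;
    }
}
```

Under this rule, `randomSplit(8, 2)` and `randomSplit(80, 20)` describe the same 80/20 split, and rounding leftovers always land in the last portion.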
      • subDataset

        public RandomAccessDataset subDataset(int fromIndex,
                                              int toIndex)
        Returns a view of the portion of this dataset between the specified fromIndex, inclusive, and toIndex, exclusive.
        Parameters:
        fromIndex - low endpoint (inclusive) of the subDataset
        toIndex - high endpoint (exclusive) of the subDataset
        Returns:
        a view of the specified range within this dataset
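A view is expected to avoid copying records. Instead, local index `i` of the sub-dataset maps to index `fromIndex + i` of the parent. The following hypothetical sketch shows that mapping, with a `LongFunction` standing in for the parent's `get`. `RangeView` is not a DJL class.

```java
import java.util.function.LongFunction;

/** Hypothetical read-only view over [fromIndex, toIndex) of a parent dataset. */
final class RangeView<T> {
    private final LongFunction<T> parent;
    private final long fromIndex;
    private final long toIndex;

    RangeView(LongFunction<T> parent, long fromIndex, long toIndex) {
        if (fromIndex < 0 || fromIndex > toIndex) {
            throw new IllegalArgumentException("invalid range");
        }
        this.parent = parent;
        this.fromIndex = fromIndex;
        this.toIndex = toIndex;
    }

    long size() {
        return toIndex - fromIndex;
    }

    T get(long index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Translate the view-local index into the parent's index space; nothing is copied.
        return parent.apply(fromIndex + index);
    }
}
```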
      • subDataset

        public RandomAccessDataset subDataset(java.util.List<java.lang.Long> subIndices)
        Returns a view of the portion of this dataset for the specified subIndices.
        Parameters:
        subIndices - sub-set of indices of this dataset
        Returns:
        a view of the specified indices within this dataset
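With an explicit index list, local index `i` of the view maps to `subIndices.get(i)` in the parent. In this sketch, indices may therefore appear in any order and may repeat. `IndexView` is hypothetical, and the `LongFunction` stands in for the parent's `get`.

```java
import java.util.List;
import java.util.function.LongFunction;

/** Hypothetical read-only view that selects parent records by an explicit index list. */
final class IndexView<T> {
    private final LongFunction<T> parent;
    private final List<Long> subIndices;

    IndexView(LongFunction<T> parent, List<Long> subIndices) {
        this.parent = parent;
        this.subIndices = subIndices;
    }

    long size() {
        return subIndices.size();
    }

    T get(long index) {
        // The local index picks an entry of subIndices, which is an index into the parent.
        return parent.apply(subIndices.get(Math.toIntExact(index)));
    }
}
```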
      • subDataset

        public <K> RandomAccessDataset subDataset(java.util.List<K> recordKeys,
                                                  java.util.List<K> subRecordKeys)
        Returns a view of the portion of this dataset for the specified record keys. Assuming that the records of this dataset are represented by the keys in recordKeys, subRecordKeys defines the view on the corresponding records of the dataset.
        Type Parameters:
        K - the record key type.
        Parameters:
        recordKeys - unique keys for all records of this dataset.
        subRecordKeys - keys to define the view on the dataset. All keys in subRecordKeys must be contained in recordKeys but may occur more than once.
        Returns:
        a view of the specified records within this dataset
      • subDataset

        public <K> RandomAccessDataset subDataset(java.util.Map<K,java.lang.Long> indicesOfRecordKeys,
                                                  java.util.List<K> subRecordKeys)
        Returns a view of the portion of this dataset for the specified record keys. Assuming that the records of this dataset are represented by the keys in indicesOfRecordKeys, subRecordKeys defines the view on the corresponding records of the dataset.
        Type Parameters:
        K - the record key type.
        Parameters:
        indicesOfRecordKeys - a map from the keys of the records in this dataset to their index positions within this dataset. While this map typically covers all records, it only needs to contain the keys occurring in subRecordKeys.
        subRecordKeys - Keys to define the view on the dataset. All keys in subRecordKeys must be contained in indicesOfRecordKeys but may occur more than once.
        Returns:
        a view of the records identified by the specified keys of this dataset
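Both key-based overloads come down to translating keys into indices. The following hypothetical sketch shows that translation. For the list variant, a key-to-index map can be built from `recordKeys`. Each entry of `subRecordKeys` is then looked up in turn, so a repeated key yields a repeated index. `KeyIndices` is not a DJL class, and the exceptions it throws for duplicate or unknown keys are choices made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class KeyIndices {
    private KeyIndices() {}

    /** Maps each record key to its position in recordKeys; keys must be unique. */
    static <K> Map<K, Long> indexOf(List<K> recordKeys) {
        Map<K, Long> map = new HashMap<>();
        for (int i = 0; i < recordKeys.size(); i++) {
            if (map.put(recordKeys.get(i), (long) i) != null) {
                throw new IllegalArgumentException("duplicate key: " + recordKeys.get(i));
            }
        }
        return map;
    }

    /** Resolves subRecordKeys to dataset indices; keys may repeat but must all be known. */
    static <K> List<Long> resolve(Map<K, Long> indicesOfRecordKeys, List<K> subRecordKeys) {
        List<Long> indices = new ArrayList<>(subRecordKeys.size());
        for (K key : subRecordKeys) {
            Long index = indicesOfRecordKeys.get(key);
            if (index == null) {
                throw new IllegalArgumentException("unknown key: " + key);
            }
            indices.add(index);
        }
        return indices;
    }
}
```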
      • newSubDataset

        protected RandomAccessDataset newSubDataset​(int[] indices,
                                                    int from,
                                                    int to)
      • newSubDataset

        protected RandomAccessDataset newSubDataset​(java.util.List<java.lang.Long> subIndices)
      • toArray

        public ai.djl.util.Pair<java.lang.Number[][],java.lang.Number[][]> toArray(NDManager manager)
                                                                                  throws java.io.IOException,
                                                                                         TranslateException
        Returns the dataset contents as a Java array.

        Each Number[] is a flattened dataset record and the Number[][] is the array of all records.

        Parameters:
        manager - the manager to create the arrays
        Returns:
        the dataset contents as a Java array
        Throws:
        java.io.IOException - for various exceptions depending on the dataset
        TranslateException - if there is an error while processing input
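Each record is flattened into a single `Number[]`, and all records together form a `Number[][]`. The following sketch assumes row-major flattening, which is an assumption here, so that a 2x3 record becomes a six-element array. `Flatten` is hypothetical and works on plain `float[][]` records rather than NDArrays.

```java
final class Flatten {
    private Flatten() {}

    /** Flattens one 2-D record into a single Number[] in row-major order. */
    static Number[] record(float[][] values) {
        int total = 0;
        for (float[] row : values) {
            total += row.length;
        }
        Number[] flat = new Number[total];
        int k = 0;
        for (float[] row : values) {
            for (float v : row) {
                flat[k++] = v; // autoboxed to Float
            }
        }
        return flat;
    }

    /** Collects the flattened records into one Number[][], one row per record. */
    static Number[][] records(float[][]... records) {
        Number[][] all = new Number[records.length][];
        for (int i = 0; i < records.length; i++) {
            all[i] = record(records[i]);
        }
        return all;
    }
}
```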