Class RandomAccessDataset

java.lang.Object
ai.djl.training.dataset.RandomAccessDataset
All Implemented Interfaces:
Dataset
Direct Known Subclasses:
ArrayDataset

public abstract class RandomAccessDataset extends Object implements Dataset
RandomAccessDataset represents a dataset that supports random-access reads, i.e. it can access a specific data item given its index.

Almost all datasets in DJL extend, either directly or indirectly, RandomAccessDataset.

  • Field Details

    • sampler

      protected Sampler sampler
    • dataBatchifier

      protected Batchifier dataBatchifier
    • labelBatchifier

      protected Batchifier labelBatchifier
    • pipeline

      protected Pipeline pipeline
    • targetPipeline

      protected Pipeline targetPipeline
    • prefetchNumber

      protected int prefetchNumber
    • limit

      protected long limit
    • device

      protected Device device
  • Constructor Details

  • Method Details

    • get

      public abstract Record get(NDManager manager, long index) throws IOException
      Gets the Record for the given index from the dataset.
      Parameters:
      manager - the manager used to create the arrays
      index - the index of the requested data item
      Returns:
      a Record that contains the data and label of the requested data item
      Throws:
      IOException - if an I/O error occurs
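The contract of get can be illustrated without DJL. The following is a minimal sketch for illustration only. ToyRecord and ToyArrayDataset are hypothetical stand-ins for Record and for a concrete subclass such as ArrayDataset. Plain float[] arrays replace NDArrays, so no NDManager is involved. A real subclass would create its arrays through the given manager and would also implement availableSize().

```java
// Hypothetical sketch; ToyRecord and ToyArrayDataset are NOT DJL classes.

/** Stand-in for a Record: one data item and its label. */
final class ToyRecord {
    final float[] data;
    final float[] label;

    ToyRecord(float[] data, float[] label) {
        this.data = data;
        this.label = label;
    }
}

/** In-memory source that supports random access by index. */
final class ToyArrayDataset {
    private final float[][] data;
    private final float[][] labels;

    ToyArrayDataset(float[][] data, float[][] labels) {
        if (data.length != labels.length) {
            throw new IllegalArgumentException("data and labels must have the same length");
        }
        this.data = data;
        this.labels = labels;
    }

    /** Number of records that can be read. */
    long availableSize() {
        return data.length;
    }

    /** Returns the record at the given index, analogous to get(manager, index). */
    ToyRecord get(long index) {
        if (index < 0 || index >= availableSize()) {
            throw new IndexOutOfBoundsException("index " + index + " out of range");
        }
        int i = (int) index;
        return new ToyRecord(data[i], labels[i]);
    }
}
```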
    • getData

      public Iterable<Batch> getData(NDManager manager) throws IOException, TranslateException
      Fetches an iterator that can iterate through the Dataset.
      Specified by:
      getData in interface Dataset
      Parameters:
      manager - the manager used to create the arrays
      Returns:
      an Iterable of Batch that contains batches of data from the dataset
      Throws:
      IOException - for various exceptions depending on the dataset
      TranslateException - if there is an error while processing input
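The sketch below shows what sequential, fixed-size batching over a random-access source looks like. It is an assumption for illustration only. SequentialBatchIterable is a hypothetical class and not DJL's implementation. A LongFunction stands in for get(manager, index), and a plain List replaces Batch. In DJL, batch order and size are decided by a Sampler. This sketch hard-codes sequential order and keeps a smaller final batch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.LongFunction;

/** Hypothetical sketch: iterate a random-access source in sequential, fixed-size batches. */
final class SequentialBatchIterable<T> implements Iterable<List<T>> {
    private final long size;
    private final int batchSize;
    private final LongFunction<T> getter; // plays the role of get(manager, index)

    SequentialBatchIterable(long size, int batchSize, LongFunction<T> getter) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.size = size;
        this.batchSize = batchSize;
        this.getter = getter;
    }

    @Override
    public Iterator<List<T>> iterator() {
        // Each call starts a fresh pass, so the Iterable can be reused for every epoch.
        return new Iterator<List<T>>() {
            private long cursor = 0; // index of the next record to read

            @Override
            public boolean hasNext() {
                return cursor < size;
            }

            @Override
            public List<T> next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                long end = Math.min(cursor + batchSize, size); // the last batch may be smaller
                List<T> batch = new ArrayList<>();
                for (long i = cursor; i < end; i++) {
                    batch.add(getter.apply(i));
                }
                cursor = end;
                return batch;
            }
        };
    }
}
```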
    • getData

      public Iterable<Batch> getData(NDManager manager, ExecutorService executorService) throws IOException, TranslateException
      Fetches an iterator that can iterate through the Dataset with multiple threads.
      Specified by:
      getData in interface Dataset
      Parameters:
      manager - the manager used to create the arrays
      executorService - the executorService to use for multi-threading
      Returns:
      an Iterable of Batch that contains batches of data from the dataset
      Throws:
      IOException - for various exceptions depending on the dataset
      TranslateException - if there is an error while processing input
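This overload accepts an ExecutorService. The sketch below assumes one plausible way to use it. Each index in a batch is submitted as its own read task, and the futures are then awaited in submission order. The batch therefore keeps the requested order even when reads finish out of order. ParallelBatchLoader and loadBatch are hypothetical names. DJL's own scheduling, including its use of the prefetchNumber field, is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.function.LongFunction;

/** Hypothetical sketch: load the records of one batch concurrently, keeping their order. */
final class ParallelBatchLoader {
    private ParallelBatchLoader() {}

    static <T> List<T> loadBatch(List<Long> indices, LongFunction<T> getter, ExecutorService executor)
            throws InterruptedException, ExecutionException {
        // Submit one task per index; the reads may run in parallel.
        List<Future<T>> futures = new ArrayList<>(indices.size());
        for (long index : indices) {
            futures.add(executor.submit(() -> getter.apply(index)));
        }
        // Wait in submission order so the batch matches the requested index order.
        List<T> batch = new ArrayList<>(indices.size());
        for (Future<T> future : futures) {
            batch.add(future.get());
        }
        return batch;
    }
}
```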
    • getData

      public Iterable<Batch> getData(NDManager manager, Sampler sampler) throws IOException, TranslateException
      Fetches an iterator that can iterate through the Dataset with a custom sampler.
      Parameters:
      manager - the manager to create the arrays
      sampler - the sampler to use to iterate through the dataset
      Returns:
      an Iterable of Batch that contains batches of data from the dataset
      Throws:
      IOException - for various exceptions depending on the dataset
      TranslateException - if there is an error while processing input
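A custom sampler decides which indices make up each batch. The sketch below assumes a shuffling sampler with a fixed seed and an optional drop-last rule. ShuffledBatchSampler and its sample(long) method are hypothetical. They do not implement DJL's Sampler interface.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sampler sketch: shuffle all indices, then cut them into batches. */
final class ShuffledBatchSampler {
    private final int batchSize;
    private final boolean dropLast; // drop a trailing batch smaller than batchSize
    private final Random random;

    ShuffledBatchSampler(int batchSize, boolean dropLast, long seed) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.dropLast = dropLast;
        this.random = new Random(seed);
    }

    /** Returns the index batches for one pass over a dataset of the given size. */
    List<List<Long>> sample(long datasetSize) {
        List<Long> indices = new ArrayList<>();
        for (long i = 0; i < datasetSize; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, random);
        List<List<Long>> batches = new ArrayList<>();
        for (int from = 0; from < indices.size(); from += batchSize) {
            int to = Math.min(from + batchSize, indices.size());
            if (dropLast && to - from < batchSize) {
                break;
            }
            batches.add(new ArrayList<>(indices.subList(from, to)));
        }
        return batches;
    }
}
```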
    • getData

      public Iterable<Batch> getData(NDManager manager, Sampler sampler, ExecutorService executorService) throws IOException, TranslateException
      Fetches an iterator that can iterate through the Dataset with a custom sampler, using multiple threads.
      Parameters:
      manager - the manager to create the arrays
      sampler - the sampler to use to iterate through the dataset
      executorService - the executorService to use for multi-threading
      Returns:
      an Iterable of Batch that contains batches of data from the dataset
      Throws:
      IOException - for various exceptions depending on the dataset
      TranslateException - if there is an error while processing input
    • size

      public long size()
      Returns the size of this Dataset.
      Returns:
      the size of this Dataset
    • availableSize

      protected abstract long availableSize()
      Returns the number of records available to be read in this Dataset.
      Returns:
      the number of records available to be read in this Dataset
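size() and availableSize() can differ when the dataset is capped by the protected limit field. The sketch below assumes that size() simply caps availableSize() at limit, which the text above does not confirm. LimitedSource is a hypothetical class.

```java
/** Hypothetical sketch of how a size limit can cap the number of readable records. */
abstract class LimitedSource {
    /** Maximum number of records to expose; Long.MAX_VALUE means "no limit" (assumed default). */
    protected long limit = Long.MAX_VALUE;

    /** Number of records the underlying storage can actually provide. */
    protected abstract long availableSize();

    /** Size seen by callers: never more than the limit or than what is available. */
    public long size() {
        return Math.min(limit, availableSize());
    }
}
```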
    • randomSplit

      public RandomAccessDataset[] randomSplit(int... ratio) throws IOException, TranslateException
      Splits the dataset into multiple portions.
      Parameters:
      ratio - the ratio of each sub-dataset
      Returns:
      an array of the sub-datasets
      Throws:
      IOException - for various exceptions depending on the dataset
      TranslateException - if there is an error while processing input
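randomSplit takes integer weights, so randomSplit(8, 2) presumably yields portions of roughly 80% and 20% of the records. The rounding rule is not documented here. The sketch below shuffles the indices and assumes cumulative, rounded-down boundaries, with the final portion absorbing any remainder. RandomSplitSketch is hypothetical, and it returns index lists rather than RandomAccessDataset views.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: split indices 0..size-1 into shuffled portions proportional to ratio. */
final class RandomSplitSketch {
    private RandomSplitSketch() {}

    static List<List<Long>> split(long size, Random random, int... ratio) {
        if (ratio.length == 0) {
            throw new IllegalArgumentException("at least one ratio is required");
        }
        long total = 0;
        for (int r : ratio) {
            if (r < 0) {
                throw new IllegalArgumentException("ratios must be non-negative");
            }
            total += r;
        }
        if (total == 0) {
            throw new IllegalArgumentException("ratios must not all be zero");
        }

        List<Long> indices = new ArrayList<>();
        for (long i = 0; i < size; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, random);

        List<List<Long>> parts = new ArrayList<>();
        int from = 0;
        long cumulative = 0;
        for (int k = 0; k < ratio.length; k++) {
            cumulative += ratio[k];
            // Cumulative rounded-down boundary; the last portion always ends at size.
            int to = (k == ratio.length - 1) ? indices.size() : (int) (size * cumulative / total);
            parts.add(new ArrayList<>(indices.subList(from, to)));
            from = to;
        }
        return parts;
    }
}
```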
    • subDataset

      public RandomAccessDataset subDataset(int fromIndex, int toIndex)
      Returns a view of the portion of this dataset between the specified fromIndex, inclusive, and toIndex, exclusive.
      Parameters:
      fromIndex - low endpoint (inclusive) of the subDataset
      toIndex - high endpoint (exclusive) of the subDataset
      Returns:
      a view of the specified range within this dataset
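Because subDataset returns a view, no records need to be copied. Each view-local index is offset by fromIndex before reading from the parent. The minimal sketch below assumes a read function for the parent. RangeView is a hypothetical class.

```java
import java.util.function.LongFunction;

/** Hypothetical read-only view over the range [fromIndex, toIndex) of a random-access source. */
final class RangeView<T> {
    private final LongFunction<T> parentGet; // reads from the parent dataset
    private final long fromIndex;
    private final long toIndex;

    RangeView(LongFunction<T> parentGet, long parentSize, long fromIndex, long toIndex) {
        if (fromIndex < 0 || toIndex > parentSize || fromIndex > toIndex) {
            throw new IndexOutOfBoundsException(
                    "invalid range [" + fromIndex + ", " + toIndex + ") for size " + parentSize);
        }
        this.parentGet = parentGet;
        this.fromIndex = fromIndex;
        this.toIndex = toIndex;
    }

    long size() {
        return toIndex - fromIndex;
    }

    /** Translates a view-local index into a parent index; nothing is copied. */
    T get(long index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index " + index + " out of view of size " + size());
        }
        return parentGet.apply(fromIndex + index);
    }
}
```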
    • subDataset

      public RandomAccessDataset subDataset(List<Long> subIndices)
      Returns a view of the portion of this dataset for the specified subIndices.
      Parameters:
      subIndices - sub-set of indices of this dataset
      Returns:
      a view of the specified indices within this dataset
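The List<Long> overload generalizes the range-based subDataset above. Position k of the view reads the parent record at subIndices.get(k). The hypothetical sketch below again uses a LongFunction for the parent's reads. IndexListView is not a DJL class.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

/** Hypothetical view that exposes only the listed parent indices, in list order. */
final class IndexListView<T> {
    private final LongFunction<T> parentGet;
    private final List<Long> subIndices;

    IndexListView(LongFunction<T> parentGet, long parentSize, List<Long> subIndices) {
        for (long i : subIndices) {
            if (i < 0 || i >= parentSize) {
                throw new IndexOutOfBoundsException("index " + i + " not in parent of size " + parentSize);
            }
        }
        this.parentGet = parentGet;
        this.subIndices = new ArrayList<>(subIndices); // copies the index mapping only, not records
    }

    long size() {
        return subIndices.size();
    }

    /** View index k maps to parent index subIndices.get(k). */
    T get(long k) {
        return parentGet.apply(subIndices.get(Math.toIntExact(k)));
    }
}
```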
    • subDataset

      public <K> RandomAccessDataset subDataset(List<K> recordKeys, List<K> subRecordKeys)
      Returns a view of the portion of this dataset for the specified record keys. Assuming that the records of this dataset are represented by the keys in recordKeys, subRecordKeys defines the view on the corresponding records of the dataset.
      Type Parameters:
      K - the record key type.
      Parameters:
      recordKeys - unique keys for all records of this dataset.
      subRecordKeys - keys to define the view on the dataset. All keys in subRecordKeys must be contained in recordKeys but may occur more than once.
      Returns:
      a view of the specified records within this dataset
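The key-based overloads can be reduced to index lookups. Each key is mapped to its position in recordKeys, and then every entry of subRecordKeys is looked up. The sketch below assumes that reduction. KeyResolver.toIndices is a hypothetical helper. It rejects duplicate record keys and unknown sub-keys, and a repeated sub-key simply repeats an index.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: resolve record keys to dataset indices. */
final class KeyResolver {
    private KeyResolver() {}

    /** recordKeys.get(i) is the key of record i; record keys must be unique. */
    static <K> List<Long> toIndices(List<K> recordKeys, List<K> subRecordKeys) {
        Map<K, Long> indexOfKey = new HashMap<>();
        for (int i = 0; i < recordKeys.size(); i++) {
            if (indexOfKey.put(recordKeys.get(i), (long) i) != null) {
                throw new IllegalArgumentException("duplicate record key: " + recordKeys.get(i));
            }
        }
        List<Long> indices = new ArrayList<>(subRecordKeys.size());
        for (K key : subRecordKeys) {
            Long index = indexOfKey.get(key);
            if (index == null) {
                throw new IllegalArgumentException("unknown record key: " + key);
            }
            indices.add(index); // a key listed twice yields the same index twice
        }
        return indices;
    }
}
```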
    • subDataset

      public <K> RandomAccessDataset subDataset(Map<K,Long> indicesOfRecordKeys, List<K> subRecordKeys)
      Returns a view of the portion of this dataset for the specified record keys. Assuming that the records of this dataset are represented by the keys in indicesOfRecordKeys, subRecordKeys defines the view on the corresponding records of the dataset.
      Type Parameters:
      K - the record key type.
      Parameters:
      indicesOfRecordKeys - a map from the keys of the records in this dataset to their index positions within this dataset. While this map typically covers all records, it only needs to contain the keys that occur in subRecordKeys.
      subRecordKeys - keys to define the view on the dataset. All keys in subRecordKeys must be contained in indicesOfRecordKeys but may occur more than once.
      Returns:
      a view of the records identified by the specified keys of this dataset
    • newSubDataset

      protected RandomAccessDataset newSubDataset(int[] indices, int from, int to)
    • newSubDataset

      protected RandomAccessDataset newSubDataset(List<Long> subIndices)
    • toArray

      public ai.djl.util.Pair<Number[][],Number[][]> toArray(NDManager manager) throws IOException, TranslateException
      Returns the dataset contents as a Java array.

      Each Number[] is a flattened dataset record and the Number[][] is the array of all records.

      Parameters:
      manager - the manager to create the arrays
      Returns:
      the dataset contents as a Java array
      Throws:
      IOException - for various exceptions depending on the dataset
      TranslateException - if there is an error while processing input
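toArray flattens every record into a Number[]. The sketch below assumes that each record is a row-major 2-D float array, which is an assumption because real records hold NDArrays. FlattenSketch is hypothetical and produces only one Number[][]. The text above does not state which side of the returned Pair holds data and which holds labels.

```java
/** Hypothetical sketch: flatten each record (here a 2-D float array) into one Number[] row. */
final class FlattenSketch {
    private FlattenSketch() {}

    static Number[][] flattenAll(float[][][] records) {
        Number[][] out = new Number[records.length][];
        for (int r = 0; r < records.length; r++) {
            float[][] record = records[r];
            int count = 0;
            for (float[] row : record) {
                count += row.length;
            }
            Number[] flat = new Number[count];
            int k = 0;
            for (float[] row : record) {
                for (float v : row) {
                    flat[k++] = v; // boxed to Float, copied in row-major order
                }
            }
            out[r] = flat;
        }
        return out;
    }
}
```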