Package ai.djl.training.dataset
Class RandomAccessDataset
- java.lang.Object
  - ai.djl.training.dataset.RandomAccessDataset
- All Implemented Interfaces: Dataset
- Direct Known Subclasses: ArrayDataset
public abstract class RandomAccessDataset extends java.lang.Object implements Dataset
RandomAccessDataset represents a dataset that supports random-access reads, i.e., it can access a specific data item given its index.
Almost all datasets in DJL extend RandomAccessDataset, either directly or indirectly.
Nested Class Summary
- static class RandomAccessDataset.BaseBuilder<T extends RandomAccessDataset.BaseBuilder<T>>
  The Builder to construct a RandomAccessDataset.
- Nested classes/interfaces inherited from interface ai.djl.training.dataset.Dataset:
  Dataset.Usage
Field Summary
- protected Batchifier dataBatchifier
- protected Device device
- protected Batchifier labelBatchifier
- protected long limit
- protected Pipeline pipeline
- protected int prefetchNumber
- protected Sampler sampler
- protected Pipeline targetPipeline
-
Constructor Summary
- RandomAccessDataset(RandomAccessDataset.BaseBuilder<?> builder)
  Creates a new instance of RandomAccessDataset with the given necessary configurations.
-
Method Summary
- protected abstract long availableSize()
  Returns the number of records available to be read in this Dataset.
- abstract Record get(NDManager manager, long index)
  Gets the Record for the given index from the dataset.
- java.lang.Iterable<Batch> getData(NDManager manager)
  Fetches an iterator that can iterate through the Dataset.
- java.lang.Iterable<Batch> getData(NDManager manager, Sampler sampler)
  Fetches an iterator that can iterate through the Dataset with a custom sampler.
- java.lang.Iterable<Batch> getData(NDManager manager, Sampler sampler, java.util.concurrent.ExecutorService executorService)
  Fetches an iterator that can iterate through the Dataset with a custom sampler, using multiple threads.
- java.lang.Iterable<Batch> getData(NDManager manager, java.util.concurrent.ExecutorService executorService)
  Fetches an iterator that can iterate through the Dataset with multiple threads.
- RandomAccessDataset[] randomSplit(int... ratio)
  Splits the dataset into multiple portions.
- long size()
  Returns the size of this Dataset.
- RandomAccessDataset subDataset(int fromIndex, int toIndex)
  Returns a view of the portion of this dataset between the specified fromIndex, inclusive, and toIndex, exclusive.
- ai.djl.util.Pair<java.lang.Number[][],java.lang.Number[][]> toArray()
  Returns the dataset contents as a Java array.
Field Detail
-
sampler
protected Sampler sampler
-
dataBatchifier
protected Batchifier dataBatchifier
-
labelBatchifier
protected Batchifier labelBatchifier
-
pipeline
protected Pipeline pipeline
-
targetPipeline
protected Pipeline targetPipeline
-
prefetchNumber
protected int prefetchNumber
-
limit
protected long limit
-
device
protected Device device
-
-
Constructor Detail
-
RandomAccessDataset
public RandomAccessDataset(RandomAccessDataset.BaseBuilder<?> builder)
Creates a new instance of RandomAccessDataset with the given necessary configurations.
- Parameters:
  builder - a builder with the necessary configurations
-
-
Method Detail
-
get
public abstract Record get(NDManager manager, long index) throws java.io.IOException
Gets the Record for the given index from the dataset.
- Parameters:
  manager - the manager used to create the arrays
  index - the index of the requested data item
- Returns:
  a Record that contains the data and label of the requested data item
- Throws:
  java.io.IOException - if an I/O error occurs
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset.
- Specified by:
  getData in interface Dataset
- Parameters:
  manager - the manager used to create the arrays
- Returns:
  an Iterable of Batch that contains batches of data from the dataset
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager, java.util.concurrent.ExecutorService executorService) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with multiple threads.
- Specified by:
  getData in interface Dataset
- Parameters:
  manager - the manager used to create the arrays
  executorService - the ExecutorService to use for multi-threading
- Returns:
  an Iterable of Batch that contains batches of data from the dataset
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager, Sampler sampler) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with a custom sampler.
- Parameters:
  manager - the manager used to create the arrays
  sampler - the sampler to use to iterate through the dataset
- Returns:
  an Iterable of Batch that contains batches of data from the dataset
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager, Sampler sampler, java.util.concurrent.ExecutorService executorService) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with a custom sampler, using multiple threads.
- Parameters:
  manager - the manager used to create the arrays
  sampler - the sampler to use to iterate through the dataset
  executorService - the ExecutorService to use for multi-threading
- Returns:
  an Iterable of Batch that contains batches of data from the dataset
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
-
size
public long size()
Returns the size of this Dataset.
- Returns:
  the size of this Dataset
-
availableSize
protected abstract long availableSize()
Returns the number of records available to be read in this Dataset.
- Returns:
  the number of records available to be read in this Dataset
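The page lists both size() and availableSize(), along with a protected limit field, but does not state how they relate. The following is a minimal plain-Java sketch under the assumption that the reported size is the available record count capped by the limit. LimitedSize and its method are hypothetical names used only for illustration; they are not part of DJL.

```java
/** Hypothetical helper, not part of DJL: models a dataset size capped by a limit. */
final class LimitedSize {
    private LimitedSize() {}

    /**
     * Returns the smaller of the configured limit and the number of records available.
     * Assumption: this mirrors how a limit could restrict the size a dataset reports.
     */
    static long size(long limit, long availableSize) {
        if (limit < 0 || availableSize < 0) {
            throw new IllegalArgumentException("limit and availableSize must be non-negative");
        }
        return Math.min(limit, availableSize);
    }
}
```

Under this assumption, a dataset with 60000 available records and a limit of 100 would report a size of 100, while a limit of Long.MAX_VALUE would leave the size at 60000.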
-
randomSplit
public RandomAccessDataset[] randomSplit(int... ratio) throws java.io.IOException, TranslateException
Splits the dataset into multiple portions.
- Parameters:
  ratio - the ratio of each sub-dataset
- Returns:
  an array of the sub-datasets
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
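To make the ratio parameter concrete, here is a plain-Java sketch of one way integer ratios could be turned into portions of shuffled indices. It is an illustration under stated assumptions, not the DJL implementation: RatioSplit, portionSizes, split, and the seed parameter are hypothetical, and letting the last portion absorb the rounding remainder is a design choice of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of DJL: splits indices 0..total-1 by integer ratios. */
final class RatioSplit {
    private RatioSplit() {}

    /** Computes the size of each portion; the last portion takes the rounding remainder. */
    static int[] portionSizes(int total, int... ratio) {
        if (total < 0 || ratio.length == 0) {
            throw new IllegalArgumentException("total must be >= 0 and ratio must be non-empty");
        }
        long sum = 0;
        for (int r : ratio) {
            if (r <= 0) {
                throw new IllegalArgumentException("each ratio must be positive");
            }
            sum += r;
        }
        int[] sizes = new int[ratio.length];
        int used = 0;
        for (int i = 0; i < ratio.length - 1; i++) {
            sizes[i] = (int) ((long) total * ratio[i] / sum);
            used += sizes[i];
        }
        sizes[ratio.length - 1] = total - used;
        return sizes;
    }

    /** Shuffles the indices with the given seed and cuts them into consecutive portions. */
    static List<List<Integer>> split(int total, long seed, int... ratio) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < total; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Integer>> parts = new ArrayList<>();
        int from = 0;
        for (int size : portionSizes(total, ratio)) {
            parts.add(new ArrayList<>(indices.subList(from, from + size)));
            from += size;
        }
        return parts;
    }
}
```

With this sketch, RatioSplit.split(10, 42L, 8, 2) yields two disjoint index lists of sizes 8 and 2, in the spirit of an 80/20 train/validation split.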
-
subDataset
public RandomAccessDataset subDataset(int fromIndex, int toIndex)
Returns a view of the portion of this dataset between the specified fromIndex, inclusive, and toIndex, exclusive.
- Parameters:
  fromIndex - low endpoint (inclusive) of the subDataset
  toIndex - high endpoint (exclusive) of the subDataset
- Returns:
  a view of the specified range within this dataset
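The half-open range and the word "view" can be illustrated with a small plain-Java model. RangeView is a hypothetical class written for this sketch, not the DJL type. It assumes that a view keeps a reference to the source and translates indices rather than copying records.

```java
import java.util.List;

/** Hypothetical model, not part of DJL: a read-only view over [fromIndex, toIndex). */
final class RangeView<T> {
    private final List<T> source;
    private final int fromIndex;
    private final int toIndex;

    RangeView(List<T> source, int fromIndex, int toIndex) {
        if (fromIndex < 0 || toIndex > source.size() || fromIndex > toIndex) {
            throw new IndexOutOfBoundsException(
                    "invalid range [" + fromIndex + ", " + toIndex + ")");
        }
        this.source = source;
        this.fromIndex = fromIndex;
        this.toIndex = toIndex;
    }

    /** Number of items in the view: toIndex is exclusive, so no +1. */
    int size() {
        return toIndex - fromIndex;
    }

    /** Reads item {@code index} of the view by offsetting into the source. */
    T get(int index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return source.get(fromIndex + index);
    }
}
```

For a source of five items, new RangeView<>(items, 1, 4) exposes three items, the ones at source positions 1, 2, and 3. Because no data is copied, later changes to the source are visible through the view.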
-
toArray
public ai.djl.util.Pair<java.lang.Number[][],java.lang.Number[][]> toArray() throws java.io.IOException, TranslateException
Returns the dataset contents as a Java array.
Each Number[] is a flattened dataset record and the Number[][] is the array of all records.
- Returns:
  the dataset contents as a Java array
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
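The layout described above, one flattened Number[] per record collected into a Number[][], can be sketched in plain Java as follows. FlattenRecords is a hypothetical helper, not DJL code, and it assumes each record is a two-dimensional float array flattened in row-major order. The returned Pair presumably holds one such Number[][] for data and one for labels, though this page does not say which element is which.

```java
/** Hypothetical helper, not part of DJL: flattens 2-D records into Number[] rows. */
final class FlattenRecords {
    private FlattenRecords() {}

    /** Flattens one record row by row into a single Number[]. */
    static Number[] flatten(float[][] record) {
        int count = 0;
        for (float[] row : record) {
            count += row.length;
        }
        Number[] flat = new Number[count];
        int k = 0;
        for (float[] row : record) {
            for (float value : row) {
                flat[k++] = Float.valueOf(value);
            }
        }
        return flat;
    }

    /** Builds the array of all records, one flattened Number[] per record. */
    static Number[][] toArray(float[][][] records) {
        Number[][] out = new Number[records.length][];
        for (int i = 0; i < records.length; i++) {
            out[i] = flatten(records[i]);
        }
        return out;
    }
}
```

Two 2x2 records would produce a Number[2][4] result, with element [1][3] holding the bottom-right value of the second record.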
-
-