Package ai.djl.training.dataset
Class RandomAccessDataset
java.lang.Object
ai.djl.training.dataset.RandomAccessDataset
- All Implemented Interfaces:
Dataset
- Direct Known Subclasses:
ArrayDataset
RandomAccessDataset represents a dataset that supports random-access reads, i.e. it can access
a specific data item given its index.
Almost all datasets in DJL extend RandomAccessDataset, either directly or indirectly.
-
Nested Class Summary
Nested classes/interfaces inherited from interface ai.djl.training.dataset.Dataset
Dataset.Usage
-
Field Summary
Modifier and Type, Field:
protected Batchifier dataBatchifier
protected Device device
protected Batchifier labelBatchifier
protected long limit
protected Pipeline pipeline
protected int prefetchNumber
protected Sampler sampler
protected Pipeline targetPipeline
-
Constructor Summary
RandomAccessDataset(RandomAccessDataset.BaseBuilder<?> builder)
Creates a new instance of RandomAccessDataset with the given necessary configurations.
-
Method Summary
Modifier and Type, Method, Description:
protected abstract long availableSize()
Returns the number of records available to be read in this Dataset.
abstract Record get(NDManager manager, long index)
Gets the Record for the given index from the dataset.
Iterable<Batch> getData(NDManager manager)
Fetches an iterator that can iterate through the Dataset.
Iterable<Batch> getData(NDManager manager, Sampler sampler)
Fetches an iterator that can iterate through the Dataset with a custom sampler.
Iterable<Batch> getData(NDManager manager, Sampler sampler, ExecutorService executorService)
Fetches an iterator that can iterate through the Dataset with a custom sampler, using multiple threads.
Iterable<Batch> getData(NDManager manager, ExecutorService executorService)
Fetches an iterator that can iterate through the Dataset with multiple threads.
protected RandomAccessDataset newSubDataset(int[] indices, int from, int to)
protected RandomAccessDataset newSubDataset(List<Long> subIndices)
RandomAccessDataset[] randomSplit(int... ratio)
Splits the dataset into multiple portions.
long size()
Returns the size of this Dataset.
RandomAccessDataset subDataset(int fromIndex, int toIndex)
Returns a view of the portion of this data between the specified fromIndex, inclusive, and toIndex, exclusive.
RandomAccessDataset subDataset(List<Long> subIndices)
Returns a view of the portion of this data for the specified subIndices.
<K> RandomAccessDataset subDataset(List<K> recordKeys, List<K> subRecordKeys)
Returns a view of the portion of this data for the specified record keys.
<K> RandomAccessDataset subDataset(Map<K, Long> indicesOfRecordKeys, List<K> subRecordKeys)
Returns a view of the portion of this data for the specified record keys.
Pair<Number[][],Number[][]> toArray(NDManager manager)
Returns the dataset contents as a Java array.
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface ai.djl.training.dataset.Dataset
matchingTranslatorOptions, prepare, prepare
-
Field Details
-
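sampler
protected Sampler sampler
-
dataBatchifier
protected Batchifier dataBatchifier
-
labelBatchifier
protected Batchifier labelBatchifier
-
pipeline
protected Pipeline pipeline
-
targetPipeline
protected Pipeline targetPipeline
-
prefetchNumber
protected int prefetchNumber
-
limit
protected long limit
-
device
protected Device device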
-
-
Constructor Details
-
RandomAccessDataset
Creates a new instance of RandomAccessDataset with the given necessary configurations.
- Parameters:
builder - a builder with the necessary configurations
-
-
Method Details
-
get
public abstract Record get(NDManager manager, long index) throws IOException
Gets the Record for the given index from the dataset.
- Parameters:
manager - the manager used to create the arrays
index - the index of the requested data item
- Returns:
a Record that contains the data and label of the requested data item
- Throws:
IOException - if an I/O error occurs
-
getData
public Iterable<Batch> getData(NDManager manager) throws IOException, TranslateException
Fetches an iterator that can iterate through the Dataset.
- Specified by:
getData in interface Dataset
- Parameters:
manager - the manager used to create the arrays
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
-
getData
public Iterable<Batch> getData(NDManager manager, ExecutorService executorService) throws IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with multiple threads.
- Specified by:
getData in interface Dataset
- Parameters:
manager - the manager used to create the arrays
executorService - the ExecutorService to use for multi-threading
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
-
getData
public Iterable<Batch> getData(NDManager manager, Sampler sampler) throws IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with a custom sampler.
- Parameters:
manager - the manager to create the arrays
sampler - the sampler to use to iterate through the dataset
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
-
getData
public Iterable<Batch> getData(NDManager manager, Sampler sampler, ExecutorService executorService) throws IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with a custom sampler, using multiple threads.
- Parameters:
manager - the manager to create the arrays
sampler - the sampler to use to iterate through the dataset
executorService - the ExecutorService to use for multi-threading
- Returns:
an Iterable of Batch that contains batches of data from the dataset
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
-
size
public long size()
Returns the size of this Dataset.
- Returns:
the size of this Dataset
-
availableSize
protected abstract long availableSize()
Returns the number of records available to be read in this Dataset.
- Returns:
the number of records available to be read in this Dataset
-
randomSplit
public RandomAccessDataset[] randomSplit(int... ratio) throws IOException, TranslateException
Splits the dataset into multiple portions.
- Parameters:
ratio - the ratio of each sub-dataset
- Returns:
an array of the sub-datasets
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
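As a rough illustration of what splitting by ratio involves, the sketch below shuffles the record indices and cuts them into consecutive portions proportional to each ratio entry. It uses only the Java standard library. The class name RatioSplitSketch, the split helper, the explicit seed, and the rule that the last portion absorbs any rounding remainder are assumptions made for this example; it is not a description of how DJL implements randomSplit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper, not part of DJL: splits indices 0..size-1 by integer ratios. */
final class RatioSplitSketch {

    /** Returns one list of shuffled record indices per entry in ratio. */
    static List<List<Long>> split(int size, long seed, int... ratio) {
        if (ratio.length < 2) {
            throw new IllegalArgumentException("At least two portions are required");
        }
        long sum = 0;
        for (int r : ratio) {
            if (r < 0) {
                throw new IllegalArgumentException("Ratios must not be negative");
            }
            sum += r;
        }
        if (sum == 0) {
            throw new IllegalArgumentException("At least one ratio must be positive");
        }

        // Shuffle all record indices once, then cut the shuffled list into ranges.
        List<Long> indices = new ArrayList<>(size);
        for (long i = 0; i < size; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));

        List<List<Long>> portions = new ArrayList<>();
        int from = 0;
        for (int i = 0; i < ratio.length; i++) {
            // Assumption: the last portion takes whatever remains after rounding down.
            int to = (i == ratio.length - 1)
                    ? size
                    : from + (int) ((long) size * ratio[i] / sum);
            portions.add(new ArrayList<>(indices.subList(from, to)));
            from = to;
        }
        return portions;
    }

    public static void main(String[] args) {
        // An 8:2 split of 10 records yields portions of 8 and 2 indices.
        List<List<Long>> parts = split(10, 42L, 8, 2);
        System.out.println("first portion size  = " + parts.get(0).size());
        System.out.println("second portion size = " + parts.get(1).size());
    }
}
```

Each resulting index list could then be passed to subDataset(List<Long> subIndices) to obtain the corresponding view.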
-
subDataset
public RandomAccessDataset subDataset(int fromIndex, int toIndex)
Returns a view of the portion of this data between the specified fromIndex, inclusive, and toIndex, exclusive.
- Parameters:
fromIndex - low endpoint (inclusive) of the subDataset
toIndex - high endpoint (exclusive) of the subDataset
- Returns:
a view of the specified range within this dataset
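The half-open range convention (fromIndex inclusive, toIndex exclusive) can be pictured with a plain Java list. The sketch below is only an analogy built on java.util.AbstractList; the class RangeViewSketch and its view helper are hypothetical names, and nothing here claims to mirror DJL's internal representation of a sub-dataset.

```java
import java.util.AbstractList;
import java.util.List;

/** Hypothetical helper, not part of DJL: a read-only range view over a list of records. */
final class RangeViewSketch {

    /** Returns a view of records[fromIndex, toIndex) that does not copy the elements. */
    static <T> List<T> view(List<T> records, int fromIndex, int toIndex) {
        if (fromIndex < 0 || toIndex > records.size() || fromIndex > toIndex) {
            throw new IndexOutOfBoundsException(
                    "Invalid range [" + fromIndex + ", " + toIndex + ")");
        }
        return new AbstractList<T>() {
            @Override
            public T get(int i) {
                if (i < 0 || i >= size()) {
                    throw new IndexOutOfBoundsException("Index: " + i);
                }
                // Index i of the view maps to index fromIndex + i of the parent.
                return records.get(fromIndex + i);
            }

            @Override
            public int size() {
                // toIndex is exclusive, so the view holds toIndex - fromIndex items.
                return toIndex - fromIndex;
            }
        };
    }

    public static void main(String[] args) {
        List<String> records = List.of("r0", "r1", "r2", "r3", "r4");
        System.out.println(view(records, 1, 4)); // prints [r1, r2, r3]
    }
}
```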
-
subDataset
public RandomAccessDataset subDataset(List<Long> subIndices)
Returns a view of the portion of this data for the specified subIndices.
- Parameters:
subIndices - subset of indices of this dataset
- Returns:
a view of the specified indices within this dataset
-
subDataset
public <K> RandomAccessDataset subDataset(List<K> recordKeys, List<K> subRecordKeys)
Returns a view of the portion of this data for the specified record keys. Assuming that the records of this dataset are represented by the keys in recordKeys, subRecordKeys defines the view on the corresponding records of the dataset.
- Type Parameters:
K - the record key type
- Parameters:
recordKeys - unique keys for all records of this dataset
subRecordKeys - keys to define the view on the dataset. All keys in subRecordKeys must be contained in recordKeys but may occur more than once.
- Returns:
a view of the specified records within this dataset
-
subDataset
public <K> RandomAccessDataset subDataset(Map<K, Long> indicesOfRecordKeys, List<K> subRecordKeys)
Returns a view of the portion of this data for the specified record keys. Assuming that the records of this dataset are represented by the keys in indicesOfRecordKeys, subRecordKeys defines the view on the corresponding records of the dataset.
- Type Parameters:
K - the record key type
- Parameters:
indicesOfRecordKeys - map from the keys of the records in this dataset to their index positions within this dataset. While this map typically maps all records, technically it only needs to map the ones occurring in subRecordKeys.
subRecordKeys - keys to define the view on the dataset. All keys in subRecordKeys must be contained in indicesOfRecordKeys but may occur more than once.
- Returns:
a view of the records identified by the specified keys of this dataset
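The relationship between the two key-based overloads can be sketched in plain Java: a list of unique record keys is turned into a key-to-index map, and the sub-record keys are then resolved to index positions, with repeated keys allowed. The class RecordKeySketch and its methods indexKeys and resolve are hypothetical helpers written for this illustration; they are not DJL API and are not claimed to match DJL's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of DJL: resolves record keys to index positions. */
final class RecordKeySketch {

    /** Maps each key in recordKeys to its position; rejects duplicate keys. */
    static <K> Map<K, Long> indexKeys(List<K> recordKeys) {
        Map<K, Long> indicesOfRecordKeys = new HashMap<>();
        for (int i = 0; i < recordKeys.size(); i++) {
            K key = recordKeys.get(i);
            if (indicesOfRecordKeys.put(key, (long) i) != null) {
                throw new IllegalArgumentException("Duplicate record key: " + key);
            }
        }
        return indicesOfRecordKeys;
    }

    /** Looks up the index of every key in subRecordKeys; keys may repeat. */
    static <K> List<Long> resolve(Map<K, Long> indicesOfRecordKeys, List<K> subRecordKeys) {
        List<Long> subIndices = new ArrayList<>(subRecordKeys.size());
        for (K key : subRecordKeys) {
            Long index = indicesOfRecordKeys.get(key);
            if (index == null) {
                throw new IllegalArgumentException("Unknown record key: " + key);
            }
            subIndices.add(index);
        }
        return subIndices;
    }

    public static void main(String[] args) {
        List<String> recordKeys = List.of("img-a", "img-b", "img-c");
        List<String> subRecordKeys = List.of("img-c", "img-a", "img-c");
        System.out.println(resolve(indexKeys(recordKeys), subRecordKeys)); // prints [2, 0, 2]
    }
}
```

The resolved list has the same shape as the subIndices argument accepted by subDataset(List<Long> subIndices).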
-
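newSubDataset
protected RandomAccessDataset newSubDataset(int[] indices, int from, int to)
-
newSubDataset
protected RandomAccessDataset newSubDataset(List<Long> subIndices)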
toArray
public ai.djl.util.Pair<Number[][],Number[][]> toArray(NDManager manager) throws IOException, TranslateException
Returns the dataset contents as a Java array.
Each Number[] is a flattened dataset record and the Number[][] is the array of all records.
- Parameters:
manager - the manager to create the arrays
- Returns:
the dataset contents as a Java array
- Throws:
IOException - for various exceptions depending on the dataset
TranslateException - if there is an error while processing input
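To make the "one flattened Number[] per record" layout concrete, the sketch below flattens small two-dimensional records into rows of a Number[][]. It is a standalone illustration: the class FlattenSketch, the use of float[][] as the record type, and the row-major flattening order are assumptions for this example and are not taken from DJL.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of DJL: shows a flattened Number[][] layout. */
final class FlattenSketch {

    /** Flattens one two-dimensional record into a Number[] (row-major order is assumed). */
    static Number[] flatten(float[][] record) {
        int total = 0;
        for (float[] row : record) {
            total += row.length;
        }
        Number[] flat = new Number[total];
        int pos = 0;
        for (float[] row : record) {
            for (float value : row) {
                flat[pos++] = value; // boxed as Float, stored as Number
            }
        }
        return flat;
    }

    /** Builds the array of all records, one flattened Number[] per record. */
    static Number[][] flattenAll(float[][][] records) {
        Number[][] all = new Number[records.length][];
        for (int i = 0; i < records.length; i++) {
            all[i] = flatten(records[i]);
        }
        return all;
    }

    public static void main(String[] args) {
        float[][][] records = {
            {{1f, 2f}, {3f, 4f}},
            {{5f, 6f}, {7f, 8f}}
        };
        // prints [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
        System.out.println(Arrays.deepToString(flattenAll(records)));
    }
}
```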
-