Package ai.djl.training.dataset
Class RandomAccessDataset
- java.lang.Object
  - ai.djl.training.dataset.RandomAccessDataset
- All Implemented Interfaces: Dataset
- Direct Known Subclasses: ArrayDataset
public abstract class RandomAccessDataset extends java.lang.Object implements Dataset
RandomAccessDataset represents a dataset that supports random access reads, i.e. it can access a specific data item given its index. Almost all datasets in DJL extend RandomAccessDataset, either directly or indirectly.
-
-
Nested Class Summary
Nested Classes (Modifier and Type, Class, Description):
- static class RandomAccessDataset.BaseBuilder<T extends RandomAccessDataset.BaseBuilder<T>>
  The Builder to construct a RandomAccessDataset.
- Nested classes/interfaces inherited from interface ai.djl.training.dataset.Dataset:
  Dataset.Usage
-
-
Field Summary
Fields (Modifier and Type, Field):
- protected Batchifier dataBatchifier
- protected Device device
- protected Batchifier labelBatchifier
- protected long limit
- protected Pipeline pipeline
- protected int prefetchNumber
- protected Sampler sampler
- protected Pipeline targetPipeline
-
Constructor Summary
Constructors (Constructor, Description):
- RandomAccessDataset(RandomAccessDataset.BaseBuilder<?> builder)
  Creates a new instance of RandomAccessDataset with the given necessary configurations.
-
Method Summary
Methods (Modifier and Type, Method, Description):
- protected abstract long availableSize()
  Returns the number of records available to be read in this Dataset.
- abstract Record get(NDManager manager, long index)
  Gets the Record for the given index from the dataset.
- java.lang.Iterable<Batch> getData(NDManager manager)
  Fetches an iterator that can iterate through the Dataset.
- java.lang.Iterable<Batch> getData(NDManager manager, Sampler sampler)
  Fetches an iterator that can iterate through the Dataset with a custom sampler.
- java.lang.Iterable<Batch> getData(NDManager manager, Sampler sampler, java.util.concurrent.ExecutorService executorService)
  Fetches an iterator that can iterate through the Dataset with a custom sampler, using multiple threads.
- java.lang.Iterable<Batch> getData(NDManager manager, java.util.concurrent.ExecutorService executorService)
  Fetches an iterator that can iterate through the Dataset with multiple threads.
- protected RandomAccessDataset newSubDataset(int[] indices, int from, int to)
- protected RandomAccessDataset newSubDataset(java.util.List<java.lang.Long> subIndices)
- RandomAccessDataset[] randomSplit(int... ratio)
  Splits the dataset into multiple portions.
- long size()
  Returns the size of this Dataset.
- RandomAccessDataset subDataset(int fromIndex, int toIndex)
  Returns a view of the portion of this data between the specified fromIndex, inclusive, and toIndex, exclusive.
- RandomAccessDataset subDataset(java.util.List<java.lang.Long> subIndices)
  Returns a view of the portion of this data for the specified subIndices.
- <K> RandomAccessDataset subDataset(java.util.List<K> recordKeys, java.util.List<K> subRecordKeys)
  Returns a view of the portion of this data for the specified record keys.
- <K> RandomAccessDataset subDataset(java.util.Map<K,java.lang.Long> indicesOfRecordKeys, java.util.List<K> subRecordKeys)
  Returns a view of the portion of this data for the specified record keys.
- ai.djl.util.Pair<java.lang.Number[][],java.lang.Number[][]> toArray(NDManager manager)
  Returns the dataset contents as a Java array.
-
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
-
Methods inherited from interface ai.djl.training.dataset.Dataset
matchingTranslatorOptions, prepare, prepare
-
-
-
-
Field Detail
-
sampler
protected Sampler sampler
-
dataBatchifier
protected Batchifier dataBatchifier
-
labelBatchifier
protected Batchifier labelBatchifier
-
pipeline
protected Pipeline pipeline
-
targetPipeline
protected Pipeline targetPipeline
-
prefetchNumber
protected int prefetchNumber
-
limit
protected long limit
-
device
protected Device device
-
-
Constructor Detail
-
RandomAccessDataset
public RandomAccessDataset(RandomAccessDataset.BaseBuilder<?> builder)
Creates a new instance of RandomAccessDataset with the given necessary configurations.
- Parameters:
  builder - a builder with the necessary configurations
-
-
Method Detail
-
get
public abstract Record get(NDManager manager, long index) throws java.io.IOException
Gets the Record for the given index from the dataset.
- Parameters:
  manager - the manager used to create the arrays
  index - the index of the requested data item
- Returns:
  a Record that contains the data and label of the requested data item
- Throws:
  java.io.IOException - if an I/O error occurs
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset.
- Specified by:
  getData in interface Dataset
- Parameters:
  manager - the manager used to create the arrays
- Returns:
  an Iterable of Batch that contains batches of data from the dataset
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager, java.util.concurrent.ExecutorService executorService) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with multiple threads.
- Specified by:
  getData in interface Dataset
- Parameters:
  manager - the manager used to create the arrays
  executorService - the ExecutorService to use for multi-threading
- Returns:
  an Iterable of Batch that contains batches of data from the dataset
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager, Sampler sampler) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with a custom sampler.
- Parameters:
  manager - the manager used to create the arrays
  sampler - the sampler to use to iterate through the dataset
- Returns:
  an Iterable of Batch that contains batches of data from the dataset
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
-
getData
public java.lang.Iterable<Batch> getData(NDManager manager, Sampler sampler, java.util.concurrent.ExecutorService executorService) throws java.io.IOException, TranslateException
Fetches an iterator that can iterate through the Dataset with a custom sampler, using multiple threads.
- Parameters:
  manager - the manager used to create the arrays
  sampler - the sampler to use to iterate through the dataset
  executorService - the ExecutorService to use for multi-threading
- Returns:
  an Iterable of Batch that contains batches of data from the dataset
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
-
size
public long size()
Returns the size of this Dataset.
- Returns:
  the size of this Dataset
-
availableSize
protected abstract long availableSize()
Returns the number of records available to be read in this Dataset.
- Returns:
  the number of records available to be read in this Dataset
-
randomSplit
public RandomAccessDataset[] randomSplit(int... ratio) throws java.io.IOException, TranslateException
Splits the dataset into multiple portions.
- Parameters:
  ratio - the ratio of each sub-dataset
- Returns:
  an array of the sub-datasets
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
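To make the role of the ratio arguments concrete, here is a minimal plain-Java sketch of a ratio-based random split over record indices. It is not DJL code: the class `RatioSplitSketch`, its methods, and the `seed` parameter are hypothetical, and the rounding rule (every portion but the last rounded down, the last portion taking the remainder) is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch of ratio-based splitting; it does not use or mirror DJL internals. */
final class RatioSplitSketch {

    /**
     * Computes how many records each portion receives.
     * Assumption: all portions but the last are rounded down and the
     * last portion takes the remainder, so the sizes add up to total.
     */
    static int[] portionSizes(int total, int... ratio) {
        long sum = 0;
        for (int r : ratio) {
            sum += r;
        }
        if (ratio.length == 0 || sum <= 0) {
            throw new IllegalArgumentException("ratios must be non-empty with a positive sum");
        }
        int[] sizes = new int[ratio.length];
        int assigned = 0;
        for (int i = 0; i < ratio.length - 1; i++) {
            sizes[i] = (int) ((long) total * ratio[i] / sum);
            assigned += sizes[i];
        }
        sizes[ratio.length - 1] = total - assigned;
        return sizes;
    }

    /** Shuffles the indices 0..total-1 and cuts them into consecutive portions. */
    static List<List<Long>> randomSplitIndices(int total, long seed, int... ratio) {
        List<Long> indices = new ArrayList<>();
        for (long i = 0; i < total; i++) {
            indices.add(i);
        }
        Collections.shuffle(indices, new Random(seed));
        List<List<Long>> portions = new ArrayList<>();
        int from = 0;
        for (int size : portionSizes(total, ratio)) {
            portions.add(new ArrayList<>(indices.subList(from, from + size)));
            from += size;
        }
        return portions;
    }

    public static void main(String[] args) {
        // An 8:2 split of 100 records gives portions of 80 and 20 records.
        System.out.println(Arrays.toString(portionSizes(100, 8, 2)));
        // Every index lands in exactly one portion.
        System.out.println(randomSplitIndices(10, 42L, 8, 2));
    }
}
```

Under these assumptions, a call such as `randomSplit(8, 2)` on the real class would be expected to yield a larger and a smaller sub-dataset in roughly that proportion, for example a training and a validation portion.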
-
subDataset
public RandomAccessDataset subDataset(int fromIndex, int toIndex)
Returns a view of the portion of this data between the specified fromIndex, inclusive, and toIndex, exclusive.
- Parameters:
  fromIndex - low endpoint (inclusive) of the subDataset
  toIndex - high endpoint (exclusive) of the subDataset
- Returns:
  a view of the specified range within this dataset
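The inclusive/exclusive endpoints can be illustrated with a small plain-Java range view. This is only a sketch of the index arithmetic, assuming a view translates its own index `i` to `fromIndex + i` in the backing data; `RangeViewSketch` is a hypothetical class and not part of DJL.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical read-only view over the range [fromIndex, toIndex) of a backing list. */
final class RangeViewSketch<T> {
    private final List<T> backing;
    private final int fromIndex;
    private final int toIndex;

    RangeViewSketch(List<T> backing, int fromIndex, int toIndex) {
        if (fromIndex < 0 || toIndex > backing.size() || fromIndex > toIndex) {
            throw new IndexOutOfBoundsException(
                    "invalid range [" + fromIndex + ", " + toIndex + ")");
        }
        this.backing = backing;
        this.fromIndex = fromIndex;
        this.toIndex = toIndex;
    }

    /** The view holds toIndex - fromIndex records because toIndex is exclusive. */
    long size() {
        return toIndex - fromIndex;
    }

    /** Translates a view index into an index of the backing data; nothing is copied. */
    T get(long index) {
        if (index < 0 || index >= size()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return backing.get(fromIndex + (int) index);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("a", "b", "c", "d", "e");
        RangeViewSketch<String> view = new RangeViewSketch<>(records, 1, 4);
        System.out.println(view.size()); // 3 records: indices 1, 2, 3 of the backing list
        System.out.println(view.get(0)); // b
    }
}
```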
-
subDataset
public RandomAccessDataset subDataset(java.util.List<java.lang.Long> subIndices)
Returns a view of the portion of this data for the specified subIndices.
- Parameters:
  subIndices - subset of indices of this dataset
- Returns:
  a view of the specified indices within this dataset
-
subDataset
public <K> RandomAccessDataset subDataset(java.util.List<K> recordKeys, java.util.List<K> subRecordKeys)
Returns a view of the portion of this data for the specified record keys. Assuming that the records of this dataset are represented by the keys in recordKeys, subRecordKeys defines the view on the corresponding records of the dataset.
- Type Parameters:
  K - the record key type
- Parameters:
  recordKeys - unique keys for all records of this dataset
  subRecordKeys - keys to define the view on the dataset. All keys in subRecordKeys must be contained in recordKeys but may occur more than once.
- Returns:
  a view of the specified records within this dataset
-
subDataset
public <K> RandomAccessDataset subDataset(java.util.Map<K,java.lang.Long> indicesOfRecordKeys, java.util.List<K> subRecordKeys)
Returns a view of the portion of this data for the specified record keys. Assuming that the records of this dataset are represented by the keys in indicesOfRecordKeys, subRecordKeys defines the view on the corresponding records of the dataset.
- Type Parameters:
  K - the record key type
- Parameters:
  indicesOfRecordKeys - map from the keys of the records in this dataset to their index position within this dataset. While this map typically maps all records, technically it only needs to map the ones occurring in subRecordKeys.
  subRecordKeys - keys to define the view on the dataset. All keys in subRecordKeys must be contained in indicesOfRecordKeys but may occur more than once.
- Returns:
  a view of the records identified by the specified keys of this dataset
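The relationship between the two key-based overloads can be sketched in plain Java: a list of unique record keys determines a key-to-index map, and the sub-record keys are then resolved to indices through that map. The class `RecordKeySketch` and its methods are hypothetical names, and throwing `IllegalArgumentException` for duplicate or unknown keys is an assumption rather than documented behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of resolving record keys to dataset indices; not DJL code. */
final class RecordKeySketch {

    /** Maps each unique record key to its position in recordKeys. */
    static <K> Map<K, Long> indicesOf(List<K> recordKeys) {
        Map<K, Long> indices = new HashMap<>();
        for (int i = 0; i < recordKeys.size(); i++) {
            K key = recordKeys.get(i);
            if (indices.put(key, (long) i) != null) {
                throw new IllegalArgumentException("duplicate record key: " + key);
            }
        }
        return indices;
    }

    /** Resolves sub-record keys to indices; a key may appear more than once. */
    static <K> List<Long> resolve(Map<K, Long> indicesOfRecordKeys, List<K> subRecordKeys) {
        List<Long> subIndices = new ArrayList<>();
        for (K key : subRecordKeys) {
            Long index = indicesOfRecordKeys.get(key);
            if (index == null) {
                throw new IllegalArgumentException("unknown record key: " + key);
            }
            subIndices.add(index);
        }
        return subIndices;
    }

    public static void main(String[] args) {
        Map<String, Long> indices = indicesOf(Arrays.asList("cat", "dog", "bird"));
        // "bird" occurs twice, so its index appears twice in the result.
        System.out.println(resolve(indices, Arrays.asList("bird", "cat", "bird"))); // [2, 0, 2]
    }
}
```

The resulting index list has the shape accepted by `subDataset(java.util.List<java.lang.Long> subIndices)`, which is why a map covering only the keys in subRecordKeys is sufficient.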
-
newSubDataset
protected RandomAccessDataset newSubDataset(int[] indices, int from, int to)
-
newSubDataset
protected RandomAccessDataset newSubDataset(java.util.List<java.lang.Long> subIndices)
-
toArray
public ai.djl.util.Pair<java.lang.Number[][],java.lang.Number[][]> toArray(NDManager manager) throws java.io.IOException, TranslateException
Returns the dataset contents as a Java array. Each Number[] is a flattened dataset record and the Number[][] is the array of all records.
- Parameters:
  manager - the manager used to create the arrays
- Returns:
  the dataset contents as a Java array
- Throws:
  java.io.IOException - for various exceptions depending on the dataset
  TranslateException - if there is an error while processing input
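As a rough illustration of the "flattened record" layout, the sketch below turns two-dimensional records into one Number[] each and collects them into a Number[][]. The class `FlattenSketch` is hypothetical and uses no DJL types; row-major ordering and float-valued records are assumptions made for the example.

```java
import java.util.Arrays;

/** Hypothetical sketch of flattening records into Number[] rows; not DJL code. */
final class FlattenSketch {

    /** Flattens one 2-D record into a 1-D array, row by row (assumed ordering). */
    static Number[] flatten(float[][] record) {
        int length = 0;
        for (float[] row : record) {
            length += row.length;
        }
        Number[] flat = new Number[length];
        int next = 0;
        for (float[] row : record) {
            for (float value : row) {
                flat[next++] = value; // boxed to Float, stored as Number
            }
        }
        return flat;
    }

    /** Builds the array of all records, one flattened Number[] per record. */
    static Number[][] toArray(float[][][] records) {
        Number[][] all = new Number[records.length][];
        for (int i = 0; i < records.length; i++) {
            all[i] = flatten(records[i]);
        }
        return all;
    }

    public static void main(String[] args) {
        float[][][] records = {
            {{1f, 2f}, {3f, 4f}},
            {{5f, 6f}, {7f, 8f}}
        };
        // Two records, each flattened from shape 2x2 to length 4.
        System.out.println(Arrays.deepToString(toArray(records)));
    }
}
```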
-
-