public class DataSet extends Object implements org.nd4j.linalg.dataset.api.DataSet
Constructor and Description |
---|
`DataSet()` |
`DataSet(INDArray first, INDArray second)` Creates a dataset with the specified input matrix and labels |
`DataSet(INDArray features, INDArray labels, INDArray featuresMask, INDArray labelsMask)` Creates a dataset with the specified input INDArray and labels (output) INDArray, plus (optionally) mask arrays for the features and labels |
Modifier and Type | Method and Description |
---|---|
`void` | `addFeatureVector(INDArray toAdd)` Adds a feature for each example on to the current feature vector |
`void` | `addFeatureVector(INDArray feature, int example)` Adds the given feature vector to the specified example (row number) |
`void` | `addRow(DataSet d, int i)` |
`List<DataSet>` | `asList()` Extracts each example in the DataSet into its own DataSet object, and returns all of them as a list |
`List<DataSet>` | `batchBy(int num)` Partitions the dataset into mini-batches, each containing the specified number of examples |
`List<DataSet>` | `batchByNumLabels()` |
`void` | `binarize()` Same as calling binarize(0) |
`void` | `binarize(double cutoff)` Binarizes the dataset such that any value greater than cutoff becomes 1, and any other value becomes 0 |
`DataSet` | `copy()` Clones the dataset |
`List<DataSet>` | `dataSetBatches(int num)` Partitions the dataset by the specified number |
`void` | `detach()` Detaches this DataSet from the current Workspace (if any) |
`void` | `divideBy(int num)` Divides the features by a scalar |
`static DataSet` | `empty()` Returns an empty dataset (all fields are null) |
`boolean` | `equals(Object o)` |
`INDArray` | `exampleMaxs()` |
`INDArray` | `exampleMeans()` |
`INDArray` | `exampleSums()` |
`void` | `filterAndStrip(int[] labels)` Strips the dataset down to the specified labels and remaps them |
`DataSet` | `filterBy(int[] labels)` Strips the dataset of all but the passed-in labels |
`DataSet` | `get(int i)` Gets a copy of example i |
`DataSet` | `get(int[] i)` Gets a copy of the examples at indices i |
`List<String>` | `getColumnNames()` Deprecated. |
`List<Serializable>` | `getExampleMetaData()` Gets the example metadata, or null if no metadata has been set |
`<T extends Serializable> List<T>` | `getExampleMetaData(Class<T> metaDataType)` Gets the example metadata, or null if no metadata has been set. Note: this method results in an unchecked cast, so care should be taken when using it. |
`INDArray` | `getFeatures()` Returns the features array for the DataSet |
`INDArray` | `getFeaturesMaskArray()` Input mask array: a mask array for the input, where each value is in {0,1}, specifying whether an input is actually present or not |
`String` | `getLabelName(int idx)` |
`List<String>` | `getLabelNames()` Deprecated. |
`List<String>` | `getLabelNames(INDArray idxs)` |
`List<String>` | `getLabelNamesList()` Gets the optional label names |
`INDArray` | `getLabels()` Returns the labels for the dataset |
`INDArray` | `getLabelsMaskArray()` Labels (output) mask array: a mask array for the output, where each value is in {0,1}, specifying whether an output is actually present or not |
`long` | `getMemoryFootprint()` Returns the memory used by this DataSet |
`DataSet` | `getRange(int from, int to)` |
`int` | `hashCode()` |
`boolean` | `hasMaskArrays()` Whether the labels or input (features) mask arrays are present for this DataSet |
`String` | `id()` |
`boolean` | `isEmpty()` |
`boolean` | `isPreProcessed()` |
`DataSetIterator` | `iterateWithMiniBatches()` |
`Iterator<DataSet>` | `iterator()` |
`Map<Integer,Double>` | `labelCounts()` Calculates and returns a count of each label, by index |
`void` | `load(File from)` Loads the contents of the DataSet from the specified File |
`void` | `load(InputStream from)` Loads the contents of the DataSet from the specified InputStream |
`void` | `markAsPreProcessed()` |
`static DataSet` | `merge(List<? extends DataSet> data)` Merges the list of datasets into a single DataSet |
`void` | `migrate()` Migrates this DataSet into the current Workspace (if any) |
`void` | `multiplyBy(double num)` Multiplies the features by a scalar |
`void` | `normalize()` Normalizes this DataSet to mean 0, stdev 1 per input |
`void` | `normalizeZeroMeanZeroUnitVariance()` Deprecated. |
`int` | `numExamples()` Number of examples in the DataSet |
`int` | `numInputs()` The number of inputs in the feature matrix |
`int` | `numOutcomes()` Returns the number of outcomes (size of the labels array for each example) |
`int` | `outcome()` |
`DataSet` | `reshape(int rows, int cols)` Reshapes the input into the given rows and columns |
`void` | `roundToTheNearest(int roundTo)` |
`DataSet` | `sample(int numSamples)` Samples without replacement, using a random RNG |
`DataSet` | `sample(int numSamples, boolean withReplacement)` Samples a dataset numSamples times |
`DataSet` | `sample(int numSamples, Random rng)` Samples without replacement |
`DataSet` | `sample(int numSamples, Random rng, boolean withReplacement)` Samples a dataset |
`void` | `save(File to)` Saves this DataSet to a file |
`void` | `save(OutputStream to)` Writes the contents of this DataSet to the specified OutputStream |
`void` | `scale()` Divides the input data by the max number in each row |
`void` | `scaleMinAndMax(double min, double max)` |
`void` | `setColumnNames(List<String> columnNames)` Deprecated. |
`void` | `setExampleMetaData(List<? extends Serializable> exampleMetaData)` Sets the metadata for this DataSet. By convention, the metadata can be any serializable object, one per example in the DataSet. |
`void` | `setFeatures(INDArray features)` Sets the features array for the DataSet |
`void` | `setFeaturesMaskArray(INDArray featuresMask)` Sets the features mask array in this DataSet |
`void` | `setLabelNames(List<String> labelNames)` Sets the label names; throws an exception if the number of label names passed in doesn't equal the number of outcomes |
`void` | `setLabels(INDArray labels)` |
`void` | `setLabelsMaskArray(INDArray labelsMask)` Sets the labels mask array in this DataSet |
`void` | `setNewNumberOfLabels(int labels)` Clears the outcome matrix, setting a new number of labels |
`void` | `setOutcome(int example, int label)` Sets the outcome of a particular example |
`void` | `shuffle()` Shuffles the order of the rows in the DataSet |
`void` | `shuffle(long seed)` Shuffles the dataset in place, given a seed for a random number generator |
`List<DataSet>` | `sortAndBatchByNumLabels()` Sorts the dataset by label: splits the dataset such that examples are sorted by their labels |
`void` | `sortByLabel()` Organizes the dataset to minimize sampling error while still allowing efficient batching |
`SplitTestAndTrain` | `splitTestAndTrain(double fractionTrain)` Splits the DataSet into two DataSets randomly |
`SplitTestAndTrain` | `splitTestAndTrain(int numHoldout)` Splits a dataset into test and train |
`SplitTestAndTrain` | `splitTestAndTrain(int numHoldout, Random rng)` Splits a dataset into test and train randomly |
`void` | `squishToRange(double min, double max)` Squeezes input data to a max and a min |
`MultiDataSet` | `toMultiDataSet()` |
`String` | `toString()` |
`void` | `validate()` |
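Methods inherited from class java.lang.Object: clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable: forEach, spliterator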
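
public DataSet()

public DataSet(INDArray first, INDArray second)
Parameters:
first - the feature matrix
second - the labels (these should be binarized label matrices, in which each example has a value of 1 in the column for its label)

public DataSet(INDArray features, INDArray labels, INDArray featuresMask, INDArray labelsMask)
Parameters:
features - Features (input)
labels - Labels (output)
featuresMask - Mask array for features, may be null
labelsMask - Mask array for labels, may be null

public List<Serializable> getExampleMetaData()
Description copied from interface: DataSet
Get the example metadata, or null if no metadata has been set
Specified by: getExampleMetaData in interface DataSet
See Also: getExampleMetaData(Class) for a convenience method for types

public <T extends Serializable> List<T> getExampleMetaData(Class<T> metaDataType)
Description copied from interface: DataSet
Get the example metadata, or null if no metadata has been set. Note: this method results in an unchecked cast, so care should be taken when using it.
Specified by: getExampleMetaData in interface DataSet
Type Parameters:
T - Type of metadata
Parameters:
metaDataType - Class of the metadata (used for type information)

public void setExampleMetaData(List<? extends Serializable> exampleMetaData)
Description copied from interface: DataSet
Set the metadata for this DataSet. By convention, the metadata can be any serializable object, one per example in the DataSet.
Specified by: setExampleMetaData in interface DataSet
Parameters:
exampleMetaData - Example metadata to set

public boolean isPreProcessed()

public void markAsPreProcessed()

public static DataSet empty()

public static DataSet merge(List<? extends DataSet> data)
Parameters:
data - the data to merge

public void load(InputStream from)
Description copied from interface: DataSet
Load the contents of the DataSet from the specified InputStream.
See Also: DataSet.save(OutputStream)

public void load(File from)
Description copied from interface: DataSet
Load the contents of the DataSet from the specified File.
See Also: DataSet.save(File)

public void save(OutputStream to)
Description copied from interface: DataSet
Write the contents of this DataSet to the specified OutputStream

public void save(File to)
Description copied from interface: DataSet
Save this DataSet to a file.

public DataSetIterator iterateWithMiniBatches()
Specified by: iterateWithMiniBatches in interface DataSet

public INDArray getFeatures()
Description copied from interface: DataSet
Returns the features array for the DataSet
Specified by: getFeatures in interface DataSet

public void setFeatures(INDArray features)
Description copied from interface: DataSet
Set the features array for the DataSet
Specified by: setFeatures in interface DataSet
Parameters:
features - Features to set

public Map<Integer,Double> labelCounts()
Description copied from interface: DataSet
Calculate and return a count of each label, by index.
Specified by: labelCounts in interface DataSet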
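
To make the "count of each label, by index" wording concrete, here is a minimal plain-Java sketch. It is not the library's implementation: it assumes the labels are a one-hot matrix held in a `double[][]`, and the class and method names (`LabelCountsSketch`, `labelCounts`) are hypothetical stand-ins that only mirror the `Map<Integer,Double>` return type.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only; assumes one-hot labels stored as double[][].
class LabelCountsSketch {
    /** Counts how many examples (rows) carry each label index. */
    static Map<Integer, Double> labelCounts(double[][] oneHotLabels) {
        Map<Integer, Double> counts = new TreeMap<>();
        for (double[] row : oneHotLabels) {
            // The label index is taken to be the column holding the largest value
            int best = 0;
            for (int j = 1; j < row.length; j++) {
                if (row[j] > row[best]) {
                    best = j;
                }
            }
            counts.merge(best, 1.0, Double::sum);
        }
        return counts;
    }
}
```

Label indices that never occur are simply absent from the map in this sketch.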
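
public DataSet copy()

public DataSet reshape(int rows, int cols)

public void multiplyBy(double num)
Description copied from interface: DataSet
Multiply the features by a scalar
Specified by: multiplyBy in interface DataSet

public void divideBy(int num)
Description copied from interface: DataSet
Divide the features by a scalar

public void shuffle()
Description copied from interface: DataSet
Shuffle the order of the rows in the DataSet.

public void shuffle(long seed)
Parameters:
seed - Seed to use for the random number generator

public void squishToRange(double min, double max)
Specified by: squishToRange in interface DataSet
Parameters:
min - the min value to occur in the dataset
max - the max value to occur in the dataset

public void scaleMinAndMax(double min, double max)
Specified by: scaleMinAndMax in interface DataSet

public void scale()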
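
The method summary describes `scale()` as dividing the input data by the max number in each row. The plain-Java sketch below illustrates that arithmetic on a `double[][]`. It is an illustration under assumptions, not the library code: the names `RowMaxScaleSketch` and `scaleByRowMax` are hypothetical, and leaving all-zero rows untouched is a choice made here to avoid dividing by zero.

```java
// Illustrative sketch only; operates on a plain double[][] rather than an INDArray.
class RowMaxScaleSketch {
    /** Divides every value in each row by that row's maximum, in place. */
    static void scaleByRowMax(double[][] features) {
        for (double[] row : features) {
            double max = Double.NEGATIVE_INFINITY;
            for (double v : row) {
                max = Math.max(max, v);
            }
            // Assumption: empty rows and rows whose maximum is 0 are left unchanged
            if (row.length == 0 || max == 0.0) {
                continue;
            }
            for (int j = 0; j < row.length; j++) {
                row[j] /= max;
            }
        }
    }
}
```

For example, the row `{2, 4}` becomes `{0.5, 1.0}` after scaling.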

public void addFeatureVector(INDArray toAdd)
Specified by: addFeatureVector in interface DataSet
Parameters:
toAdd - the feature vector to add

public void addFeatureVector(INDArray feature, int example)
Specified by: addFeatureVector in interface DataSet
Parameters:
feature - the feature vector to add
example - the number of the example to append to

public void normalize()
Description copied from interface: DataSet
Normalize this DataSet to mean 0, stdev 1 per input.
See Also: NormalizerStandardize

public void binarize()

public void binarize(double cutoff)
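
The thresholding rule from the method summary (any value greater than cutoff becomes 1, otherwise 0, with `binarize()` equivalent to `binarize(0)`) can be sketched in plain Java as follows. The class `BinarizeSketch` and the use of `double[][]` are assumptions for illustration, not the library's implementation.

```java
// Illustrative sketch only; operates on a plain double[][] rather than an INDArray.
class BinarizeSketch {
    /** Same as calling binarize(data, 0). */
    static void binarize(double[][] data) {
        binarize(data, 0.0);
    }

    /** Sets each value to 1 if it is strictly greater than cutoff, otherwise to 0. */
    static void binarize(double[][] data, double cutoff) {
        for (double[] row : data) {
            for (int j = 0; j < row.length; j++) {
                row[j] = row[j] > cutoff ? 1.0 : 0.0;
            }
        }
    }
}
```

Note that a value exactly equal to the cutoff maps to 0 under the "greater than" wording.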

@Deprecated public void normalizeZeroMeanZeroUnitVariance()
Specified by: normalizeZeroMeanZeroUnitVariance in interface DataSet

public int numInputs()

public void setNewNumberOfLabels(int labels)
Specified by: setNewNumberOfLabels in interface DataSet
Parameters:
labels - the number of labels/columns in the outcome matrix. Note that this clears the labels for each example.

public void setOutcome(int example, int label)
Specified by: setOutcome in interface DataSet
Parameters:
example - the example to transform
label - the label of the outcome

public DataSet get(int i)

public DataSet get(int[] i)

public List<DataSet> batchBy(int num)
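
Partitioning into mini-batches of a fixed number of examples can be sketched in plain Java as below. This is a generic illustration, not the library's code: `BatchBySketch` is a hypothetical name, examples are represented by a generic `List<T>`, and letting the final batch be smaller when the size does not divide evenly is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; examples are modeled as elements of a generic list.
class BatchBySketch {
    /** Splits examples into consecutive batches of at most num elements each. */
    static <T> List<List<T>> batchBy(List<T> examples, int num) {
        if (num <= 0) {
            throw new IllegalArgumentException("num must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < examples.size(); start += num) {
            // Assumption: the last batch holds the remainder and may be smaller
            int end = Math.min(start + num, examples.size());
            batches.add(new ArrayList<>(examples.subList(start, end)));
        }
        return batches;
    }
}
```

With five examples and `num = 2`, this sketch yields batches of sizes 2, 2 and 1.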

public DataSet filterBy(int[] labels)

public void filterAndStrip(int[] labels)
Specified by: filterAndStrip in interface DataSet
Parameters:
labels - the labels to strip down to

public List<DataSet> dataSetBatches(int num)
Specified by: dataSetBatches in interface DataSet
Parameters:
num - the number to split by

public List<DataSet> sortAndBatchByNumLabels()
Specified by: sortAndBatchByNumLabels in interface DataSet

public List<DataSet> batchByNumLabels()
Specified by: batchByNumLabels in interface DataSet

public List<DataSet> asList()
Description copied from interface: DataSet
Extract each example in the DataSet into its own DataSet object, and return all of them as a list

public SplitTestAndTrain splitTestAndTrain(int numHoldout, Random rng)
Specified by: splitTestAndTrain in interface DataSet
Parameters:
numHoldout - the number to hold out for training
rng - Random number generator to use to shuffle the dataset

public SplitTestAndTrain splitTestAndTrain(int numHoldout)
Specified by: splitTestAndTrain in interface DataSet
Parameters:
numHoldout - the number to hold out for training

public INDArray getLabels()

public String getLabelName(int idx)
Specified by: getLabelName in interface DataSet
Parameters:
idx - the index to pull the string label value out of the list, if it exists

public List<String> getLabelNames(INDArray idxs)
Specified by: getLabelNames in interface DataSet
Parameters:
idxs - list of indices to pull the string label values out of the list, if they exist

public void sortByLabel()
Specified by: sortByLabel in interface DataSet

public INDArray exampleSums()
Specified by: exampleSums in interface DataSet

public INDArray exampleMaxs()
Specified by: exampleMaxs in interface DataSet

public INDArray exampleMeans()
Specified by: exampleMeans in interface DataSet

public DataSet sample(int numSamples)

public DataSet sample(int numSamples, boolean withReplacement)
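
The difference between sampling with and without replacement can be shown with a small plain-Java sketch that picks example indices. It is not how the library selects rows: `SampleSketch` and `sampleIndices` are hypothetical names, and the without-replacement branch uses a partial Fisher-Yates shuffle as one common technique.

```java
import java.util.Random;

// Illustrative sketch only; returns the row indices that a sample would contain.
class SampleSketch {
    static int[] sampleIndices(int numExamples, int numSamples, Random rng, boolean withReplacement) {
        int[] picked = new int[numSamples];
        if (withReplacement) {
            // Each draw is independent, so the same index may appear more than once
            for (int i = 0; i < numSamples; i++) {
                picked[i] = rng.nextInt(numExamples);
            }
            return picked;
        }
        if (numSamples > numExamples) {
            throw new IllegalArgumentException("cannot draw more distinct samples than examples");
        }
        // Partial Fisher-Yates shuffle: the first numSamples slots end up distinct
        int[] pool = new int[numExamples];
        for (int i = 0; i < numExamples; i++) {
            pool[i] = i;
        }
        for (int i = 0; i < numSamples; i++) {
            int j = i + rng.nextInt(numExamples - i);
            int tmp = pool[i];
            pool[i] = pool[j];
            pool[j] = tmp;
            picked[i] = pool[i];
        }
        return picked;
    }
}
```

Passing a seeded `Random` makes the draw repeatable, which matches the role of the `rng` parameter in the overloads listed in the method summary.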
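
public void roundToTheNearest(int roundTo)
Specified by: roundToTheNearest in interface DataSet

public int numOutcomes()
Description copied from interface: DataSet
Returns the number of outcomes (size of the labels array for each example)
Specified by: numOutcomes in interface DataSet

public int numExamples()
Description copied from interface: DataSet
Number of examples in the DataSet
Specified by: numExamples in interface DataSet

@Deprecated public List<String> getLabelNames()
Specified by: getLabelNames in interface DataSet

public List<String> getLabelNamesList()
Specified by: getLabelNamesList in interface DataSet

public void setLabelNames(List<String> labelNames)
Specified by: setLabelNames in interface DataSet
Parameters:
labelNames - the label names to use

@Deprecated public List<String> getColumnNames()
Specified by: getColumnNames in interface DataSet

@Deprecated public void setColumnNames(List<String> columnNames)
Specified by: setColumnNames in interface DataSet
Parameters:
columnNames -

public SplitTestAndTrain splitTestAndTrain(double fractionTrain)
Description copied from interface: DataSet
Splits the DataSet into two DataSets randomly
Specified by: splitTestAndTrain in interface DataSet
Parameters:
fractionTrain - Fraction (in range 0 to 1) of examples to be returned in the training DataSet object

public INDArray getFeaturesMaskArray()
Description copied from interface: DataSet
Input mask array: a mask array for the input, where each value is in {0,1}, specifying whether an input is actually present or not.
Specified by: getFeaturesMaskArray in interface DataSet

public void setFeaturesMaskArray(INDArray featuresMask)
Description copied from interface: DataSet
Set the features mask array in this DataSet
Specified by: setFeaturesMaskArray in interface DataSet

public INDArray getLabelsMaskArray()
Description copied from interface: DataSet
Labels (output) mask array: a mask array for the output, where each value is in {0,1}, specifying whether an output is actually present or not.
Specified by: getLabelsMaskArray in interface DataSet

public void setLabelsMaskArray(INDArray labelsMask)
Description copied from interface: DataSet
Set the labels mask array in this DataSet
Specified by: setLabelsMaskArray in interface DataSet

public boolean hasMaskArrays()
Description copied from interface: DataSet
Whether the labels or input (features) mask arrays are present for this DataSet
Specified by: hasMaskArrays in interface DataSet

public long getMemoryFootprint()
Specified by: getMemoryFootprint in interface DataSet

public void migrate()
Description copied from interface: DataSet
This method migrates this DataSet into the current Workspace (if any)

public void detach()
Description copied from interface: DataSet
This method detaches this DataSet from the current Workspace (if any)

public boolean isEmpty()

public MultiDataSet toMultiDataSet()
Specified by: toMultiDataSet in interface DataSet
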
Copyright © 2020. All rights reserved.