Split a DataSet into an array of TrainTest DataSets
DataSet to be split
The number of TrainTest DataSets to be returned. Each 'testing' set will be 1/k of the dataset, randomly sampled; the 'training' set will be the remainder of the dataset. The DataSet is split into kFolds first, so that no observation occurs in multiple folds.
Random number generator seed.
An array of TrainTestDataSets
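The fold logic described above can be sketched on a plain Scala `Seq`. This is only an illustration of the semantics, not the library's implementation: `TrainTest`, `KFoldSketch`, and the shuffle-then-round-robin assignment are assumptions made for the example.

```scala
import scala.util.Random

// Hypothetical stand-in for the library's TrainTest container.
final case class TrainTest[T](training: Seq[T], testing: Seq[T])

object KFoldSketch {
  // Put every element into exactly one of kFolds folds, then build
  // one train/test pair per fold.
  def kFoldSplit[T](data: Seq[T], kFolds: Int, seed: Long): Array[TrainTest[T]] = {
    require(kFolds > 1, "kFolds must be at least 2")
    val shuffled = new Random(seed).shuffle(data.toVector)
    // Element at position i goes to fold i % kFolds, so folds are disjoint.
    val folds = Vector.tabulate(kFolds) { k =>
      shuffled.indices.collect { case i if i % kFolds == k => shuffled(i) }
    }
    Array.tabulate(kFolds) { k =>
      // Training is everything outside fold k; testing is fold k itself.
      val training = folds.indices.filter(_ != k).flatMap(j => folds(j))
      TrainTest(training, folds(k))
    }
  }
}
```

Calling `KFoldSketch.kFoldSplit(1 to 10, kFolds = 5, seed = 42L)` yields five pairs, each with 2 testing and 8 training elements, and every element appears in exactly one testing set.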
Split a DataSet by the probability fraction of each element of a vector.
DataSet to be split
An array of PROPORTIONS for splitting the DataSet. Unlike the randomSplit function, numbers greater than 1 do not lead to oversampling. The number of splits is dictated by the length of this array. The numbers are normalized, e.g. Array(1.0, 2.0) would yield two data sets with an approximate 33%/67% split.
Random number generator seed.
An array of DataSets whose length is equal to the length of fracArray
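As a rough illustration of the normalization, the sketch below works on a plain `Seq` rather than a DataSet. `MultiSplitSketch` and the strategy of drawing one uniform number per element are assumptions for the example; the library may assign elements differently.

```scala
import scala.util.Random

object MultiSplitSketch {
  // Scale non-negative weights so they sum to 1, e.g. (1.0, 2.0) -> (1/3, 2/3).
  def normalize(fracArray: Array[Double]): Array[Double] = {
    require(fracArray.nonEmpty && fracArray.forall(_ >= 0.0) && fracArray.sum > 0.0)
    val total = fracArray.sum
    fracArray.map(_ / total)
  }

  def multiRandomSplit[T](data: Seq[T], fracArray: Array[Double], seed: Long): Array[Seq[T]] = {
    // Cumulative upper bounds of each bucket within [0, 1].
    val upperBounds = normalize(fracArray).scanLeft(0.0)(_ + _).tail
    val rng = new Random(seed)
    val last = fracArray.length - 1
    // Each element draws one uniform number and lands in exactly one bucket.
    val assigned = data.map { x =>
      val idx = upperBounds.indexWhere(rng.nextDouble() < _)
      (if (idx < 0) last else idx, x) // guard against rounding at the top edge
    }
    Array.tabulate(fracArray.length)(i => assigned.filter(_._1 == i).map(_._2))
  }
}
```

Because assignment is random per element, the resulting sizes only approximate the normalized proportions; the result always has one entry per element of `fracArray`.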
Split a DataSet by the probability fraction of each element.
DataSet to be split
Probability that each element is chosen; should be in [0,1]. This fraction refers to the first element in the resulting array.
Sampling by default is random and can result in slightly lop-sided sample sets. When precise is true, equal sample set sizes are forced; however, this is somewhat less efficient.
Random number generator seed.
An array of two datasets
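The trade-off behind the precise flag can be sketched on a plain `Seq`. The names below (`RandomSplitSketch`, the parameter names) and both branch strategies are assumptions for illustration, not the library's code: the non-precise branch flips an independent coin per element, while the precise branch needs the total size and a full shuffle, which is one plausible reason for the extra cost.

```scala
import scala.util.Random

object RandomSplitSketch {
  def randomSplit[T](data: Seq[T], fraction: Double, precise: Boolean, seed: Long): Array[Seq[T]] = {
    require(fraction >= 0.0 && fraction <= 1.0, "fraction must be in [0, 1]")
    val rng = new Random(seed)
    if (precise) {
      // Shuffle, then cut at a fixed index: the first part has an exact size.
      val cut = math.round(fraction * data.size).toInt
      val (first, second) = rng.shuffle(data).splitAt(cut)
      Array(first, second)
    } else {
      // One coin flip per element: sizes match the fraction only on average.
      val (first, second) = data.partition(_ => rng.nextDouble() < fraction)
      Array(first, second)
    }
  }
}
```

With 100 elements and a fraction of 0.3, the precise branch of this sketch always returns parts of size 30 and 70; the non-precise branch returns two parts that sum to 100 but may be slightly lop-sided.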
A wrapper for multiRandomSplit that yields a TrainTestHoldoutDataSet
DataSet to be split
A tuple of three doubles, where the first element specifies the size of the training set, the second element the testing set, and the third element is the holdout set. These are proportional and will be normalized internally.
Random number generator seed.
A TrainTestHoldoutDataSet
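The wrapper pattern can be sketched as follows. `TrainTestHoldout`, `HoldoutSketch`, and the `multiSplit` function parameter are hypothetical names for this example; the splitter is passed in so the sketch stays self-contained and does not claim to reproduce the library's multiRandomSplit.

```scala
// Hypothetical stand-in for the library's train/test/holdout container.
final case class TrainTestHoldout[T](training: Seq[T], testing: Seq[T], holdout: Seq[T])

object HoldoutSketch {
  // Unpack the tuple into a three-element proportion array, delegate to a
  // multi-way splitter, and name the three resulting parts.
  def trainTestHoldoutSplit[T](
      data: Seq[T],
      fracTuple: (Double, Double, Double),
      seed: Long)(
      multiSplit: (Seq[T], Array[Double], Long) => Array[Seq[T]]): TrainTestHoldout[T] = {
    val (train, test, holdout) = fracTuple
    val parts = multiSplit(data, Array(train, test, holdout), seed)
    require(parts.length == 3, "splitter must return exactly three parts")
    TrainTestHoldout(parts(0), parts(1), parts(2))
  }
}
```

Since the proportions are normalized by the underlying splitter, a tuple such as `(3.0, 1.0, 1.0)` would correspond to roughly 60% training, 20% testing, and 20% holdout.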
A wrapper for randomSplit that yields a TrainTestDataSet
DataSet to be split
Probability that each element is chosen; should be in [0,1]. This fraction refers to the training element in TrainTestSplit.
Sampling by default is random and can result in slightly lop-sided sample sets. When precise is true, equal sample set sizes are forced; however, this is somewhat less efficient.
Random number generator seed.
A TrainTestDataSet