dataset

Type Members

trait AbstractDataSet[D, DataSequence] extends AnyRef

A set of data used in the model optimization process. The dataset can be accessed as a sequence of randomly ordered data samples. During training, the data sequence is an endless, looped sequence; during validation, it is a sequence of limited length. Use the data() method to get the data sequence.
The order of the data is not fixed. It can be changed with the shuffle() method.
A dataset can be created from an RDD, an array, a folder, and so on. The DataSet object provides many factory methods.
D
Data type
DataSequence
Represent a sequence of data
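The looped versus limited behaviour described above can be illustrated with a minimal sketch. It is a simplified stand-in built on plain Scala collections, not the library's implementation: the class name `ArrayDataSetSketch`, the `train` flag on `data`, and the fixed `seed` are all assumptions made for illustration.

```scala
import scala.util.Random

object DataSetSketch {
  // Hypothetical stand-in for an array-backed dataset (not the library class).
  class ArrayDataSetSketch[T](initial: Vector[T], seed: Long = 42L) {
    require(initial.nonEmpty, "dataset must not be empty")

    private var items: Vector[T] = initial
    private val rng = new Random(seed)

    // Reorder the underlying data; later passes over data() see the new order.
    def shuffle(): Unit = { items = rng.shuffle(items) }

    // train = true: endless looped sequence; train = false: one finite pass.
    // The `train` parameter is an assumption of this sketch.
    def data(train: Boolean): Iterator[T] =
      if (train) Iterator.continually(items).flatMap(_.iterator)
      else items.iterator
  }
}
```

With three elements, `data(train = false)` yields exactly three items, while `data(train = true)` keeps cycling through them until the caller stops taking elements.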
case class ByteRecord(data: Array[Byte], label: Float) extends Product with Serializable

A byte array and a label. It can contain anything.
class CachedDistriDataSet[T] extends DistributedDataSet[T]

Wrap an RDD as a DataSet.
class ChainedTransformer[A, B, C] extends Transformer[A, C]

A transformer that chains two transformers together. The output type of the first transformer must be the same as the input type of the second transformer.
A
input type of the first transformer
B
output type of the first transformer, as well as the input type of the last transformer
C
output of the last transformer
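The type constraint on A, B, and C can be sketched as follows. This is a simplified illustration, assuming a transformer maps an `Iterator[A]` to an `Iterator[B]`; the names `StreamTransformer`, `Chained`, `parse`, and `scale` are hypothetical and are not the library's own definitions.

```scala
object ChainSketch {
  // Hypothetical, simplified transformer: maps a stream of A to a stream of B.
  trait StreamTransformer[A, B] extends Serializable {
    def apply(prev: Iterator[A]): Iterator[B]
  }

  // Chains two transformers. The shared type B is checked by the compiler:
  // the output of `first` must match the input of `last`.
  class Chained[A, B, C](first: StreamTransformer[A, B],
                         last: StreamTransformer[B, C])
      extends StreamTransformer[A, C] {
    override def apply(prev: Iterator[A]): Iterator[C] = last(first(prev))
  }

  // String => Int
  val parse: StreamTransformer[String, Int] = new StreamTransformer[String, Int] {
    def apply(prev: Iterator[String]): Iterator[Int] = prev.map(_.trim.toInt)
  }

  // Int => Double
  val scale: StreamTransformer[Int, Double] = new StreamTransformer[Int, Double] {
    def apply(prev: Iterator[Int]): Iterator[Double] = prev.map(_ / 10.0)
  }

  // A = String, B = Int, C = Double
  val parseThenScale: StreamTransformer[String, Double] = new Chained(parse, scale)
}
```

Swapping the arguments to `new Chained(scale, parse)` would not compile in this sketch, because `Double` is not the input type of `parse`.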
trait DistributedDataSet[T] extends AbstractDataSet[T, RDD[T]]

Represent distributed data. Use an RDD to go through all the data.
class Identity[A] extends Transformer[A, A]

Return the input unchanged as the output.
abstract class Image extends Serializable

Represent an image
trait Label[T] extends AnyRef

Represent a label
class LocalArrayDataSet[T] extends LocalDataSet[T]

Wrap an array as a DataSet.
trait LocalDataSet[T] extends AbstractDataSet[T, Iterator[T]]

Manage some 'local' data, e.g. data in files or in memory. An iterator is used to go through the data.
class LocalImagePath extends AnyRef

Represent a local file path of an image file
case class LocalSeqFilePath(path: Path) extends Product with Serializable

Represent a local file path of a Hadoop sequence file
case class MiniBatch[T](data: Tensor[T], labels: Tensor[T]) extends Product with Serializable

A batch of data fed into the model. The size of the first dimension is the batch size.
class Sample[T] extends Serializable

Sample, bundling input and target
class SampleToBatch[T] extends Transformer[Sample[T], MiniBatch[T]]

Convert a sequence of Sample to a sequence of MiniBatch, optionally padding all the features (or labels) in the mini-batch to the same length
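The grouping and optional padding can be sketched with plain collections. This is an illustration only: `SampleSketch`, `BatchOf`, `toBatches`, and the `padValue` parameter are hypothetical stand-ins, and vectors are used in place of tensors.

```scala
object BatchSketch {
  // Hypothetical stand-ins for Sample and MiniBatch, using vectors, not tensors.
  final case class SampleSketch(feature: Vector[Float], label: Float)
  final case class BatchOf(data: Vector[Vector[Float]], labels: Vector[Float])

  // Group samples into batches of at most batchSize. If padValue is given,
  // pad every feature in a batch to the longest feature in that batch.
  def toBatches(samples: Iterator[SampleSketch],
                batchSize: Int,
                padValue: Option[Float] = None): Iterator[BatchOf] =
    samples.grouped(batchSize).map { group =>
      val maxLen = group.map(_.feature.length).max
      val rows = group.map { s =>
        padValue.fold(s.feature)(p => s.feature.padTo(maxLen, p))
      }
      // The number of rows (the first dimension) is the batch size.
      BatchOf(rows.toVector, group.map(_.label).toVector)
    }
}
```

In this sketch the last batch may be smaller than `batchSize` when the number of samples is not a multiple of it; whether the library behaves the same way is not stated in the source.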
abstract class Sentence[T] extends Serializable

Represent a sentence
trait Transformer[A, B] extends Serializable

Transform a data stream of type A to type B. It is usually used in the data pre-processing stage. Different transformers can be composed into a pipeline. For example, given transformer1 from A to B, transformer2 from B to C, and transformer3 from C to D, you can compose them into a bigger transformer from A to D with transformer1 -> transformer2 -> transformer3.
The purpose of a transformer is code reuse. Many deep learning models share common data pre-processing steps. Users need not write them every time, but can reuse others' work.
A transformer can be used with an RDD (rdd.mapPartitions), an iterator, and a DataSet.
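A three-stage pipeline of the kind described above might look like the following sketch. It assumes a transformer is applied to an `Iterator[A]` and composed with a `->` method; `Stage`, `stage`, `tokenize`, `count`, `isLong`, and `runOnPartitions` are hypothetical names, and nested sequences stand in for RDD partitions.

```scala
object PipelineSketch {
  // Hypothetical, simplified transformer with a `->` combinator.
  trait Stage[A, B] extends Serializable { self =>
    def apply(prev: Iterator[A]): Iterator[B]

    // Compose this stage with the next one: A => B, then B => C.
    def ->[C](next: Stage[B, C]): Stage[A, C] = new Stage[A, C] {
      def apply(prev: Iterator[A]): Iterator[C] = next(self(prev))
    }
  }

  // Helper that builds a stage from an element-wise function.
  def stage[A, B](f: A => B): Stage[A, B] = new Stage[A, B] {
    def apply(prev: Iterator[A]): Iterator[B] = prev.map(f)
  }

  val tokenize: Stage[String, Array[String]] = stage((s: String) => s.split("\\s+"))
  val count: Stage[Array[String], Int] = stage((ws: Array[String]) => ws.length)
  val isLong: Stage[Int, Boolean] = stage((n: Int) => n >= 3)

  // From String to Boolean, built from three reusable stages.
  val pipeline: Stage[String, Boolean] = tokenize -> count -> isLong

  // Apply the same pipeline to each partition, as one would inside
  // rdd.mapPartitions; plain sequences stand in for partitions here.
  def runOnPartitions(parts: Seq[Seq[String]]): Seq[List[Boolean]] =
    parts.map(p => pipeline(p.iterator).toList)
}
```

Because each stage only sees an iterator, the same `pipeline` value works unchanged on a local iterator and on per-partition iterators.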

Value Members

object DataSet

Commonly used DataSet builders.
object Identity extends Serializable
object Sample extends Serializable
object SampleToBatch extends Serializable

Convert a sequence of Sample to a sequence of MiniBatch, optionally padding all the features (or labels) in the mini-batch to the same length
object Utils
package image
package text
