Package

com.intel.analytics.bigdl

dataset

package dataset

Type Members

  1. trait AbstractDataSet[D, DataSequence] extends AnyRef

    A set of data used in the model optimization process. The dataset is accessed as a sequence of data samples in random order. During training, the data sequence is an endless, looped sequence. During validation, it is a sequence of finite length. Use the data() method to get the data sequence.

    The order of the data is not fixed. It can be changed with the shuffle() method.

    A dataset can be created from an RDD, an array, a folder, etc. The DataSet object provides many factory methods.

    D

    Data type

    DataSequence

    Represents a sequence of data
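
    The difference between the training and validation sequences can be sketched as follows. This is a simplified stand-in and not the library's implementation: MiniArrayDataSet is a hypothetical name, and the sketch assumes that data takes a Boolean flag selecting training mode and that shuffle() permutes an index array rather than the data itself.

```scala
import scala.util.Random

// Hypothetical, simplified array-backed dataset; not BigDL's source.
class MiniArrayDataSet[T](buffer: Array[T], seed: Long = 42L) {
  require(buffer.nonEmpty, "dataset must not be empty")

  private val rng = new Random(seed)
  // Visiting order over the buffer; shuffle() permutes this index array.
  private var order: Array[Int] = buffer.indices.toArray

  def size(): Int = buffer.length

  // Change the order in which samples are visited by later data() calls.
  def shuffle(): Unit = {
    order = rng.shuffle(order.toSeq).toArray
  }

  // train = true  -> endless sequence that loops over the data
  // train = false -> a single finite pass over the data
  def data(train: Boolean): Iterator[T] = {
    val snapshot = order // fix the order for this iterator
    if (train) Iterator.continually(snapshot).flatMap(idxs => idxs.iterator).map(i => buffer(i))
    else snapshot.iterator.map(i => buffer(i))
  }
}
```

    In this sketch, data(train = true) over a three-element array keeps wrapping around to the first element, while data(train = false) ends after three elements.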

  2. case class ByteRecord(data: Array[Byte], label: Float) extends Product with Serializable

    A byte array paired with a label. The byte array can contain anything.

  3. class CachedDistriDataSet[T] extends DistributedDataSet[T]

    Wraps an RDD as a DataSet.

  4. class ChainedTransformer[A, B, C] extends Transformer[A, C]

    A transformer that chains two transformers together. The output type of the first transformer must be the same as the input type of the second transformer.

    A

    input type of the first transformer

    B

    output type of the first transformer, as well as the input type of the last transformer

    C

    output type of the last transformer

  5. trait DistributedDataSet[T] extends AbstractDataSet[T, RDD[T]]

    Represents distributed data. Uses an RDD to go through all the data.

  6. class Identity[A] extends Transformer[A, A]

    Passes the input through to the output unchanged.

  7. abstract class Image extends Serializable

    Represents an image

  8. trait Label[T] extends AnyRef

    Represents a label

  9. class LocalArrayDataSet[T] extends LocalDataSet[T]

    Wraps an array as a DataSet.

  10. trait LocalDataSet[T] extends AbstractDataSet[T, Iterator[T]]

    Manages 'local' data, e.g. data in files or in memory. An iterator is used to go through the data.

  11. class LocalImagePath extends AnyRef

    Represents the local file path of an image file

  12. case class LocalSeqFilePath(path: Path) extends Product with Serializable

    Represents the local file path of a Hadoop sequence file

  13. case class MiniBatch[T](data: Tensor[T], labels: Tensor[T]) extends Product with Serializable

    A batch of data fed into the model. The size of the first dimension is the batch size.

  14. class Sample[T] extends Serializable

    A sample that bundles an input with its target

  15. class SampleToBatch[T] extends Transformer[Sample[T], MiniBatch[T]]

    Converts a sequence of Sample to a sequence of MiniBatch, optionally padding all the features (or labels) in the mini-batch to the same length
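
    The batching and padding idea can be sketched as follows. This is an illustration under stated assumptions and not the library's code: MiniSample, MiniBatchSketch and BatchSketch.toBatches are hypothetical names, plain arrays stand in for tensors, only features are padded, and padding is assumed to append a fill value up to the longest feature in each batch.

```scala
// Hypothetical stand-ins; the real Sample and MiniBatch hold tensors.
final case class MiniSample(feature: Array[Float], label: Float)
final case class MiniBatchSketch(data: Array[Array[Float]], labels: Array[Float])

object BatchSketch {
  // Group samples into batches of batchSize (the last batch may be smaller).
  // With padFeatures enabled, each feature is right-padded with padValue
  // to the length of the longest feature in its batch.
  def toBatches(
      samples: Iterator[MiniSample],
      batchSize: Int,
      padFeatures: Boolean = true,
      padValue: Float = 0f): Iterator[MiniBatchSketch] = {
    require(batchSize > 0, "batchSize must be positive")
    samples.grouped(batchSize).map { group =>
      val maxLen = group.map(_.feature.length).max
      val data = group.map { s =>
        if (padFeatures) s.feature.padTo(maxLen, padValue) else s.feature
      }.toArray
      MiniBatchSketch(data, group.map(_.label).toArray)
    }
  }
}
```

    Because the input and output are both iterators, batches in this sketch are built lazily as the consumer asks for them.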

  16. abstract class Sentence[T] extends Serializable

    Represents a sentence

  17. trait Transformer[A, B] extends Serializable

    Transforms a data stream of type A to type B. It is usually used in the data pre-processing stage. Different transformers can be composed into a pipeline. For example, given transformer1 from A to B, transformer2 from B to C, and transformer3 from C to D, you can compose them into a larger transformer from A to D with transformer1 -> transformer2 -> transformer3.

    The purpose of a transformer is code reuse. Many deep learning models share common data pre-processing steps. Users need not write these steps every time and can reuse the work of others.

    A Transformer can be used with an RDD (via rdd.mapPartitions), an iterator, or a DataSet.
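
    The composition described above can be sketched as follows. This is a simplified stand-in, not the library's source: MiniTransformer, MiniChained and the three example steps are hypothetical names, and the sketch assumes that a transformer maps an Iterator[A] to an Iterator[B] and that -> builds a chained transformer.

```scala
// Simplified stand-in for the transformer idea; not BigDL's source.
trait MiniTransformer[A, B] extends Serializable {
  // Transform a stream of A into a stream of B.
  def apply(prev: Iterator[A]): Iterator[B]

  // Chain with another transformer whose input type is this one's output type.
  def ->[C](other: MiniTransformer[B, C]): MiniTransformer[A, C] =
    new MiniChained(this, other)
}

class MiniChained[A, B, C](first: MiniTransformer[A, B], last: MiniTransformer[B, C])
    extends MiniTransformer[A, C] {
  override def apply(prev: Iterator[A]): Iterator[C] = last(first(prev))
}

object TransformerDemo {
  // Three hypothetical pre-processing steps: String -> Int -> Double -> String.
  val parse: MiniTransformer[String, Int] = new MiniTransformer[String, Int] {
    def apply(prev: Iterator[String]): Iterator[Int] = prev.map(_.trim.toInt)
  }
  val scale: MiniTransformer[Int, Double] = new MiniTransformer[Int, Double] {
    def apply(prev: Iterator[Int]): Iterator[Double] = prev.map(_ / 10.0)
  }
  val render: MiniTransformer[Double, String] = new MiniTransformer[Double, String] {
    def apply(prev: Iterator[Double]): Iterator[String] = prev.map(_.toString)
  }

  // The compiler checks that adjacent input and output types line up.
  val pipeline: MiniTransformer[String, String] = parse -> scale -> render

  def run(input: Seq[String]): List[String] = pipeline(input.iterator).toList
}
```

    Since a transformer in this sketch is a function from iterator to iterator, the same pipeline value could be passed to rdd.mapPartitions or applied directly to a local iterator, which is the reuse the description refers to.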

Value Members

  1. object DataSet

    Commonly used DataSet builders.

  2. object Identity extends Serializable

  3. object Sample extends Serializable

  4. object SampleToBatch extends Serializable

    Converts a sequence of Sample to a sequence of MiniBatch, optionally padding all the features (or labels) in the mini-batch to the same length

  5. object Utils

  6. package image

  7. package text
