package saddle
Saddle
Saddle is a Scala Data Library.
Saddle provides array-backed, indexed one- and two-dimensional data structures.
These data structures are specialized on JVM primitives. With them one can often avoid the overhead of boxing and unboxing.
Basic operations also aim to be robust to missing values (NAs).
The building blocks are intended to be easily composed.
The foundational building blocks are Vec, Mat, Index, Series, and Frame.
Inspiration for Saddle comes from many sources, including the R programming language, the pandas data analysis library for Python, and the Scala collections library.
Package Members
- package array
This package contains utilities for working with arrays that are specialized for numeric types.
- package csv
- package groupby
- package index
- package locator
- package mat
- package npy
- package ops
Provides type aliases for a few basic operations
- package scalar
- package util
Additional utilities that need a home
- package vec
Factory methods to generate Vec instances
Type Members
- implicit class ArrToVec[T] extends AnyRef
- final class Buffer[V] extends AnyRef
- type CLM[C] = ClassTag[C]
Shorthand for class manifest typeclass
- abstract class FillMethod extends AnyRef
Filling method for NA values. Non-sealed because more variants could be added in the future.
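The text names the fill methods without describing their behaviour. The following is a minimal sketch of the conventional meaning of forward and backward filling, assuming that is what FillForward and FillBackward denote. It uses plain Scala collections with None standing in for NA; the object and function names are hypothetical and are not part of Saddle.

```scala
object FillSketch {
  // Hypothetical model: None stands in for NA. These helpers are illustrative
  // only and are not Saddle's implementation.

  // Forward fill: each missing slot takes the most recent observed value.
  // Leading missing values stay missing because nothing precedes them.
  def fillForward[A](xs: Vector[Option[A]]): Vector[Option[A]] =
    xs.scanLeft(Option.empty[A])((last, cur) => cur.orElse(last)).tail

  // Backward fill: each missing slot takes the next observed value.
  def fillBackward[A](xs: Vector[Option[A]]): Vector[Option[A]] =
    fillForward(xs.reverse).reverse
}
```

With the input Vector(Some(1), None, None, Some(4), None), forward filling leaves no gaps, while backward filling leaves the trailing slot missing.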
- class Frame[RX, CX, T] extends NumericOps[Frame[RX, CX, T]]
Frame is an immutable container for 2D data which is indexed along both axes (rows, columns) by associated keys (i.e., indexes).
The primary use case is homogeneous data, but a secondary concern is to support heterogeneous data that is homogeneous only within any given column.
The row index, column index, and constituent value data are all backed ultimately by arrays.
Frame is effectively a doubly-indexed associative map whose row keys and col keys each have an ordering provided by the natural (provided) order of their backing arrays.
Several factory and access methods are provided. In the following examples, assume that:
val f = Frame('a'->Vec(1,2,3), 'b'->Vec(4,5,6))
The apply method takes a row key and a col key and returns a slice of the original Frame:
f(0,'a') == Frame('a'->Vec(1))
apply also accepts an org.saddle.index.Slice:
f(0->1, 'b') == Frame('b'->Vec(4,5))
f(0, *) == Frame('a'->Vec(1), 'b'->Vec(4))
You may slice using the col and row methods respectively, as follows:
f.col('a') == Frame('a'->Vec(1,2,3))
f.row(0) == Frame('a'->Vec(1), 'b'->Vec(4))
f.row(0->1) == Frame('a'->Vec(1,2), 'b'->Vec(4,5))
You can achieve a similar effect with rowSliceBy and colSliceBy.
The colAt and rowAt methods take an integer offset i into the Frame, and return a Series indexed by the opposing axis:
f.rowAt(0) == Series('a'->1, 'b'->4)
If there is a one-to-one relationship between offset i and key (i.e., no duplicate keys in the index), you may achieve the same effect via key as follows:
f.first(0) == Series('a'->1, 'b'->4)
f.firstCol('a') == Series(1,2,3)
The at method returns an instance of org.saddle.scalar.Scalar, which behaves much like an Option; it can be either an instance of org.saddle.scalar.NA or an org.saddle.scalar.Value case class:
f.at(0, 0) == scalar.Scalar(1)
The rowSlice and colSlice methods allow slicing the Frame for locations in [i, j) irrespective of the value of the keys at those locations.
f.rowSlice(0,1) == Frame('a'->Vec(1), 'b'->Vec(4))
Finally, the method raw accesses a value directly, which may reveal the underlying representation of a missing value (so be careful).
f.raw(0,0) == 1
Frame may be used in arithmetic expressions which operate on two Frames or on a Frame and a scalar value. In the former case, the two Frames will automatically align along their indexes:
f + f.shift(1) == Frame('a'->Vec(NA,3,5), 'b'->Vec(NA,9,11))
- RX
The type of row keys
- CX
The type of column keys
- T
The type of entries in the frame
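The arithmetic example for Frame (f + f.shift(1)) can be traced by hand on a single column. The sketch below is a plain-Scala model, not Saddle code: None stands in for NA, and the names Col, shift and add are hypothetical. It assumes that shifting by one position pads the top with NA and that NA on either side of an addition yields NA, which is consistent with the result quoted in the text.

```scala
object ShiftAddSketch {
  // Hypothetical model of one Frame column: None stands in for NA.
  type Col = Vector[Option[Int]]

  // Shift values down by n >= 0 positions, padding the top with NA and
  // dropping whatever falls off the end, so the length is unchanged.
  def shift(col: Col, n: Int): Col =
    (Vector.fill(n)(Option.empty[Int]) ++ col).take(col.length)

  // Element-wise addition in which NA on either side yields NA.
  def add(x: Col, y: Col): Col =
    x.zip(y).map { case (a, b) => for (u <- a; v <- b) yield u + v }

  // The two columns of the example frame f.
  val a: Col = Vector(Some(1), Some(2), Some(3))
  val b: Col = Vector(Some(4), Some(5), Some(6))
}
```

Because both operands share the same row index here, alignment reduces to position-wise pairing; frames with differing indexes would first need to be joined on their keys, which this sketch does not model.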
- trait Index[T] extends AnyRef
Index provides a constant-time look-up of a value within array-backed storage, as well as operations to support joining and slicing.
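One common way to obtain constant-time look-up over sequence-backed keys is to pair the key sequence with a hash map from key to offsets. The sketch below illustrates that idea only; it is an assumption about the general technique, not a description of how Saddle implements Index, and SimpleIndex, locate and slice are hypothetical names.

```scala
object IndexSketch {
  // Hypothetical stand-in for an index: keys live in a sequence, and a hash
  // map built once maps each key to the offsets where it occurs.
  final class SimpleIndex[K](val keys: Vector[K]) {
    private val positions: Map[K, Vector[Int]] =
      keys.zipWithIndex.groupBy(_._1).map { case (k, kis) => k -> kis.map(_._2) }

    // Expected constant-time lookup of every offset holding `key`;
    // an unknown key yields an empty result.
    def locate(key: K): Vector[Int] = positions.getOrElse(key, Vector.empty)

    // Positional slice over the half-open offset range [from, until).
    def slice(from: Int, until: Int): SimpleIndex[K] =
      new SimpleIndex(keys.slice(from, until))
  }
}
```

Keeping a list of offsets per key, rather than a single offset, is what lets such a structure tolerate duplicate keys like the 'b' entries used in the Series examples.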
- final class Mat[T] extends NumericOps[Mat[T]]
Mat is an immutable container for 2D homogeneous data (a "matrix"). It is backed by a single array. Data is stored in row-major order.
Several element access methods are provided.
The at method returns an instance of org.saddle.scalar.Scalar, which behaves much like an Option in that it can be either an instance of org.saddle.scalar.NA or an org.saddle.scalar.Value case class:
val m = Mat(2,2,Array(1,2,3,4))
m.at(0,0) == Value(1)
The method raw accesses the underlying value directly.
val m = Mat(2,2,Array(1,2,3,4))
m.raw(0,0) == 1d
Mat may be used in arithmetic expressions which operate on two Mats or on a Mat and a primitive value. A few examples:
val m = Mat(2,2,Array(1,2,3,4))
m * m == Mat(2,2,Array(1,4,9,16))
m dot m == Mat(2,2,Array(7d,10,15,22))
m * 3 == Mat(2, 2, Array(3,6,9,12))
Note, Mat is generally compatible with EJML's DenseMatrix. It may be convenient to induce this conversion to do more complex linear algebra, or to work with a mutable data structure.
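To make the row-major layout and the difference between the element-wise product and the matrix product concrete, here is a small plain-Scala model. Dense, times and dot are hypothetical names chosen for this sketch; they are not Saddle's classes, and the code only mirrors the arithmetic shown in the examples above.

```scala
object RowMajorSketch {
  // Hypothetical dense matrix over a flat sequence in row-major order.
  final case class Dense(rows: Int, cols: Int, data: Vector[Double]) {
    require(data.length == rows * cols, "data must hold rows * cols values")

    // Element (r, c) lives at flat offset r * cols + c.
    def raw(r: Int, c: Int): Double = data(r * cols + c)

    // Element-wise product, the analogue of `m * m` in the text.
    def times(that: Dense): Dense = {
      require(rows == that.rows && cols == that.cols, "shapes must match")
      Dense(rows, cols, data.zip(that.data).map { case (x, y) => x * y })
    }

    // Matrix product, the analogue of `m dot m` in the text.
    def dot(that: Dense): Dense = {
      require(cols == that.rows, "inner dimensions must match")
      val out = for {
        r <- 0 until rows
        c <- 0 until that.cols
      } yield (0 until cols).map(k => raw(r, k) * that.raw(k, c)).sum
      Dense(rows, that.cols, out.toVector)
    }
  }

  val m: Dense = Dense(2, 2, Vector(1.0, 2.0, 3.0, 4.0))
}
```

For the 2 x 2 matrix m, times reproduces the values 1, 4, 9, 16 and dot reproduces 7, 10, 15, 22, matching the documented results.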
- type NUM[C] = Numeric[C]
Shorthand for numeric typeclass
- trait Numeric[T] extends ORD[T]
- type ORD[C] = Order[C]
Shorthand for ordering typeclass
- implicit class OptionToScalar[T] extends AnyRef
- sealed trait PctMethod extends AnyRef
Trait which specifies what percentile method to use
- implicit class PrimitiveToScalar[T] extends AnyRef
- sealed trait RankTie extends AnyRef
Trait which specifies how to break a rank tie
- type ST[C] = ScalarTag[C]
Shorthand for scalar tag typeclass
- implicit class SeqToFrame[RX, CX, T] extends AnyRef
Augments Seq with a toFrame method that returns a new Frame instance.
For example,
val t = IndexedSeq(("a", "x", 3), ("b", "y", 4))
val f = t.toFrame
res0: org.saddle.Frame[java.lang.String,java.lang.String,Int] =
[2 x 2]
      x  y
     -- --
a ->  3 NA
b -> NA  4
- RX
Type of row index elements of Frame
- CX
Type of col index elements of Frame
- T
Type of data elements of Frame
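The toFrame example turns (row, column, value) triples into a grid in which unmatched cells are NA. A rough plain-Scala model of that pivot is sketched below. The function name pivot is hypothetical, None stands in for NA, and the key ordering (order of first appearance) is an assumption, since the text does not say how Saddle orders the resulting indexes.

```scala
object PivotSketch {
  // Hypothetical pivot of (row, col, value) triples into a dense grid.
  // None marks a cell for which no triple was supplied, i.e. NA.
  def pivot(
      triples: Seq[(String, String, Int)]
  ): (Vector[String], Vector[String], Vector[Vector[Option[Int]]]) = {
    // Assumption: keys keep their order of first appearance.
    val rowKeys = triples.map(_._1).distinct.toVector
    val colKeys = triples.map(_._2).distinct.toVector
    // If a (row, col) pair repeats, the last value wins in this sketch.
    val cells = triples.map { case (r, c, v) => (r, c) -> v }.toMap
    val grid = rowKeys.map(r => colKeys.map(c => cells.get((r, c))))
    (rowKeys, colKeys, grid)
  }
}
```

Applied to the two triples from the example, the grid has the value 3 at (a, x), the value 4 at (b, y), and NA in the two remaining cells.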
- implicit class SeqToFrame2[RX, CX, T] extends AnyRef
- implicit class SeqToIndex[X] extends AnyRef
Augments Seq with a toIndex method that returns a new Index instance.
For example,
val i = IndexedSeq(1,2,3)
val s = i.toIndex
- X
Type of index elements
- implicit class SeqToMat[T] extends AnyRef
- implicit class SeqToSeries[T, X] extends AnyRef
Augments Seq with a toSeries method that returns a new Series instance.
For example,
val p = IndexedSeq(1,2,3) zip IndexedSeq(4,5,6)
val s = p.toSeries
- T
Type of data elements of Series
- X
Type of index elements of Series
- implicit class SeqToVec[T] extends AnyRef
Augments Seq with a toVec method that returns a new Vec instance.
For example,
val s = IndexedSeq(1,2,3)
val v = s.toVec
- T
Type of elements of Vec
- class Series[X, T] extends NumericOps[Series[X, T]]
Series is an immutable container for 1D homogeneous data which is indexed by an associated sequence of keys.
Both the index and value data are backed by arrays.
Series is effectively an associative map whose keys have an ordering provided by the natural (provided) order of the backing array.
Several element access methods are provided.
The apply method returns a slice of the original Series:
val s = Series(Vec(1,2,3,4), Index('a','b','b','c'))
s('a') == Series('a'->1)
s('b') == Series('b'->2, 'b'->3)
Other ways to slice a series involve implicitly constructing an org.saddle.index.Slice object and passing it to the Series apply method:
s('a'->'b') == Series('a'->1, 'b'->2, 'b'->3)
s(* -> 'b') == Series('a'->1, 'b'->2, 'b'->3)
s('b' -> *) == Series('b'->2, 'b'->3, 'c'->4)
s(*) == s
The at method returns an instance of org.saddle.scalar.Scalar, which behaves much like an Option in that it can be either an instance of org.saddle.scalar.NA or an org.saddle.scalar.Value case class:
s.at(0) == Scalar(1)
The slice method allows slicing the Series for locations in [i, j) irrespective of the value of the keys at those locations.
s.slice(2,4) == Series('b'->3, 'c'->4)
To slice explicitly by labels, use the sliceBy method, which is inclusive of the key boundaries:
s.sliceBy('b','c') == Series('b'->2, 'b'->3, 'c'->4)
The method raw accesses the value directly, which may reveal the underlying representation of a missing value (so be careful).
s.raw(0) == 1
Series may be used in arithmetic expressions which operate on two Series or on a Series and a scalar value. In the former case, the two Series will automatically align along their indexes. A few examples:
s * 2 == Series('a'->2, 'b'->4, ... )
s + s.shift(1) == Series('a'->NA, 'b'->3, 'b'->5, ...)
- X
Type of elements in the index, for which there must be an implicit Ordering and ST
- T
Type of elements in the values array, for which there must be an implicit ST
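The Series entry distinguishes slicing by offset, which is half-open, from slicing by key, which is inclusive. The contrast can be modelled in plain Scala as below. This is an illustrative sketch with hypothetical function names over parallel key and value sequences, assuming sorted keys; it is not Saddle's implementation.

```scala
object SeriesSliceSketch {
  // Hypothetical model of the example series: parallel key and value
  // sequences, with keys in ascending order and 'b' duplicated.
  val keys: Vector[Char] = Vector('a', 'b', 'b', 'c')
  val values: Vector[Int] = Vector(1, 2, 3, 4)

  // Positional slice over the half-open offset range [i, j); keys are ignored.
  def slice(i: Int, j: Int): Vector[(Char, Int)] =
    keys.zip(values).slice(i, j)

  // Key-based slice, inclusive of both boundary keys; every entry whose key
  // falls within the bounds is kept, duplicates included.
  def sliceByKey(from: Char, to: Char): Vector[(Char, Int)] =
    keys.zip(values).filter { case (k, _) => k >= from && k <= to }
}
```

Here slice(2, 4) keeps only offsets 2 and 3, whereas sliceByKey('a', 'b') keeps both 'b' entries, mirroring the s('a'->'b') example above.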
- trait Vec[T] extends NumericOps[Vec[T]]
Vec is an immutable container for 1D homogeneous data (a "vector"). It is backed by an array and indexed from 0 to length - 1.
Several element access methods are provided.
The apply() method returns a slice of the original vector:
val v = Vec(1,2,3,4)
v(0) == Vec(1)
v(1, 2) == Vec(2,3)
The at method returns an instance of org.saddle.scalar.Scalar, which behaves much like an Option in that it can be either an instance of org.saddle.scalar.NA or an org.saddle.scalar.Value case class:
Vec[Int](1,2,3,na).at(0) == Scalar(1)
Vec[Int](1,2,3,na).at(3) == NA
The method raw accesses the underlying value directly.
Vec(1d,2,3).raw(0) == 1d
Vec may be used in arithmetic expressions which operate on two Vecs or on a Vec and a scalar value. A few examples:
Vec(1,2,3,4) + Vec(2,3,4,5) == Vec(3,5,7,9)
Vec(1,2,3,4) * 2 == Vec(2,4,6,8)
Note, Vec is implicitly convertible to an array for convenience; this could be abused to mutate the contents of the Vec. Try to avoid this!
- T
Type of elements within the Vec
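The difference between at and raw is easiest to see with a sentinel-encoded missing value. The sketch below models it in plain Scala, relying on the statement in the na entry that the NA bit pattern for integral types is MinValue. The names NaInt, at and raw are hypothetical helpers for this sketch, and Option stands in for Scalar.

```scala
object NaSentinelSketch {
  // Taken from the na entry: MinValue is the NA bit pattern for integral types.
  val NaInt: Int = Int.MinValue

  // `at`-style access: a missing value surfaces as None, not as a number.
  def at(data: Array[Int], i: Int): Option[Int] =
    if (data(i) == NaInt) None else Some(data(i))

  // `raw`-style access: returns whatever is stored, sentinel included.
  def raw(data: Array[Int], i: Int): Int = data(i)

  // Analogue of Vec[Int](1,2,3,na).
  val data: Array[Int] = Array(1, 2, 3, NaInt)
}
```

Reading offset 3 through at gives None, while raw hands back Int.MinValue, which would silently corrupt any arithmetic that treats it as a real value.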
- implicit class VecDoubleOps extends AnyRef
Specialized methods for Vec[Double]
Methods in this class do not filter out NAs, e.g. Vec(NA,1d).max2 == NA rather than 1d
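The distinction between a maximum that propagates NA and one that filters it out can be sketched in plain Scala. This assumes a missing Double is encoded as NaN, which the text does not state explicitly; maxPropagating and maxFiltering are hypothetical names and do not correspond to Saddle methods.

```scala
object NaMaxSketch {
  // Assumption: a missing Double is encoded as NaN.

  // Propagating maximum over a non-empty sequence: math.max returns NaN
  // when either argument is NaN, so one missing input poisons the result.
  def maxPropagating(xs: Seq[Double]): Double =
    xs.reduce((a, b) => math.max(a, b))

  // Filtering maximum: NaN inputs are dropped first; None if nothing remains.
  def maxFiltering(xs: Seq[Double]): Option[Double] =
    xs.filterNot(_.isNaN).reduceOption((a, b) => math.max(a, b))
}
```

The propagating form avoids a filtering pass over the data, at the cost of requiring callers to deal with missing values themselves.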
Value Members
- def *: SliceAll
Syntactic sugar, placeholder for 'slice-all'
val v = Vec(1,2,3, 4)
val u = v(*)
- implicit def any2Slice[T](p: T): SliceDefault[T]
- def clock[T](op: => T): (Double, T)
Allow timing of an operation
clock { bigMat.T dot bigMat }
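A function with the same signature as clock can be written with a by-name parameter, as in the sketch below. This is a hypothetical re-implementation for illustration; in particular, the unit of the returned Double (seconds here) is an assumption, since the text does not state it.

```scala
object ClockSketch {
  // Hypothetical re-implementation with the signature shown in the text.
  // Assumption: the Double is elapsed wall-clock time in seconds.
  def clock[T](op: => T): (Double, T) = {
    val start = System.nanoTime()
    val result = op // the by-name argument is evaluated exactly once, here
    val elapsedSeconds = (System.nanoTime() - start) / 1e9
    (elapsedSeconds, result)
  }
}
```

A call such as ClockSketch.clock { (1 to 1000).sum } returns the elapsed time paired with the computed sum, so the result of the timed block is not lost.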
- def concat[T](vecs: IndexedSeq[Vec[T]])(implicit arg0: ST[T]): Vec[T]
- implicit val doubleOrd: doubleIsNumeric.type
- implicit val floatOrd: floatIsNumeric.type
- implicit val intOrd: intIsNumeric.type
- implicit val longOrd: longIsNumeric.type
- def na[T](implicit st: ST[T]): T
na provides syntactic sugar for constructing primitives recognized as NA. A use case would be:
Vec[Int](1,2,na,4)
The NA bit pattern for integral types is MinValue because it induces a symmetry on the remaining bound of values; e.g. the remaining Byte bound is (-127, +127).
- implicit def pair2Slice[T](p: (T, T)): SliceDefault[T]
Syntactic sugar, allow '->' to generate an (inclusive) index slice
val v = Vec(1,2,3,4)
val u = v(0 -> 2)
- implicit def pair2SliceFrom[T](p: (T, SliceAll)): SliceFrom[T]
Syntactic sugar, allow ' -> *' to generate an (inclusive) index slice, open on right
val v = Vec(1,2,3,4)
val u = v(1 -> *)
- implicit def pair2SliceTo[T](p: (SliceAll, T)): SliceTo[T]
Syntactic sugar, allow '* -> ' to generate an (inclusive) index slice, open on left
val v = Vec(1,2,3,4)
val u = v(* -> 2)
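The three slice shapes produced by the sugar above (closed, open on the right, open on the left) can be modelled as a small algebraic data type and resolved against a sequence. The sketch below is plain Scala with hypothetical names (Closed, From, To, select); it takes the text's statement that the bounds are inclusive at face value and does not reproduce Saddle's implicit conversions.

```scala
object SliceSketch {
  // Hypothetical stand-ins for the three slice shapes named in the text.
  sealed trait Slice
  final case class Closed(from: Int, to: Int) extends Slice // like from -> to
  final case class From(from: Int) extends Slice            // like from -> *
  final case class To(to: Int) extends Slice                // like * -> to

  // Resolve a slice against a sequence, treating every bound as inclusive.
  def select[A](xs: Vector[A], s: Slice): Vector[A] = s match {
    case Closed(f, t) => xs.slice(f, t + 1)
    case From(f)      => xs.drop(f)
    case To(t)        => xs.take(t + 1)
  }
}
```

Under these assumptions, resolving Closed(0, 2) against the values 1, 2, 3, 4 keeps the first three elements, because the upper bound is included.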
- object Buffer
- case object FillBackward extends FillMethod with Product with Serializable
- case object FillForward extends FillMethod with Product with Serializable
- object Frame extends BinOpFrame
- object Index
- object Mat
- object PctMethod
- object RankTie
- object Series extends BinOpSeries
- object Vec
- object doubleIsNumeric extends Numeric[Double] with DoubleTotalOrderTrait
- object floatIsNumeric extends Numeric[Float] with FloatTotalOrderTrait
- object intIsNumeric extends Numeric[Int]
- object longIsNumeric extends Numeric[Long]
- object order extends OrderInstances