A simple implementation of N-dimensional dual numbers, for automatically
computing exact derivatives of functions. This code (and documentation) closely follow
the one in Google's "Ceres" library of non-linear least-squares solvers
(see Sameer Agarwal, Keir Mierle, and others: Ceres Solver.)
While a complete treatment of the mechanics of automatic differentiation is
beyond the scope of this header (see
http://en.wikipedia.org/wiki/Automatic_differentiation for details), the
basic idea is to extend normal arithmetic with an extra element "h" such
that h != 0, but h2 = 0.
Dual numbers are extensions of the real numbers analogous to complex numbers:
whereas complex numbers augment the reals by introducing an imaginary unit i
such that i2 = -1, dual numbers introduce an "infinitesimal" unit h such
that h2 = 0.
Analogously to a complex number c = x + y*i, a dual number d = x * y*h has
two components: the "real" component x, and an "infinitesimal" component y.
Surprisingly, this leads to a convenient method for computing exact
derivatives without needing to manipulate complicated symbolic expressions.
For example, consider the function
f(x) = x * x ,
evaluated at 10. Using normal arithmetic, f(10) = 100, and df/dx(10) = 20.
Next, augment 10 with an infinitesimal h to get:
f(10 + h) = (10 + h) * (10 + h)
= 100 + 2 * 10 * h + h * h
= 100 + 20 * h +---
+----- |
| +--- This is zero
|
+----------------- This is df/dx
Note that the derivative of f with respect to x is simply the infinitesimal
component of the value of f(x + h). So, in order to take the derivative of
any function, it is only necessary to replace the numeric "object" used in
the function with one extended with infinitesimals. The class Jet, defined in
this header, is one such example of this, where substitution is done with
generics.
To handle derivatives of functions taking multiple arguments, different
infinitesimals are used, one for each variable to take the derivative of. For
example, consider a scalar function of two scalar parameters x and y:
f(x, y) = x * x + x * y
Following the technique above, to compute the derivatives df/dx and df/dy for
f(1, 3) involves doing two evaluations of f, the first time replacing x with
x + h, the second time replacing y with y + h.
To take the gradient of f with the implementation of dual numbers ("jets") in
this file, it is necessary to create a single jet type which has components
for the derivative in x and y, and pass them to a routine computing function f.
It is convenient to use a generic version of f, that can be called also with non-jet numbers
for standard evaluation:
def f[@specialized(Double) T : Field](x: T, y: T): T = x * x + x * y
val xValue = 9.47892774val yValue = 0.287740// The "2" means there should be 2 dual number components.implicitval dimension = JetDim(2)
val x: Jet[Double] = xValue + Jet.h[Double](0); // Pick the 0th dual number for x.val y: Jet[Double] = yValue + Jet.h[Double](1); // Pick the 1th dual number for y.val z: Jet[Double] = f(x, y);
println("df/dx = " + z.infinitesimal(0) + ", df/dy = " + z.infinitesimal(1));
For the more mathematically inclined, this file implements first-order
"jets". A 1st order jet is an element of the ring
T[N] = T[t_1, ..., t_N] / (t_1, ..., t_N)^2
which essentially means that each jet consists of a "scalar" value 'a' from T
and a 1st order perturbation vector 'v' of length N:
x = a + \sum_i v[i] t_i
A shorthand is to write an element as x = a + u, where u is the perturbation.
Then, the main point about the arithmetic of jets is that the product of
perturbations is zero:
(a + u) * (b + v) = ab + av + bu + uv
= ab + (av + bu) + 0
which is what operator* implements below. Addition is simpler:
(a + u) + (b + v) = (a + b) + (u + v).
The only remaining question is how to evaluate the function of a jet, for
which we use the chain rule:
f(a + u) = f(a) + f'(a) u
where f'(a) is the (scalar) derivative of f at a.
By pushing these things through generics, we can write routines that at same time
evaluate mathematical functions and compute their derivatives through automatic differentiation.
Linear Supertypes
Serializable, Serializable, JetInstances, AnyRef, Any
Overview
A simple implementation of N-dimensional dual numbers, for automatically computing exact derivatives of functions. This code (and documentation) closely follow the one in Google's "Ceres" library of non-linear least-squares solvers (see Sameer Agarwal, Keir Mierle, and others: Ceres Solver.)
While a complete treatment of the mechanics of automatic differentiation is beyond the scope of this header (see http://en.wikipedia.org/wiki/Automatic_differentiation for details), the basic idea is to extend normal arithmetic with an extra element "h" such that h != 0, but h2 = 0. Dual numbers are extensions of the real numbers analogous to complex numbers: whereas complex numbers augment the reals by introducing an imaginary unit i such that i2 = -1, dual numbers introduce an "infinitesimal" unit h such that h2 = 0. Analogously to a complex number c = x + y*i, a dual number d = x * y*h has two components: the "real" component x, and an "infinitesimal" component y. Surprisingly, this leads to a convenient method for computing exact derivatives without needing to manipulate complicated symbolic expressions.
For example, consider the function
evaluated at 10. Using normal arithmetic, f(10) = 100, and df/dx(10) = 20. Next, augment 10 with an infinitesimal h to get:
Note that the derivative of f with respect to x is simply the infinitesimal component of the value of f(x + h). So, in order to take the derivative of any function, it is only necessary to replace the numeric "object" used in the function with one extended with infinitesimals. The class Jet, defined in this header, is one such example of this, where substitution is done with generics.
To handle derivatives of functions taking multiple arguments, different infinitesimals are used, one for each variable to take the derivative of. For example, consider a scalar function of two scalar parameters x and y:
Following the technique above, to compute the derivatives df/dx and df/dy for f(1, 3) involves doing two evaluations of f, the first time replacing x with x + h, the second time replacing y with y + h.
For df/dx:
For df/dy:
To take the gradient of f with the implementation of dual numbers ("jets") in this file, it is necessary to create a single jet type which has components for the derivative in x and y, and pass them to a routine computing function f. It is convenient to use a generic version of f, that can be called also with non-jet numbers for standard evaluation:
For the more mathematically inclined, this file implements first-order "jets". A 1st order jet is an element of the ring
T[N] = T[t_1, ..., t_N] / (t_1, ..., t_N)^2
which essentially means that each jet consists of a "scalar" value 'a' from T and a 1st order perturbation vector 'v' of length N:
A shorthand is to write an element as x = a + u, where u is the perturbation. Then, the main point about the arithmetic of jets is that the product of perturbations is zero:
(a + u) * (b + v) = ab + av + bu + uv = ab + (av + bu) + 0
which is what operator* implements below. Addition is simpler:
The only remaining question is how to evaluate the function of a jet, for which we use the chain rule:
f(a + u) = f(a) + f'(a) u
where f'(a) is the (scalar) derivative of f at a.
By pushing these things through generics, we can write routines that at same time evaluate mathematical functions and compute their derivatives through automatic differentiation.