NeuralNetwork

java.lang.Object
- smile.classification.NeuralNetwork

All Implemented Interfaces:

java.io.Serializable, Classifier<double[]>, OnlineClassifier<double[]>, SoftClassifier<double[]>
```
public class NeuralNetwork
extends java.lang.Object
implements OnlineClassifier<double[]>, SoftClassifier<double[]>, java.io.Serializable
```
Multilayer perceptron neural network. An MLP consists of several layers of nodes, interconnected through weighted acyclic arcs from each preceding layer to the following, without lateral or feedback connections. Each node calculates a transformed weighted linear combination of its inputs (output activations from the preceding layer), with one of the weights acting as a trainable bias connected to a constant input. The transformation, called the activation function, is a bounded, non-decreasing, non-linear function such as the sigmoid function, which ranges from 0 to 1. Another popular activation function is the hyperbolic tangent, which has the same shape as the sigmoid function but ranges from -1 to 1. More specialized activation functions include radial basis functions, which are used in RBF networks.
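As a rough illustration of the node computation described above, the following sketch evaluates a sigmoid, a hyperbolic tangent written as a rescaled sigmoid, and the output of a single node. The class `ActivationSketch` and its method names are hypothetical and are not part of this library.

```java
/** Illustrative activation functions; hypothetical helper, not part of the library. */
final class ActivationSketch {
    private ActivationSketch() {}

    /** Logistic sigmoid, with outputs in (0, 1). */
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /** Hyperbolic tangent written as a rescaled sigmoid, with outputs in (-1, 1). */
    static double tanhViaSigmoid(double x) {
        return 2.0 * sigmoid(2.0 * x) - 1.0;
    }

    /** Output of one node: sigmoid of the weighted sum of its inputs plus a bias. */
    static double nodeOutput(double[] weights, double bias, double[] inputs) {
        double sum = bias; // the bias acts as a weight on a constant input of 1
        for (int i = 0; i < weights.length; i++) {
            sum += weights[i] * inputs[i];
        }
        return sigmoid(sum);
    }
}
```

The identity tanh(x) = 2 * sigmoid(2x) - 1 is what makes the two functions the same shape up to shifting and scaling.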
The representational capabilities of an MLP are determined by the range of mappings it may implement through weight variation. Single-layer perceptrons are capable of solving only linearly separable problems. With the sigmoid function as the activation function, the single-layer network is identical to the logistic regression model.
The universal approximation theorem for neural networks states that every continuous function that maps intervals of real numbers to some output interval of real numbers can be approximated arbitrarily closely by a multi-layer perceptron with just one hidden layer. This result holds only for restricted classes of activation functions, which are extremely complex and NOT smooth for subtle mathematical reasons. On the other hand, smoothness is important for gradient descent learning. Besides, the proof is not constructive regarding the number of neurons required or the settings of the weights. In practice, therefore, complex systems use more layers of neurons, and some also use larger input and output layers.
The most popular algorithm for training MLPs is back-propagation, which is a gradient descent method. Based on the chain rule, the algorithm propagates the error back through the network and adjusts the weight of each connection in order to reduce the value of the error function by some small amount. For this reason, back-propagation can only be applied to networks with differentiable activation functions.
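To make the chain-rule step concrete, here is a minimal sketch of one gradient-descent update for a single sigmoid unit under a squared error. It is a simplified illustration, not this class's implementation; `BackpropSketch` and its parameters are hypothetical names.

```java
/** One-unit gradient descent; hypothetical helper, not part of the library. */
final class BackpropSketch {
    private BackpropSketch() {}

    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    /**
     * One gradient-descent step for a single sigmoid unit with squared error
     * E = 0.5 * (out - target)^2. Updates w in place and returns the error
     * measured before the update.
     */
    static double step(double[] w, double[] x, double target, double eta) {
        double net = 0.0;
        for (int i = 0; i < w.length; i++) {
            net += w[i] * x[i];
        }
        double out = sigmoid(net);
        double err = out - target;
        // Chain rule: dE/dw_i = dE/dout * dout/dnet * dnet/dw_i
        //                     = err   * out * (1 - out) * x_i
        double deltaNet = err * out * (1.0 - out);
        for (int i = 0; i < w.length; i++) {
            w[i] -= eta * deltaNet * x[i];
        }
        return 0.5 * err * err;
    }
}
```

Calling `step` repeatedly on the same instance should lower the returned error, since each call moves the weights a small distance against the gradient.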
During error back-propagation, the gradient is usually multiplied by a small number η, called the learning rate, which is carefully selected to ensure that the network converges to a local minimum of the error function fast enough, without producing oscillations. One way to avoid oscillation at large η is to make the change in weight dependent on the past weight change by adding a momentum term.
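The momentum idea can be sketched as follows, assuming the common form in which the new weight change is the previous change scaled by a momentum factor `alpha` minus the gradient scaled by the learning rate `eta`. The class `MomentumSketch` is hypothetical, and the exact update used by this class may differ.

```java
/** Gradient step with momentum; hypothetical helper, not part of the library. */
final class MomentumSketch {
    private MomentumSketch() {}

    /**
     * Applies one gradient step with momentum, in place.
     * delta[i] holds the previous weight change on entry and the new one on return.
     */
    static void step(double[] w, double[] delta, double[] gradient, double eta, double alpha) {
        for (int i = 0; i < w.length; i++) {
            delta[i] = alpha * delta[i] - eta * gradient[i];
            w[i] += delta[i];
        }
    }
}
```

With `alpha` set to 0 this reduces to plain gradient descent; larger values let consecutive steps in the same direction reinforce each other and damp steps that alternate in sign.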
Although the back-propagation algorithm may perform gradient descent on the total error of all instances in a batch way, the learning rule is often applied to each instance separately in an online or stochastic way. There is empirical indication that the stochastic way results in faster convergence.
In practice, over-fitting is a common problem. It arises in convoluted or over-specified systems when the capacity of the network significantly exceeds the needed free parameters. There are two general approaches for avoiding this problem. The first is to use cross-validation and similar techniques to check for the presence of over-fitting and to select hyper-parameters that minimize the generalization error. The second is to use some form of regularization, which emerges naturally in a Bayesian framework, where the regularization can be performed by selecting a larger prior probability over simpler models; but also in statistical learning theory, where the goal is to minimize over two quantities: the "empirical risk" and the "structural risk".
For neural networks, the input patterns usually should be scaled or standardized. Commonly, each input variable is scaled into the interval [0, 1] or transformed to have mean 0 and standard deviation 1.
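A minimal sketch of the second option, standardizing each input variable to mean 0 and standard deviation 1, might look like this. The helper `ScalingSketch.standardize` is hypothetical; it uses the sample standard deviation and leaves constant columns centered but unscaled, both of which are choices made here for illustration.

```java
/** Column-wise standardization; hypothetical helper, not part of the library. */
final class ScalingSketch {
    private ScalingSketch() {}

    /** Returns a copy of x with each column shifted to mean 0 and scaled to standard deviation 1. */
    static double[][] standardize(double[][] x) {
        if (x.length < 2) {
            throw new IllegalArgumentException("need at least two rows");
        }
        int n = x.length;
        int p = x[0].length;
        double[][] z = new double[n][p];
        for (int j = 0; j < p; j++) {
            double mean = 0.0;
            for (double[] row : x) mean += row[j];
            mean /= n;

            double ss = 0.0;
            for (double[] row : x) ss += (row[j] - mean) * (row[j] - mean);
            double sd = Math.sqrt(ss / (n - 1)); // sample standard deviation
            if (sd == 0.0) sd = 1.0;             // constant column: only center it

            for (int i = 0; i < n; i++) {
                z[i][j] = (x[i][j] - mean) / sd;
            }
        }
        return z;
    }
}
```

The same means and standard deviations computed on the training data should be reused when scaling instances passed to prediction.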
For penalty functions and output units, the following natural pairings are recommended:
- linear output units and a least squares penalty function.
- a two-class cross-entropy penalty function and a logistic activation function.
- a multi-class cross-entropy penalty function and a softmax activation function.
By assigning a softmax activation function on the output layer of the neural network for categorical target variables, the outputs can be interpreted as posterior probabilities, which are very useful.
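The softmax and multi-class cross-entropy pairing can be sketched as below. This is a generic illustration with hypothetical names (`SoftmaxSketch`, `softmax`, `crossEntropy`), not the code used inside this class.

```java
/** Softmax outputs and cross-entropy error; hypothetical helper, not part of the library. */
final class SoftmaxSketch {
    private SoftmaxSketch() {}

    /** Converts raw output-layer sums into probabilities that sum to 1. */
    static double[] softmax(double[] z) {
        double max = Double.NEGATIVE_INFINITY;
        for (double v : z) max = Math.max(max, v);

        double[] p = new double[z.length];
        double sum = 0.0;
        for (int i = 0; i < z.length; i++) {
            p[i] = Math.exp(z[i] - max); // subtract the max for numerical stability
            sum += p[i];
        }
        for (int i = 0; i < p.length; i++) {
            p[i] /= sum;
        }
        return p;
    }

    /** Multi-class cross-entropy for a single instance whose true class is label. */
    static double crossEntropy(double[] p, int label) {
        return -Math.log(p[label]);
    }
}
```

Because the outputs are non-negative and sum to 1, they can be read as class probabilities, and the cross-entropy error is small exactly when the true class receives a probability close to 1.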
See Also:

Serialized Form

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`NeuralNetwork.ActivationFunction` The types of activation functions in the output layer.
`static class`	`NeuralNetwork.ErrorFunction` The types of error functions.
`static class`	`NeuralNetwork.Trainer` Trainer for neural networks.

Constructor Summary

Constructors
Constructor and Description
`NeuralNetwork(NeuralNetwork.ErrorFunction error, int... numUnits)` Constructor.
`NeuralNetwork(NeuralNetwork.ErrorFunction error, NeuralNetwork.ActivationFunction activation, int... numUnits)` Constructor.

Method Summary

Modifier and Type	Method and Description
`NeuralNetwork`	`clone()`
`double`	`getLearningRate()` Returns the learning rate.
`double`	`getMomentum()` Returns the momentum factor.
`double`	`getWeightDecay()` Returns the weight decay factor.
`void`	`learn(double[][] x, int[] y)` Trains the neural network with the given dataset for one epoch by stochastic gradient descent.
`double`	`learn(double[] x, double[] y, double weight)` Update the neural network with given instance and associated target value.
`void`	`learn(double[] x, int y)` Online update the classifier with a new training instance.
`void`	`learn(double[] x, int y, double weight)` Online update the neural network with a new training instance.
`int`	`predict(double[] x)` Predict the class of a given instance.
`int`	`predict(double[] x, double[] y)` Predict the target value of a given instance.
`void`	`setLearningRate(double eta)` Sets the learning rate.
`void`	`setMomentum(double alpha)` Sets the momentum factor.
`void`	`setWeightDecay(double lambda)` Sets the weight decay factor.

Methods inherited from class java.lang.Object
equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Constructor Detail
  - NeuralNetwork
```
public NeuralNetwork(NeuralNetwork.ErrorFunction error,
                     int... numUnits)
```
    Constructor. The activation function of the output layer is chosen by natural pairing, based on the error function and the number of classes.
    
    Parameters:
    
    error - the error function.
    
    numUnits - the number of units in each layer.
  - NeuralNetwork
```
public NeuralNetwork(NeuralNetwork.ErrorFunction error,
                     NeuralNetwork.ActivationFunction activation,
                     int... numUnits)
```
    Constructor.
    
    Parameters:
    
    error - the error function.
    
    activation - the activation function of the output layer.
    
    numUnits - the number of units in each layer.
- Method Detail
  - clone
```
public NeuralNetwork clone()
```
    Overrides:
    
    clone in class java.lang.Object
  - setLearningRate
```
public void setLearningRate(double eta)
```
    Sets the learning rate.
    
    Parameters:
    
    eta - the learning rate.
  - getLearningRate
```
public double getLearningRate()
```
    Returns the learning rate.
  - setMomentum
```
public void setMomentum(double alpha)
```
    Sets the momentum factor.
    
    Parameters:
    
    alpha - the momentum factor.
  - getMomentum
```
public double getMomentum()
```
    Returns the momentum factor.
  - setWeightDecay
```
public void setWeightDecay(double lambda)
```
    Sets the weight decay factor. After each weight update, every weight is simply "decayed" or shrunk according to w = w * (1 - eta * lambda).
    
    Parameters:
    
    lambda - the weight decay for regularization.
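    The decay rule w = w * (1 - eta * lambda) can be sketched in a few lines. `WeightDecaySketch` is a hypothetical helper written for illustration and is not part of this class.

```java
/** Weight decay step; hypothetical helper, not part of the library. */
final class WeightDecaySketch {
    private WeightDecaySketch() {}

    /** Shrinks every weight in place by the factor (1 - eta * lambda). */
    static void decay(double[] w, double eta, double lambda) {
        double factor = 1.0 - eta * lambda;
        for (int i = 0; i < w.length; i++) {
            w[i] *= factor;
        }
    }
}
```

    With eta = 0.1 and lambda = 0.5, for example, each weight is multiplied by 0.95 after every update, so weights that the gradient does not keep reinforcing shrink toward zero.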
  - getWeightDecay
```
public double getWeightDecay()
```
    Returns the weight decay factor.
  - predict
```
public int predict(double[] x,
                   double[] y)
```
    Predict the target value of a given instance. Note that this method is NOT multi-thread safe.
    
    Specified by:
    
    predict in interface SoftClassifier<double[]>
    
    Parameters:
    
    x - the instance.
    
    y - the array that receives the network output on return. With the softmax activation function, these are the estimated posterior probabilities.
    
    Returns:
    
    the predicted class label.
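    One plausible relationship between the output array and the returned label is sketched below, assuming the label is the index of the largest output. That convention is common for networks with one output unit per class, but it is an assumption here and is not stated by this documentation. `ArgmaxSketch` is a hypothetical helper.

```java
/** Picks a class label from network outputs; hypothetical helper, not part of the library. */
final class ArgmaxSketch {
    private ArgmaxSketch() {}

    /** Returns the index of the largest value in y (the first one on ties). */
    static int argmax(double[] y) {
        int best = 0;
        for (int i = 1; i < y.length; i++) {
            if (y[i] > y[best]) {
                best = i;
            }
        }
        return best;
    }
}
```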
  - predict
```
public int predict(double[] x)
```
    Predict the class of a given instance. Note that this method is NOT multi-thread safe.
    
    Specified by:
    
    predict in interface Classifier<double[]>
    
    Parameters:
    
    x - the instance.
    
    Returns:
    
    the predicted class label.
  - learn
```
public double learn(double[] x,
                    double[] y,
                    double weight)
```
    Update the neural network with given instance and associated target value. Note that this method is NOT multi-thread safe.
    
    Parameters:
    
    x - the training instance.
    
    y - the target value.
    
    weight - a positive weight value associated with the training instance.
    
    Returns:
    
    the weighted training error before back-propagation.
  - learn
```
public void learn(double[] x,
                  int y)
```
    Description copied from interface: OnlineClassifier
    
    Online update the classifier with a new training instance. In general, this method may NOT be multi-thread safe.
    
    Specified by:
    
    learn in interface OnlineClassifier<double[]>
    
    Parameters:
    
    x - training instance.
    
    y - training label.
  - learn
```
public void learn(double[] x,
                  int y,
                  double weight)
```
    Online update the neural network with a new training instance. Note that this method is NOT multi-thread safe.
    
    Parameters:
    
    x - training instance.
    
    y - training label.
    
    weight - a positive weight value associated with the training instance.
  - learn
```
public void learn(double[][] x,
                  int[] y)
```
    Trains the neural network with the given dataset for one epoch by stochastic gradient descent.
    
    Parameters:
    
    x - training instances.
    
    y - training labels in [0, k), where k is the number of classes.
