Bayesian regularization backpropagation
net.trainFcn = 'trainbr'
[net,tr] = train(net,...)
trainbr is a network training function that
updates the weight and bias values according to Levenberg-Marquardt
optimization. It minimizes a combination of squared errors and weights,
and then determines the correct combination so as to produce a network
that generalizes well. The process is called Bayesian regularization.
net.trainFcn = 'trainbr' sets the network trainFcn property.
[net,tr] = train(net,...) trains the network with trainbr.
Training occurs according to trainbr training
parameters, shown here with their default values:
net.trainParam.epochs            1000    Maximum number of epochs to train
net.trainParam.mu               0.005    Marquardt adjustment parameter
net.trainParam.mu_dec             0.1    Decrease factor for mu
net.trainParam.mu_inc              10    Increase factor for mu
net.trainParam.mu_max            1e10    Maximum value for mu
net.trainParam.max_fail             0    Maximum validation failures
net.trainParam.min_grad          1e-7    Minimum performance gradient
net.trainParam.show                25    Epochs between displays (NaN for no displays)
net.trainParam.showCommandLine  false    Generate command-line output
net.trainParam.showWindow        true    Show training GUI
net.trainParam.time               inf    Maximum time to train in seconds
Validation stops are disabled by default (max_fail
= 0) so that training can continue until an optimal combination
of errors and weights is found. However, some weight/bias minimization
can still be achieved with shorter training times if validation is
enabled by setting
max_fail to 6 or some other
strictly positive value.
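As a rough sketch of how these parameters might be set before training, the specific values below are illustrative assumptions, not recommendations:
net = feedforwardnet(10,'trainbr');    % hidden layer size 10 is arbitrary here
net.trainParam.epochs   = 500;         % cap the number of training epochs
net.trainParam.show     = 50;          % report progress every 50 epochs
net.trainParam.max_fail = 6;           % enable validation stops (disabled when 0)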
You can create a standard network that uses trainbr with feedforwardnet
or cascadeforwardnet. To prepare a custom network to be trained with trainbr,
set net.trainFcn to 'trainbr' (this also sets net.trainParam to trainbr's
default parameters) and then set net.trainParam properties
to desired values.
In either case, calling train with the resulting
network trains the network with trainbr.
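For example, a minimal sketch of preparing a custom network this way, assuming a network object net and data p and t already exist:
net.trainFcn = 'trainbr';        % also resets net.trainParam to trainbr defaults
net.trainParam.mu = 0.01;        % illustrative override of the Marquardt parameter
[net,tr] = train(net,p,t);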
Here is a problem consisting of inputs p and targets
t to be solved with a network. It involves
fitting a noisy sine wave.
p = [-1:.05:1]; t = sin(2*pi*p)+0.1*randn(size(p));
A feed-forward network is created with a hidden layer of 2 neurons.
net = feedforwardnet(2,'trainbr');
Here the network is trained and tested.
net = train(net,p,t); a = net(p)
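As a short follow-up check, one way to inspect the fit is to measure performance and plot the output against the noisy targets:
perf = perform(net,t,a)          % squared-error performance of the trained network
plot(p,t,'.',p,a,'-')            % noisy targets vs. network output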
This function uses the Jacobian for calculations, which assumes
that performance is a mean or sum of squared errors. Therefore networks
trained with this function must use either the mse or sse performance function.
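For instance, a brief sketch of selecting one of these performance functions explicitly:
net = feedforwardnet(2,'trainbr');
net.performFcn = 'sse';          % sum of squared errors; 'mse' is the default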
trainbr can train any network as long as
its weight, net input, and transfer functions have derivative functions.
Bayesian regularization minimizes a linear combination of squared errors and weights. It also modifies the linear combination so that at the end of training the resulting network has good generalization qualities. See MacKay (Neural Computation, Vol. 4, No. 3, 1992, pp. 415 to 447) and Foresee and Hagan (Proceedings of the International Joint Conference on Neural Networks, June, 1997) for more detailed discussions of Bayesian regularization.
This Bayesian regularization takes place within the Levenberg-Marquardt
algorithm. Backpropagation is used to calculate the Jacobian jX of performance
perf with respect to the weight and bias variables
X. Each variable is adjusted according to Levenberg-Marquardt:
jj = jX' * jX
je = jX' * E
dX = -(jj + I*mu) \ je
where E is all errors and I is
the identity matrix.
The adaptive value mu is increased by mu_inc until
the change shown above results in a reduced performance value. The
change is then made to the network, and mu is decreased
by mu_dec.
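The update can be sketched in MATLAB as follows, using a small made-up Jacobian and error vector purely for illustration; this is not the toolbox's internal implementation:
J  = [1 2; 3 4; 5 6];            % Jacobian of errors with respect to the variables X
E  = [0.1; -0.2; 0.05];          % current error vector
mu = 0.005;                      % Marquardt adjustment parameter
jj = J'*J;                       % approximate Hessian
je = J'*E;                       % gradient term
dX = -(jj + mu*eye(size(jj))) \ je   % proposed change to the variables X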
The parameter mem_reduc indicates how to trade off
memory and speed to calculate the Jacobian jX. If mem_reduc is 1, trainbr runs
the fastest, but can require a lot of memory. Increasing mem_reduc to 2 cuts
some of the memory required by a factor of two, but slows training somewhat.
Higher values continue to decrease the amount of memory needed and
increase the training times.
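As a sketch (mem_reduc appears in older toolbox releases; newer releases may not expose this parameter):
net.trainParam.mem_reduc = 2;    % compute the Jacobian in two pieces to save memory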
Training stops when any of these conditions occurs:
The maximum number of epochs (repetitions) is reached.
The maximum amount of
time is exceeded.
Performance is minimized to the goal.
The performance gradient falls below min_grad.
mu exceeds mu_max.
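One way to see which condition ended a run is to inspect the training record returned by train; the fields below are as documented for the training record, though the exact stop message text can vary:
[net,tr] = train(net,p,t);
tr.stop                          % reason training stopped, e.g. 'Maximum epoch reached.'
tr.num_epochs                    % number of epochs actually run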
MacKay, D. J. C., "Bayesian interpolation," Neural Computation, Vol. 4, No. 3, 1992, pp. 415–447
Foresee, F. D., and M. T. Hagan, "Gauss-Newton approximation to Bayesian learning," Proceedings of the International Joint Conference on Neural Networks, June, 1997