Bayesian regularization backpropagation

`net.trainFcn = 'trainbr'`

`[net,tr] = train(net,...)`

`trainbr` is a network training function that updates the weight and bias values according to Levenberg-Marquardt optimization. It minimizes a combination of squared errors and weights, and then determines the correct combination so as to produce a network that generalizes well. The process is called Bayesian regularization.

`net.trainFcn = 'trainbr'` sets the network `trainFcn` property.

`[net,tr] = train(net,...)` trains the network with `trainbr`.

Training occurs according to `trainbr` training parameters, shown here with their default values:

| Parameter | Default | Description |
| --- | --- | --- |
| `net.trainParam.epochs` | `1000` | Maximum number of epochs to train |
| `net.trainParam.goal` | `0` | Performance goal |
| `net.trainParam.mu` | `0.005` | Marquardt adjustment parameter |
| `net.trainParam.mu_dec` | `0.1` | Decrease factor for `mu` |
| `net.trainParam.mu_inc` | `10` | Increase factor for `mu` |
| `net.trainParam.mu_max` | `1e10` | Maximum value for `mu` |
| `net.trainParam.max_fail` | `inf` | Maximum validation failures |
| `net.trainParam.min_grad` | `1e-7` | Minimum performance gradient |
| `net.trainParam.show` | `25` | Epochs between displays (`NaN` for no displays) |
| `net.trainParam.showCommandLine` | `false` | Generate command-line output |
| `net.trainParam.showWindow` | `true` | Show training GUI |
| `net.trainParam.time` | `inf` | Maximum time to train in seconds |

Validation stops are disabled by default (`max_fail = inf`) so that training can continue until an optimal combination of errors and weights is found. However, some weight/bias minimization can still be achieved with shorter training times if validation is enabled by setting `max_fail` to 6 or some other strictly positive value.
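As a minimal sketch of adjusting these parameters, the snippet below creates a `trainbr` network and overrides a few defaults before training. It assumes Deep Learning Toolbox is installed; the hidden layer size and the chosen values are illustrative, not recommendations.

```matlab
% Sketch: override selected trainbr parameters before calling train.
% The hidden layer size (10) and all values below are arbitrary examples.
net = feedforwardnet(10, 'trainbr');

net.trainParam.epochs          = 300;    % stop after at most 300 epochs
net.trainParam.max_fail        = 6;      % re-enable validation stops
net.trainParam.showWindow      = false;  % suppress the training GUI
net.trainParam.showCommandLine = true;   % report progress at the command line
net.trainParam.show            = 10;     % display every 10 epochs
```

With `max_fail` set to a finite value, training can end early on validation failures instead of running until one of the other stopping conditions is met.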

You can create a standard network that uses `trainbr` with `feedforwardnet` or `cascadeforwardnet`. To prepare a custom network to be trained with `trainbr`:

1. Set `NET.trainFcn` to `'trainbr'`. This sets `NET.trainParam` to `trainbr`’s default parameters.
2. Set `NET.trainParam` properties to desired values.

In either case, calling `train` with the resulting network trains the network with `trainbr`. See `feedforwardnet` and `cascadeforwardnet` for examples.
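The order of the two preparation steps matters, because assigning `trainFcn` resets `trainParam`. The sketch below illustrates this on an existing network object rather than a hand-built custom network; it assumes Deep Learning Toolbox, and the layer size and epoch count are arbitrary.

```matlab
% Sketch: switch an existing network to trainbr, then customize parameters.
net = feedforwardnet(5);           % created with the default training function

net.trainFcn = 'trainbr';          % step 1: trainParam now holds trainbr defaults
muDefault = net.trainParam.mu;     % 0.005 according to the parameter table

net.trainParam.epochs = 200;       % step 2: override values only after step 1
```

Setting `net.trainParam` values before changing `net.trainFcn` would discard them, since the assignment in step 1 replaces the whole parameter set.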

Here is a problem consisting of inputs `p` and targets `t` to be solved with a network. It involves fitting a noisy sine wave.

    p = [-1:.05:1];
    t = sin(2*pi*p)+0.1*randn(size(p));

A feed-forward network is created with a hidden layer of 2 neurons.

    net = feedforwardnet(2,'trainbr');

Here the network is trained and tested.

    net = train(net,p,t);
    a = net(p)

This function uses the Jacobian for calculations, which assumes that performance is a mean or sum of squared errors. Therefore networks trained with this function must use either the `mse` or `sse` performance function.
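A guard along the following lines could be run before training to catch an unsupported performance function early. This is only a sketch that assumes Deep Learning Toolbox; the variable `isSquaredError` and the hidden layer size are illustrative.

```matlab
% Sketch: confirm the performance function is compatible with trainbr.
net = feedforwardnet(5, 'trainbr');   % feedforwardnet uses 'mse' by default
net.performFcn = 'sse';               % sum squared error is the other valid choice

% Fail early if some other performance function has been configured
isSquaredError = any(strcmp(net.performFcn, {'mse', 'sse'}));
assert(isSquaredError, 'trainbr requires the mse or sse performance function.');
```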

`trainbr` can train any network as long as its weight, net input, and transfer functions have derivative functions.

Bayesian regularization minimizes a linear combination of squared errors and weights. It
also modifies the linear combination so that at the end of training the resulting network has
good generalization qualities. See MacKay (*Neural Computation*, Vol. 4, No.
3, 1992, pp. 415 to 447) and Foresee and Hagan (*Proceedings of the International Joint
Conference on Neural Networks*, June, 1997) for more detailed discussions of Bayesian
regularization.
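For orientation, the linear combination described above is commonly written as follows in the cited papers. The notation is an assumption borrowed from Foresee and Hagan, not from this page: $E_D$ is the sum of the $N$ squared errors $e_i$, $E_W$ is the sum of the $n$ squared weights and biases $w_j$, $J$ is the Jacobian of the errors, and $\gamma$ is the effective number of parameters. Whether `trainbr` follows these updates exactly is not stated here.

```latex
\begin{aligned}
F &= \beta E_D + \alpha E_W, \qquad
E_D = \sum_{i=1}^{N} e_i^2, \qquad
E_W = \sum_{j=1}^{n} w_j^2, \\
\gamma &= n - 2\alpha \,\operatorname{tr}\!\left(H^{-1}\right), \qquad
H \approx 2\beta J^{\top} J + 2\alpha I, \\
\alpha &\leftarrow \frac{\gamma}{2 E_W}, \qquad
\beta \leftarrow \frac{N - \gamma}{2 E_D}.
\end{aligned}
```

The re-estimation of $\alpha$ and $\beta$ after each optimization step is what "modifies the linear combination" during training: a larger $\alpha / \beta$ ratio penalizes large weights more strongly and yields a smoother network response.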

This Bayesian regularization takes place within the Levenberg-Marquardt algorithm. Backpropagation is used to calculate the Jacobian `jX` of performance `perf` with respect to the weight and bias variables `X`. Each variable is adjusted according to Levenberg-Marquardt,

    jj = jX * jX
    je = jX * E
    dX = -(jj+I*mu) \ je

where `E` is all errors and `I` is the identity matrix.
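To make the update concrete, here is a minimal sketch of a single Levenberg-Marquardt step on a toy straight-line fit, written in base MATLAB without the toolbox. It is not the `trainbr` implementation. As an assumption, the Jacobian is stored with one row per error and one column per variable, so the products are written with explicit transposes; the data and the two-parameter model are hypothetical.

```matlab
% Sketch: one Levenberg-Marquardt step for the toy model y = X(1)*x + X(2).
x  = (0:0.25:1)';               % inputs
t  = 2*x + 1;                   % targets generated by the line y = 2x + 1
X  = [0; 0];                    % initial slope and intercept
mu = 0.005;                     % same value as trainbr's default initial mu

E  = t - (X(1)*x + X(2));       % errors, one per sample
jX = [-x, -ones(size(x))];      % Jacobian of the errors: dE/dX(1) = -x, dE/dX(2) = -1

jj = jX' * jX;                  % Gauss-Newton approximation of the Hessian
je = jX' * E;                   % Jacobian-weighted errors
dX = -(jj + eye(2)*mu) \ je;    % damped step

Xnew = X + dX;
Enew = t - (Xnew(1)*x + Xnew(2));
fprintf('SSE before: %g, SSE after: %g\n', sum(E.^2), sum(Enew.^2));
```

Because the toy model is linear in `X`, a single step with small `mu` lands close to the least-squares solution; larger `mu` values shrink the step toward a short gradient-descent move.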

The adaptive value `mu` is repeatedly multiplied by `mu_inc` until the change shown above results in a reduced performance value. The change is then made to the network, and `mu` is multiplied by `mu_dec`, which decreases it.

Training stops when any of these conditions occurs:

- The maximum number of `epochs` (repetitions) is reached.
- The maximum amount of `time` is exceeded.
- Performance is minimized to the `goal`.
- The performance gradient falls below `min_grad`.
- `mu` exceeds `mu_max`.
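The `mu` adaptation and the stopping conditions above can be sketched together as a plain Levenberg-Marquardt loop in base MATLAB. This is an illustration under stated assumptions, not the `trainbr` source: it omits the Bayesian re-weighting of errors and weights, the struct `p` merely mirrors the `net.trainParam` names, `epochs` and `goal` are changed from their defaults so the toy run terminates quickly, and the model `X(1)*tanh(X(2)*x)` is hypothetical.

```matlab
% Sketch: Levenberg-Marquardt loop with adaptive mu and stopping conditions.
x = linspace(-1, 1, 21)';
t = 1.5 * tanh(2 * x);                   % noise-free targets for the toy model
X = [1; 1];                              % initial parameters

% Field names mirror net.trainParam; epochs and goal differ from the defaults
p = struct('epochs', 100, 'goal', 1e-12, 'mu', 0.005, 'mu_dec', 0.1, ...
           'mu_inc', 10, 'mu_max', 1e10, 'min_grad', 1e-7, 'time', inf);

model  = @(X) X(1) * tanh(X(2) * x);
mu     = p.mu;
stop   = 'Maximum epoch reached';
tStart = tic;
for epoch = 1:p.epochs
    E    = t - model(X);
    perf = sum(E.^2);                          % sum squared error
    th   = tanh(X(2) * x);
    jX   = [-th, -X(1) * x .* (1 - th.^2)];    % Jacobian of errors w.r.t. X
    je   = jX' * E;

    if perf <= p.goal,          stop = 'Performance goal met';     break; end
    if norm(2*je) < p.min_grad, stop = 'Minimum gradient reached'; break; end
    if toc(tStart) > p.time,    stop = 'Maximum time elapsed';     break; end

    jj = jX' * jX;
    while mu <= p.mu_max
        dX = -(jj + eye(2) * mu) \ je;
        if sum((t - model(X + dX)).^2) < perf
            X  = X + dX;                       % accept the step
            mu = mu * p.mu_dec;                % then relax the damping
            break;
        end
        mu = mu * p.mu_inc;                    % reject the step, increase damping
    end
    if mu > p.mu_max, stop = 'Maximum mu reached'; break; end
end
fprintf('Stopped after %d epochs: %s\n', epoch, stop);
```

Checking the goal, gradient, and time limits at the top of each epoch keeps every stopping condition in one place, while the inner `while` loop isolates the search for an acceptable `mu`.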

[1] MacKay, David J. C. "Bayesian interpolation." *Neural Computation*. Vol. 4, No. 3, 1992, pp. 415–447.

[2] Foresee, F. Dan, and Martin T. Hagan. "Gauss-Newton approximation to Bayesian
learning." *Proceedings of the International Joint Conference on Neural
Networks*, June, 1997.

See also: `cascadeforwardnet` | `feedforwardnet` | `trainbfg` | `traincgb` | `traincgf` | `traincgp` | `traingda` | `traingdm` | `traingdx` | `trainlm` | `trainrp` | `trainscg`