## Neural Network Training Concepts

This topic is part of the design workflow described in Workflow for Neural Network Design.

This topic describes two different styles of training. In *incremental* training
the weights and biases of the network are updated each time an input
is presented to the network. In *batch* training
the weights and biases are only updated after all the inputs are presented.
The batch training methods are generally more efficient in the MATLAB® environment,
and they are emphasized in the Deep Learning Toolbox™ software,
but there are some applications where incremental training can be useful,
so that paradigm is implemented as well.

### Incremental Training with adapt

Incremental training can be applied to both static and dynamic networks, although it is more commonly used with dynamic networks, such as adaptive filters. This section illustrates how incremental training is performed on both static and dynamic networks.

#### Incremental Training of Static Networks

Consider again the static network used for the first example.
You want to train it incrementally, so that the weights and biases
are updated after each input is presented. In this case you use the
function `adapt`, and the inputs and targets are presented as sequences.

Suppose you want to train the network to create the linear function:

$$t=2{p}_{1}+{p}_{2}$$

Then for the previous inputs,

$${p}_{1}=\left[\begin{array}{l}1\\ 2\end{array}\right],{p}_{2}=\left[\begin{array}{l}2\\ 1\end{array}\right],{p}_{3}=\left[\begin{array}{l}2\\ 3\end{array}\right],{p}_{4}=\left[\begin{array}{l}3\\ 1\end{array}\right]$$

the targets would be

$${t}_{1}=\left[4\right],{t}_{2}=\left[5\right],{t}_{3}=\left[7\right],{t}_{4}=\left[7\right]$$
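As a quick sanity check, each target above follows directly from the formula $t = 2p_1 + p_2$ applied to the corresponding input vector. This small NumPy sketch (illustrative only, not toolbox code) computes them:

```python
import numpy as np

# Input vectors p1..p4 stored as columns of a matrix.
P = np.array([[1, 2, 2, 3],
              [2, 1, 3, 1]])

# t = 2*p_1 + p_2 for each column gives the four targets.
t = 2 * P[0, :] + P[1, :]
# t is [4, 5, 7, 7], matching t1..t4 above
```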

For incremental training, you present the inputs and targets as sequences:

```matlab
P = {[1;2] [2;1] [2;3] [3;1]};
T = {4 5 7 7};
```

First, set up the network with zero initial weights and biases. Also, set the initial learning rate to zero to show the effect of incremental training.

```matlab
net = linearlayer(0,0);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.b{1} = 0;
```

Recall from Simulation with Concurrent Inputs in a Static Network that,
for a static network, the simulation of the network produces the same
outputs whether the inputs are presented as a matrix of concurrent
vectors or as a cell array of sequential vectors. However, this is
not true when training the network. When you use the `adapt` function,
if the inputs are presented as a cell array of sequential vectors, then
the weights are updated as each input is presented (incremental mode).
As shown in the next section, if the inputs are presented as a matrix
of concurrent vectors, then the weights are updated only after all
inputs are presented (batch mode).

You are now ready to train the network incrementally.

```matlab
[net,a,e,pf] = adapt(net,P,T);
```

The network outputs remain zero, because the learning rate is zero, and the weights are not updated. The errors are equal to the targets:

```
a = [0]    [0]    [0]    [0]
e = [4]    [5]    [7]    [7]
```

If you now set the learning rate to 0.1 you can see how the network is adjusted as each input is presented:

```matlab
net.inputWeights{1,1}.learnParam.lr = 0.1;
net.biases{1,1}.learnParam.lr = 0.1;
[net,a,e,pf] = adapt(net,P,T);
a = [0]    [2]    [6]    [5.8]
e = [4]    [3]    [1]    [1.2]
```

The first output is the same as it was with zero learning rate, because no update is made until the first input is presented. The second output is different, because the weights have been updated. The weights continue to be modified as each error is computed. If the network is capable and the learning rate is set correctly, the error is eventually driven to zero.
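The update at work here is the Widrow-Hoff (LMS) rule: after each presentation, the weights move by `lr*e*p'` and the bias by `lr*e`. The following NumPy sketch (an illustration of that rule, not the `adapt` implementation itself) reproduces the outputs and errors shown above:

```python
import numpy as np

lr = 0.1
w = np.zeros(2)   # input weights, initialized to zero
b = 0.0           # bias, initialized to zero

inputs  = [np.array([1, 2]), np.array([2, 1]),
           np.array([2, 3]), np.array([3, 1])]
targets = [4, 5, 7, 7]

outputs, errors = [], []
for p, t in zip(inputs, targets):
    a = w @ p + b        # simulate with the current weights
    e = t - a            # error for this presentation
    w = w + lr * e * p   # Widrow-Hoff weight update
    b = b + lr * e       # Widrow-Hoff bias update
    outputs.append(a)
    errors.append(e)

# outputs ≈ [0, 2, 6, 5.8] and errors ≈ [4, 3, 1, 1.2], matching adapt above
```

Because the weights change after every presentation, each output is computed with a different set of weights, which is exactly the incremental behavior described in the text.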

#### Incremental Training with Dynamic Networks

You can also train dynamic networks incrementally. In fact, this would be the most common situation.

To train the network incrementally, present the inputs and targets
as elements of cell arrays. Here are the initial input `Pi` and
the inputs `P` and targets `T` as elements of cell arrays.

```matlab
Pi = {1};
P = {2 3 4};
T = {3 5 7};
```

Take the linear network with one delay at the input, as used in a previous example. Initialize the weights to zero and set the learning rate to 0.1.

```matlab
net = linearlayer([0 1],0.1);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.biasConnect = 0;
```

You want to train the network to create the current output by
summing the current and the previous inputs. This is the same input
sequence you used in the previous example, except that you assign
the first term in the sequence as the initial condition for the
delay. You can now sequentially train the network using `adapt`.

```matlab
[net,a,e,pf] = adapt(net,P,T,Pi);
a = [0]    [2.4]    [7.98]
e = [3]    [2.6]    [-0.98]
```

The first output is zero, because the weights have not yet been updated. The weights change at each subsequent time step.
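The same Widrow-Hoff rule applies here, with the tapped delay line supplying `[p(t), p(t-1)]` as the effective input at each step. This NumPy sketch (illustrative, not toolbox code) reproduces the sequence above:

```python
import numpy as np

lr = 0.1
w = np.zeros(2)          # weights on [current input, delayed input]

delayed = 1.0            # initial condition from Pi = {1}
inputs  = [2.0, 3.0, 4.0]
targets = [3.0, 5.0, 7.0]

outputs, errors = [], []
for p, t in zip(inputs, targets):
    x = np.array([p, delayed])  # tapped delay line: [p(t), p(t-1)]
    a = w @ x                   # simulate with the current weights
    e = t - a
    w = w + lr * e * x          # incremental Widrow-Hoff update
    outputs.append(a)
    errors.append(e)
    delayed = p                 # shift the delay line for the next step

# outputs ≈ [0, 2.4, 7.98] and errors ≈ [3, 2.6, -0.98], matching adapt above
```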

### Batch Training

Batch training, in which weights and biases are only updated after all the inputs and targets are presented, can be applied to both static and dynamic networks. Both types of networks are discussed in this section.

#### Batch Training with Static Networks

Batch training can be done using either `adapt` or `train`, although
`train` is generally the best option, because it typically has access
to more efficient training algorithms. Incremental training is usually
done with `adapt`; batch training is usually done with `train`.

For batch training of a static network with `adapt`, the input
vectors must be placed in one matrix of concurrent vectors.

```matlab
P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];
```

Begin with the static network used in previous examples. The learning rate is set to 0.01.

```matlab
net = linearlayer(0,0.01);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.b{1} = 0;
```

When you call `adapt`, it invokes `trains` (the default adaption
function for the linear network) and `learnwh` (the default learning
function for the weights and biases). `trains` uses Widrow-Hoff learning.

```matlab
[net,a,e,pf] = adapt(net,P,T);
a = 0 0 0 0
e = 4 5 7 7
```

Note that the outputs of the network are all zero, because the weights are not updated until all the training set has been presented. If you display the weights, you find

```matlab
net.IW{1,1}
ans =
    0.4900    0.4100
net.b{1}
ans =
    0.2300
```

This is different from the result after one pass of `adapt` with incremental updating.
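In batch mode, every error is computed with the initial weights, and the per-presentation updates are summed into a single step. A NumPy sketch of that one batch update (illustrative, not toolbox code) reproduces the weights displayed above:

```python
import numpy as np

lr = 0.01
w = np.zeros(2)
b = 0.0

P = np.array([[1, 2, 2, 3],
              [2, 1, 3, 1]])   # inputs as columns (concurrent vectors)
T = np.array([4, 5, 7, 7])

A = w @ P + b                  # all outputs computed with the initial weights
E = T - A                      # errors: [4, 5, 7, 7]

# One batch update: accumulate over all presentations, then apply once.
w = w + lr * (P @ E)           # delta_w = lr * sum over q of e_q * p_q
b = b + lr * E.sum()           # delta_b = lr * sum over q of e_q

# w ≈ [0.49, 0.41] and b ≈ 0.23, matching the displayed weights
```

Compare with the incremental sketch earlier: there the sum is replaced by an update applied immediately after each presentation, which is why the two passes end in different weights.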

Now perform the same batch training using `train`.
Because the Widrow-Hoff rule can be used in incremental or batch mode,
it can be invoked by `adapt` or `train`. (There are several algorithms
that can only be used in batch mode (e.g., Levenberg-Marquardt), so
these algorithms can only be invoked by `train`.)

For this case, the input vectors can be in a matrix of concurrent
vectors or in a cell array of sequential vectors. Because the network
is static and because `train` always operates in batch mode, `train`
converts any cell array of sequential vectors to a matrix of concurrent
vectors. Concurrent mode operation is used whenever possible because it
has a more efficient implementation in MATLAB code:

```matlab
P = [1 2 2 3; 2 1 3 1];
T = [4 5 7 7];
```

The network is set up in the same way.

```matlab
net = linearlayer(0,0.01);
net = configure(net,P,T);
net.IW{1,1} = [0 0];
net.b{1} = 0;
```

Now you are ready to train the network. Train it for only one epoch,
because you used only one pass of `adapt`. The default training
function for the linear network is `trainb`, and the default learning
function for the weights and biases is `learnwh`, so you should get
the same results obtained using `adapt` in the previous example, where
the default adaption function was `trains`.

```matlab
net.trainParam.epochs = 1;
net = train(net,P,T);
```

If you display the weights after one epoch of training, you find

```matlab
net.IW{1,1}
ans =
    0.4900    0.4100
net.b{1}
ans =
    0.2300
```

This is the same result as the batch mode training in `adapt`. With
static networks, the `adapt` function can implement incremental or
batch training, depending on the format of the input data. If the
data is presented as a matrix of concurrent vectors, batch training
occurs. If the data is presented as a sequence, incremental training
occurs. This is not true for `train`, which always performs batch
training, regardless of the format of the input.

#### Batch Training with Dynamic Networks

Training static networks is relatively straightforward. If you use
`train`, the network is trained in batch mode and the inputs are
converted to concurrent vectors (columns of a matrix), even if they
are originally passed as a sequence (elements of a cell array). If
you use `adapt`, the format of the input determines the method of
training. If the inputs are passed as a sequence, then the network is
trained in incremental mode. If the inputs are passed as concurrent
vectors, then batch mode training is used.

With dynamic networks, batch mode training is typically done with
`train` only, especially if only one training sequence exists. To
illustrate this, consider again the linear network with a delay. Use a
learning rate of 0.02 for the training. (When using a gradient descent
algorithm, you typically use a smaller learning rate for batch mode
training than for incremental training, because all the individual
gradients are summed before determining the step change to the weights.)

```matlab
net = linearlayer([0 1],0.02);
net.inputs{1}.size = 1;
net.layers{1}.dimensions = 1;
net.IW{1,1} = [0 0];
net.biasConnect = 0;
net.trainParam.epochs = 1;
Pi = {1};
P = {2 3 4};
T = {3 5 6};
```

You want to train the network with the same input sequence used for the incremental training earlier, but this time you want to update the weights only after all the inputs are applied (batch mode). The network is simulated in sequential mode, because the input is a sequence, but the weights are updated in batch mode.

```matlab
net = train(net,P,T,Pi);
```

The weights after one epoch of training are

```matlab
net.IW{1,1}
ans =
    0.9000    0.6200
```

These are different weights than you would obtain using incremental training, where the weights would be updated three times during one pass through the training set. For batch training the weights are only updated once in each epoch.
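The single batch update can be sketched the same way as in the static case, with the tapped delay line building each effective input. This NumPy sketch (illustrative, not toolbox code) reproduces the weights displayed above, using the `T = {3 5 6}` targets from this example:

```python
import numpy as np

lr = 0.02
w = np.zeros(2)                       # weights on [current, delayed] input

delayed_init = 1.0                    # initial condition from Pi = {1}
inputs  = [2.0, 3.0, 4.0]
targets = [3.0, 5.0, 6.0]

# Build the tapped-delay inputs x(t) = [p(t), p(t-1)].
xs = []
prev = delayed_init
for p in inputs:
    xs.append(np.array([p, prev]))
    prev = p

# All errors are computed with the initial (zero) weights...
errors = [t - w @ x for x, t in zip(xs, targets)]

# ...then a single summed update is applied (one epoch of batch training).
w = w + lr * sum(e * x for e, x in zip(errors, xs))

# w ≈ [0.9, 0.62], matching the displayed weights
```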

### Training Feedback

The `showWindow`

parameter allows you to specify
whether a training window is visible when you train. The training
window appears by default. Two other parameters, `showCommandLine`

and `show`

,
determine whether command-line output is generated and the number
of epochs between command-line feedback during training. For instance,
this code turns off the training window and gives you training status
information every 35 epochs when the network is later trained with `train`

:

```matlab
net.trainParam.showWindow = false;
net.trainParam.showCommandLine = true;
net.trainParam.show = 35;
```

Sometimes it is convenient to disable all training displays. To do that, turn off both the training window and command-line feedback:

```matlab
net.trainParam.showWindow = false;
net.trainParam.showCommandLine = false;
```
