After you define the layers of your network as described in Specify Layers of Convolutional Neural Network,
the next step is to set up the training options for the network. You can use the trainingOptions function to set up the global training parameters. trainNetwork then uses these options to perform the training. trainingOptions returns these options as a TrainingOptionsSGDM object, and you must provide it as an input argument to trainNetwork.
opts = trainingOptions('sgdm');
convnet = trainNetwork(data,layers,opts);
The learning layers, that is, the convolutional and fully connected layers, also have a few options for adjusting the learning parameters of those individual layers. See the discussion of layer-specific learning parameters later in this section for more details.
trainNetwork uses stochastic gradient descent with momentum (SGDM) as the optimization algorithm. You must specify the solver name 'sgdm' in the call to trainingOptions.
SGDM updates the weights and biases (the learnable parameters) by taking small steps in the direction of the negative gradient of the loss function, so as to minimize the loss. Each update uses a different subset of the data, called a mini-batch. You can specify the size of this subset using the 'MiniBatchSize' name-value pair argument in the call to trainingOptions.
Each evaluation of the gradient using a mini-batch is called an iteration, and a full pass through the whole data set is called an epoch. You can specify the maximum number of epochs to train for using the 'MaxEpochs' name-value pair argument in the call to trainingOptions. The default value is 30, but you might choose a smaller number of epochs for small networks, or for fine-tuning and transfer learning, where most of the learning has already been done.
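For example, a minimal sketch that sets both options (the values below are illustrative, not recommendations):

opts = trainingOptions('sgdm', ...
    'MiniBatchSize',64, ...  % use 64 observations per gradient evaluation
    'MaxEpochs',20);         % train for at most 20 full passes through the data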
The momentum parameter is a way to reduce the oscillation of the stochastic gradient descent algorithm along the path of steepest descent. The default value for this parameter is 0.9, but you can change it using the 'Momentum' name-value pair argument.
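For example, a one-line sketch that raises the momentum slightly (0.95 is an illustrative value, not a recommendation):

opts = trainingOptions('sgdm','Momentum',0.95);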
You can specify the global learning rate using the 'InitialLearnRate' name-value pair argument in the call to trainingOptions. The default rate is 0.01, but you might want to choose a smaller value if you are performing transfer learning. By default, trainNetwork uses this value throughout the whole training process, unless you choose to reduce it every certain number of epochs by multiplying by a factor. Instead of using a small fixed learning rate throughout training, choosing a larger learning rate at the beginning and gradually reducing it during optimization can help shorten the training time, while enabling smaller steps towards the optimum as training progresses, and hence a finer search towards the end of training.
If you would like to gradually reduce the learning rate, use the 'LearnRateSchedule','piecewise' name-value pair argument. Once you choose this option, trainNetwork by default multiplies the initial learning rate by a factor of 0.1 every 10 epochs. You can instead specify the factor by which to reduce the learning rate and the number of epochs between reductions using the 'LearnRateDropFactor' and 'LearnRateDropPeriod' name-value pair arguments, respectively.
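For example, a sketch of a schedule that starts from a relatively large rate and halves it every 5 epochs (all values chosen for illustration):

opts = trainingOptions('sgdm', ...
    'InitialLearnRate',0.05, ...        % larger rate at the start of training
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.5, ...      % multiply the learning rate by 0.5 ...
    'LearnRateDropPeriod',5);           % ... every 5 epochs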
If there is an available GPU, trainNetwork by default uses a GPU for training. If there is no available GPU, it uses a CPU. Alternatively, you can specify the execution environment you want using the 'ExecutionEnvironment' name-value pair argument. You can choose to use a single CPU ('cpu'), multiple CPU cores ('parallel'), a single GPU ('gpu'), or multiple GPUs ('multi-gpu'). All options other than 'cpu' require Parallel Computing Toolbox™. Training on a GPU requires a CUDA-enabled GPU with compute capability 3.0 or higher.
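For example, to explicitly request training on multiple GPUs (assuming the required hardware and Parallel Computing Toolbox are available):

opts = trainingOptions('sgdm','ExecutionEnvironment','multi-gpu');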
Neural Network Toolbox™ lets you periodically save networks as .mat files after each epoch during training. This is especially important when you have a large network or a large data set and training takes a long time. If the training is interrupted for some reason, you can resume training from the last saved checkpoint network. If you want trainNetwork to save checkpoint networks, then you must specify the path using the 'CheckpointPath' name-value pair argument in the call to trainingOptions. If the path you specify does not exist, then trainingOptions returns an error.
trainNetwork automatically assigns unique names to these checkpoint network files, for example, convnet_checkpoint__351__2016_11_09__12_04_21.mat, where 351 is the iteration number, 2016_11_09 is the date, and 12_04_21 is the time at which trainNetwork saves the network. You can load any of these files by double-clicking them or by typing, for example, load convnet_checkpoint__351__2016_11_09__12_04_21.mat at the command line. You can then resume training using the layers of the loaded network in the call to trainNetwork.
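For example, a sketch of saving checkpoints and later resuming from one; the folder C:\Temp\cnncheckpoints is an illustrative path, and the code assumes the checkpoint file stores the network in a variable named net:

opts = trainingOptions('sgdm','CheckpointPath','C:\Temp\cnncheckpoints');
convnet = trainNetwork(data,layers,opts);      % saves a checkpoint after each epoch

% After an interruption, load a checkpoint and resume training:
load('convnet_checkpoint__351__2016_11_09__12_04_21.mat')
convnet = trainNetwork(data,net.Layers,opts);  % continue from the saved layers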
For a full list of optional name-value pair arguments, see the
trainingOptions function reference page.
You have the option to set the learning parameters differently from the global values specified by trainingOptions for a specific convolutional or fully connected layer. You can specify a value for the 'BiasLearnRateFactor' or 'WeightLearnRateFactor' name-value pair in the call to convolution2dLayer or fullyConnectedLayer to adjust the learning rate for the biases and weights of that layer. The trainNetwork function multiplies the learning rate you specify in the trainingOptions function by these factors. Similarly, you can also specify the L2 regularization factors for the weights and biases of these layers using the 'BiasL2Factor' and 'WeightL2Factor' name-value pair arguments. The trainNetwork function then multiplies the L2 regularization value you specify in the trainingOptions function by these factors.
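For example, a sketch of a convolutional layer whose weights and biases learn twice as fast as the global learning rate, with a doubled L2 regularization factor on the weights (the factors are illustrative):

layer = convolution2dLayer(5,20, ...
    'WeightLearnRateFactor',2, ...  % weights: 2x the global learning rate
    'BiasLearnRateFactor',2, ...    % biases: 2x the global learning rate
    'WeightL2Factor',2);            % weights: 2x the global L2 regularization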
By default, the initial weights of the convolutional and fully connected layers are drawn from a Gaussian distribution with mean 0 and standard deviation 0.01, and the initial bias is 0. You can manually change the initialization of the weights and bias after you specify these layers. For examples, see Specify Initial Weight and Biases in Convolutional Layer and Specify Initial Weight and Biases in Fully Connected Layer.
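For example, a minimal sketch of overriding these defaults for a convolutional layer, assuming a single-channel input (the standard deviation and bias values are illustrative):

layer = convolution2dLayer(5,20);         % 20 filters of size 5-by-5
layer.Weights = randn([5 5 1 20])*0.1;    % Gaussian, mean 0, standard deviation 0.1
layer.Bias = ones([1 1 20]);              % constant initial bias of 1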
After you specify the layers of your ConvNet and the training parameters, you can train the network using the training data. The data, layers, and training options are all positional input arguments for the trainNetwork function, as in the following example.
layers = [imageInputLayer([28 28 1])
          convolution2dLayer(5,20)
          reluLayer()
          maxPooling2dLayer(2,'Stride',2)
          fullyConnectedLayer(10)
          softmaxLayer()
          classificationLayer()];
options = trainingOptions('sgdm');
convnet = trainNetwork(data,layers,options);
Training data can be in a matrix, table, or data store format. For more information, see the trainNetwork function reference page.