
Set Up Parameters and Train Convolutional Neural Network

After you define the layers of your network as described in Specify Layers of Convolutional Neural Network, the next step is to set up the training options for the network. Use the trainingOptions function to set up the global training parameters; trainNetwork then uses these options to perform the training. trainingOptions returns these options as a TrainingOptionsSGDM object, which you must provide as an input argument to trainNetwork. For example,

opts = trainingOptions('sgdm');
convnet = trainNetwork(data,layers,opts);

The learning layers convolution2dLayer and fullyConnectedLayer also have options for adjusting the learning parameters of those individual layers. For details, see Set Up Parameters in Convolutional and Fully Connected Layers below.

Set Up Training Parameters Using trainingOptions

Specify Solver and Maximum Number of Epochs

trainNetwork uses stochastic gradient descent with momentum (SGDM) as the optimization algorithm. You must specify 'sgdm' as the SolverName argument in the call to trainingOptions. SGDM updates the weights and biases (the parameters) by taking small steps in the direction of the negative gradient of the loss function, so as to minimize the loss. At each step, it updates the parameters using a subset of the data called a mini-batch. You can specify the size of this subset using the 'MiniBatchSize' name-value pair argument in trainingOptions.

Each evaluation of the gradient using a mini-batch is an iteration, and a full pass through the whole data set is an epoch. You can specify the maximum number of epochs to train for using the 'MaxEpochs' name-value pair argument in the call to trainingOptions. The default value is 30, but you might choose a smaller number of epochs for a small network, or for fine-tuning and transfer learning, where most of the learning has already been done.

The momentum parameter helps prevent the oscillation of the stochastic gradient descent algorithm along the path of steepest descent. The default value of this parameter is 0.9, but you can change it using the 'Momentum' name-value pair argument.
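
For example, the following call sets the solver together with the mini-batch size, maximum number of epochs, and momentum. The numeric values here are placeholders that show the syntax, not recommendations:

opts = trainingOptions('sgdm', ...
    'MiniBatchSize',64, ...   % observations per gradient evaluation (illustrative)
    'MaxEpochs',20, ...       % full passes through the training set (illustrative)
    'Momentum',0.95);         % contribution of the previous update (illustrative)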

Specify and Modify Learning Rate

You can specify the global learning rate using the 'InitialLearnRate' name-value pair argument in trainingOptions. The default rate is 0.01, but you might want to choose a smaller value if you are performing transfer learning. By default, trainNetwork uses this value throughout training, unless you choose to reduce it every certain number of epochs by multiplying it by a factor. Instead of using a small fixed learning rate for the entire training, starting with a larger learning rate and gradually reducing it during optimization can shorten the training time, while still enabling smaller steps toward the optimum as training progresses, and hence a finer search toward the end of training.

To gradually reduce the learning rate, use the 'LearnRateSchedule','piecewise' name-value pair argument. With this option, trainNetwork by default multiplies the initial learning rate by a factor of 0.1 every 10 epochs. You can instead specify the factor by which to reduce the learning rate and the number of epochs between reductions using the 'LearnRateDropFactor' and 'LearnRateDropPeriod' name-value pair arguments, respectively.
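
For example, the following options start with an initial learning rate of 0.05 and multiply it by 0.2 every 5 epochs. The numeric values are illustrative:

opts = trainingOptions('sgdm', ...
    'InitialLearnRate',0.05, ...        % illustrative starting rate
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.2, ...      % multiply the learning rate by this factor
    'LearnRateDropPeriod',5);           % every 5 epochs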

Select Hardware Resource

If a GPU is available, trainNetwork uses it for training by default; otherwise, it uses a CPU. Alternatively, you can specify the execution environment you want using the 'ExecutionEnvironment' name-value pair argument. You can choose to use a single CPU ('cpu'), multiple CPU cores ('parallel'), a single GPU ('gpu'), or multiple GPUs ('multi-gpu'). All options other than 'cpu' require Parallel Computing Toolbox™. Training on a GPU requires a CUDA-enabled GPU with compute capability 3.0 or higher.
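
For example, to request training on multiple GPUs (the other environments follow the same pattern):

opts = trainingOptions('sgdm','ExecutionEnvironment','multi-gpu');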

Save Checkpoint Networks and Resume Training

Neural Network Toolbox™ lets you save checkpoint networks as .mat files after each epoch during training. Saving checkpoints is especially important when you have a large network or a large data set and training takes a long time: if training is interrupted for some reason, you can resume it from the last saved checkpoint network. If you want trainNetwork to save checkpoint networks, you must specify the path to an existing folder using the 'CheckpointPath' name-value pair argument in the call to trainingOptions. If the path you specify does not exist, trainingOptions returns an error.
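
For example, the following call saves a checkpoint network after each epoch; the folder name here is illustrative and must already exist on your system:

% The checkpoint folder must already exist, or trainingOptions returns an error.
opts = trainingOptions('sgdm','CheckpointPath','C:\checkpoints');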

trainNetwork automatically assigns unique names to the checkpoint network files. For example, in convnet_checkpoint__351__2016_11_09__12_04_23.mat, 351 is the iteration number, 2016_11_09 is the date, and 12_04_23 is the time at which trainNetwork saves the network. You can load any of these files by double-clicking them or by typing, for example,

load convnet_checkpoint__351__2016_11_09__12_04_23.mat
at the command line. You can then resume training by using the layers of this network in the call to trainNetwork, for example,

trainNetwork(Xtrain,Ytrain,net.Layers,options)
You must manually specify the training options and the input data, because the checkpoint network does not contain this information. For an example, see Resume Training from a Checkpoint Network.

For a full list of optional name-value pair arguments, see the trainingOptions function reference page.

Set Up Parameters in Convolutional and Fully Connected Layers

You can set the learning parameters of a specific convolutional or fully connected layer to values different from the global values specified by trainingOptions. Specify a value for the 'BiasLearnRateFactor' or 'WeightLearnRateFactor' name-value pair in the call to the convolution2dLayer or fullyConnectedLayer function to adjust the learning rate for the biases or weights of that layer. The trainNetwork function multiplies the learning rate you specify in trainingOptions by these factors. Similarly, you can specify the L2 regularization factors for the weights and biases in these layers using the 'BiasL2Factor' and 'WeightL2Factor' name-value pair arguments. trainNetwork then multiplies the L2 regularization value you specify in trainingOptions by these factors.
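
For example, here is a sketch of a convolutional layer whose biases learn at twice the global learning rate and are not regularized. The factor values are illustrative:

layer = convolution2dLayer(5,20, ...
    'WeightLearnRateFactor',1, ...  % weights use the global learning rate
    'BiasLearnRateFactor',2, ...    % biases learn at twice the global rate
    'WeightL2Factor',1, ...         % weights use the global L2 regularization
    'BiasL2Factor',0);              % biases are not regularized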

Initialize Weights in Convolutional and Fully Connected Layers

By default, the initial weights of the convolutional and fully connected layers are drawn from a Gaussian distribution with mean 0 and standard deviation 0.01, and the initial biases are 0. You can manually change the initialization of the weights and biases after you create these layers. For examples, see Specify Initial Weight and Biases in Convolutional Layer and Specify Initial Weight and Biases in Fully Connected Layer.
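
For example, assuming a single-channel input as in the network below, you can create a convolutional layer and then overwrite its initial weights with a narrower Gaussian and its biases with zeros. The standard deviation here is illustrative:

layer = convolution2dLayer(5,20);
% For a single-channel input, the weights are 5-by-5-by-1-by-20
% and the biases are 1-by-1-by-20.
layer.Weights = randn([5 5 1 20])*0.0001;  % narrower Gaussian than the default
layer.Bias = zeros([1 1 20]);              % zero initial bias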

Train Your Network

After you define the layers of your ConvNet and set up the training parameters, you can train the network using the training data. The data, layers, and training options are all positional input arguments of the trainNetwork function. For example,

layers = [imageInputLayer([28 28 1])        % 28-by-28 grayscale images
          convolution2dLayer(5,20)          % 20 filters of size 5-by-5
          reluLayer()
          maxPooling2dLayer(2,'Stride',2)
          fullyConnectedLayer(10)           % one output per class
          softmaxLayer()
          classificationLayer()];
options = trainingOptions('sgdm');
convnet = trainNetwork(data,layers,options);

Training data can be in an array, table, or datastore format. For more information, see the trainNetwork function reference page.
