Set Up Parameters and Train Convolutional Neural Network

After you define the layers of your network as described in Specify Layers of Convolutional Neural Network, the next step is to set up the training options for the network. Use the trainingOptions function to set up the global training parameters. trainNetwork then uses these options to perform the training. trainingOptions returns these options as a TrainingOptionsSGDM object, which you must provide as an input argument to trainNetwork. For example,

opts = trainingOptions('sgdm');
convnet = trainNetwork(data,layers,opts);

The learnable layers convolution2dLayer and fullyConnectedLayer also have a few options for adjusting the learning parameters of those layers. For details, see Set Up Parameters in Convolutional and Fully Connected Layers.

Specify Solver and Maximum Number of Epochs

trainNetwork uses stochastic gradient descent with momentum (SGDM) as the optimization algorithm. You must specify 'sgdm' as the SolverName input argument of trainingOptions. SGDM minimizes the loss function by updating the weights and biases (the parameters) with small steps in the direction of the negative gradient of the loss. Each update uses a subset of the data, called a mini-batch. You can specify the size of the mini-batch using the 'MiniBatchSize' name-value pair argument of trainingOptions.

Each evaluation of the gradient using a mini-batch is called an iteration. A full pass through the whole data set is called an epoch. The maximum number of epochs to run the training for is also a parameter you can specify. Use the 'MaxEpochs' name-value pair argument of trainingOptions. The default value is 30, but you might choose a smaller number of epochs for small networks, or for fine-tuning and transfer learning, where most of the learning has been done previously.
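For example, the following call (with illustrative values) limits training to 20 epochs and uses mini-batches of 64 observations:

opts = trainingOptions('sgdm','MaxEpochs',20,'MiniBatchSize',64);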

The momentum parameter is a way to prevent the oscillation of the stochastic gradient descent algorithm along the path of steepest descent. The default value for this parameter is 0.9, but you can change this value using the 'Momentum' name-value pair argument.

By default, the software shuffles the data once before training. You can change this setting using the 'Shuffle' name-value pair argument.
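For example, assuming your release supports the 'every-epoch' shuffle option, the following call (with illustrative values) increases the momentum and reshuffles the data before each epoch:

opts = trainingOptions('sgdm','Momentum',0.95,'Shuffle','every-epoch');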

Specify and Modify Learning Rate

You can specify the global learning rate using the 'InitialLearnRate' name-value pair argument of trainingOptions. The default rate is 0.01, but you might want to choose a smaller value if network training is not converging. By default, trainNetwork uses this value throughout training, unless you choose to reduce it by a multiplicative factor every certain number of epochs. Instead of using a small, fixed learning rate for the entire training, starting with a larger learning rate and gradually reducing it during optimization can shorten the training time while still allowing smaller steps toward the optimum as training progresses, giving a finer search toward the end of training.


If the mini-batch loss during training ever becomes NaN, then the learning rate is likely too high. Try reducing the learning rate, for example by a factor of 3, and restarting network training.

If you would like to gradually reduce the learning rate, use the 'LearnRateSchedule','piecewise' name-value pair argument. Once you choose this option, trainNetwork by default multiplies the initial learning rate by a factor of 0.1 every 10 epochs. However, you have the option of specifying the factor by which to reduce the initial learning rate and the number of epochs using the 'LearnRateDropFactor' and 'LearnRateDropPeriod' name-value pair arguments, respectively.
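For example, the following call (with illustrative values) starts training with a learning rate of 0.1 and multiplies it by 0.2 every 5 epochs:

opts = trainingOptions('sgdm',...
    'InitialLearnRate',0.1,...
    'LearnRateSchedule','piecewise',...
    'LearnRateDropFactor',0.2,...
    'LearnRateDropPeriod',5);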

Specify Validation Data

To perform network validation during training, specify validation data using the 'ValidationData' name-value pair argument of trainingOptions. By default, trainNetwork validates the network every 50 iterations by predicting the response of the validation data and calculating the validation loss and accuracy (root mean square error for regression networks). You can change the validation frequency using the 'ValidationFrequency' name-value pair argument. If your network has layers that behave differently during prediction than during training (for example, dropout layers), then the validation accuracy can be higher than the training (mini-batch) accuracy.

Network training automatically stops when the validation loss stops improving. By default, if the validation loss is larger than or equal to the previously smallest loss five times in a row, then network training stops. To change the number of times that the validation loss is allowed to not decrease before training stops, use the 'ValidationPatience' name-value pair argument.
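For example, assuming your validation images and labels are in the hypothetical variables XVal and YVal, the following call validates the network every 30 iterations and stops training after five validations without improvement:

opts = trainingOptions('sgdm',...
    'ValidationData',{XVal,YVal},...
    'ValidationFrequency',30,...
    'ValidationPatience',5);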

Performing validation at regular intervals during training helps you to determine if your network is overfitting the training data. A common problem is that the network simply "memorizes" the training data, rather than learning general features that enable it to make accurate predictions for new data. To check if your network is overfitting, compare the training loss and accuracy to the corresponding validation metrics. If the training loss is significantly lower than the validation loss, or the training accuracy is significantly higher than the validation accuracy, then your network is overfitting.

To reduce overfitting, you can try adding data augmentation. Use an augmentedImageSource to perform random transformations on your input images. This helps to prevent the network from memorizing the exact position and orientation of objects. You can also try decreasing the network size by reducing the number of layers or convolutional filters, increasing the L2 regularization using the 'L2Regularization' name-value pair argument, or adding dropout layers.
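For example, the following sketch combines random augmentation with stronger L2 regularization, assuming 28-by-28 grayscale input images and hypothetical training variables XTrain, YTrain, and layers:

imageAugmenter = imageDataAugmenter('RandRotation',[-10 10],'RandXTranslation',[-3 3]);
datasource = augmentedImageSource([28 28 1],XTrain,YTrain,'DataAugmentation',imageAugmenter);
opts = trainingOptions('sgdm','L2Regularization',0.0005);
convnet = trainNetwork(datasource,layers,opts);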

Select Hardware Resource

If a GPU is available, trainNetwork uses it for training by default. If no GPU is available, trainNetwork uses a CPU. Alternatively, you can specify the execution environment using the 'ExecutionEnvironment' name-value pair argument. You can choose a single CPU ('cpu'), multiple CPU cores ('parallel'), a single GPU ('gpu'), or multiple GPUs ('multi-gpu'). All options other than 'cpu' require Parallel Computing Toolbox™. Training on a GPU requires a CUDA-enabled GPU with compute capability 3.0 or higher.
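For example, to force training on a single CPU:

opts = trainingOptions('sgdm','ExecutionEnvironment','cpu');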

Save Checkpoint Networks and Resume Training

Neural Network Toolbox™ lets you periodically save networks as .mat files after each epoch during training. This is especially useful when you have a large network or a large data set and training takes a long time. If training is interrupted for some reason, you can resume it from the last saved checkpoint network. If you want trainNetwork to save checkpoint networks, then you must specify the path using the 'CheckpointPath' name-value pair argument of trainingOptions. If the path you specify does not exist, then trainingOptions returns an error.
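For example, assuming the folder C:\Temp\checkpoints already exists (the path here is illustrative):

opts = trainingOptions('sgdm','CheckpointPath','C:\Temp\checkpoints');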

trainNetwork automatically assigns unique names to these checkpoint network files. For example, in convnet_checkpoint__351__2016_11_09__12_04_23.mat, 351 is the iteration number, 2016_11_09 is the date, and 12_04_23 is the time at which trainNetwork saved the network. You can load any of these networks by double-clicking on them or typing, for example,

load convnet_checkpoint__351__2016_11_09__12_04_23.mat

at the command line. You can then resume training by using the layers of this network as an input argument to trainNetwork.
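For example, assuming the checkpoint file stores the network in a variable named net (the variable name can differ by release) and that data and options are already defined:

convnet = trainNetwork(data,net.Layers,options);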

You must manually specify the training options and the input data as the checkpoint network does not contain this information. For an example, see Resume Training from a Checkpoint Network.

For a full list of optional name-value pair arguments, see the trainingOptions function reference page.

Set Up Parameters in Convolutional and Fully Connected Layers

You can set the learning parameters of a particular convolutional or fully connected layer to values different from the global values specified by trainingOptions. To adjust the learning rates for the biases and weights of a layer, specify a value for the 'BiasLearnRateFactor' or 'WeightLearnRateFactor' name-value pair in the call to the convolution2dLayer or fullyConnectedLayer function. The trainNetwork function multiplies the global learning rate specified in trainingOptions by these factors. Similarly, you can specify the L2 regularization factors for the weights and biases in these layers using the 'BiasL2Factor' and 'WeightL2Factor' name-value pair arguments. The trainNetwork function then multiplies the global L2 regularization value specified in trainingOptions by these factors.
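For example, the following layer (with illustrative filter size and number of filters) learns at twice the global rate for both its weights and its biases:

layer = convolution2dLayer(5,20,...
    'WeightLearnRateFactor',2,...
    'BiasLearnRateFactor',2);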

Initialize Weights in Convolutional and Fully Connected Layers

By default, the initial values of the weights of the convolutional and fully connected layers are randomly generated from a Gaussian distribution with mean 0 and standard deviation 0.01. The initial bias is by default equal to 0. You can manually change the initialization for the weights and bias after you specify these layers. For examples, see Specify Initial Weights and Biases in Convolutional Layer and Specify Initial Weights and Biases in Fully Connected Layer.
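For example, the following sketch manually initializes a fully connected layer with 10 outputs, assuming a hypothetical input size of 100:

layer = fullyConnectedLayer(10);
layer.Weights = randn(10,100)*0.0001;  % smaller standard deviation than the default
layer.Bias = ones(10,1);               % nonzero initial bias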

Train Your Network

After you define the layers of your network and set the training options, you can train the network using the training data. The data, the layers, and the training options are all positional input arguments for the trainNetwork function. For example,

layers = [imageInputLayer([28 28 1])
          fullyConnectedLayer(10)
          softmaxLayer
          classificationLayer];
options = trainingOptions('sgdm');
convnet = trainNetwork(data,layers,options);

Training data can be in a matrix, table, or ImageDatastore format. For more information, see the trainNetwork function reference page.
