
Set Up Parameters and Train Convolutional Neural Network

After you define the layers of your neural network as described in Specify Layers of Convolutional Neural Network, the next step is to set up the training options for the network. Use the trainingOptions function to define the global training parameters. To train a network, use the object returned by trainingOptions as an input argument to the trainNetwork function. For example:

options = trainingOptions('adam');
trainedNet = trainNetwork(data,layers,options);

Layers with learnable parameters also have options for adjusting the learning parameters. For more information, see Set Up Parameters in Convolutional and Fully Connected Layers.

Specify Solver and Maximum Number of Epochs

trainNetwork can use different variants of stochastic gradient descent to train the network. Specify the optimization algorithm by using the first input argument of the trainingOptions function. To minimize the loss, these algorithms update the network parameters by taking small steps in the direction of the negative gradient of the loss function.

The 'adam' (derived from adaptive moment estimation) solver is often a good optimizer to try first. You can also try the 'rmsprop' (root mean square propagation) and 'sgdm' (stochastic gradient descent with momentum) optimizers and see if this improves training. Different solvers work better for different tasks. For more information about the different solvers, see the trainingOptions function.

The solvers update the parameters using a subset of the data each step. This subset is called a mini-batch. You can specify the size of the mini-batch by using the 'MiniBatchSize' name-value pair argument of trainingOptions. Each parameter update is called an iteration. A full pass through the entire data set is called an epoch. You can specify the maximum number of epochs to train for by using the 'MaxEpochs' name-value pair argument of trainingOptions. The default value is 30, but you can choose a smaller number of epochs for small networks or for fine-tuning and transfer learning, where most of the learning is already done.

By default, the software shuffles the data once before training. You can change this setting by using the 'Shuffle' name-value pair argument.
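
As a minimal sketch of the options described in this section, the following code selects a solver and sets the mini-batch size, the maximum number of epochs, and the shuffling behavior. The specific values and the number of observations are hypothetical and chosen only for illustration.

% Hypothetical values, chosen only for illustration
miniBatchSize = 64;
options = trainingOptions('sgdm', ...
    'MiniBatchSize',miniBatchSize, ...
    'MaxEpochs',20, ...
    'Shuffle','every-epoch');
% With an assumed 6400 training observations, one epoch is 100 iterations
numObservations = 6400;
iterationsPerEpoch = numObservations/miniBatchSize;

Here 'every-epoch' replaces the default 'once' setting, so the data is reshuffled before each epoch rather than only before training starts.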

Specify and Modify Learning Rate

You can specify the global learning rate by using the 'InitialLearnRate' name-value pair argument of trainingOptions. By default, trainNetwork uses this value throughout the entire training process. Alternatively, you can reduce the learning rate every fixed number of epochs by multiplying it by a factor. Instead of using a small, fixed learning rate throughout the training process, you can choose a larger learning rate at the beginning of training and gradually reduce this value during optimization. Doing so can shorten the training time, while enabling smaller steps towards the minimum of the loss as training progresses.


If the mini-batch loss during training ever becomes NaN, then the learning rate is likely too high. Try reducing the learning rate, for example by a factor of 3, and restarting network training.

To gradually reduce the learning rate, use the 'LearnRateSchedule','piecewise' name-value pair argument. With this option, trainNetwork by default multiplies the learning rate by a factor of 0.1 every 10 epochs. You can specify the reduction factor and the number of epochs between reductions by using the 'LearnRateDropFactor' and 'LearnRateDropPeriod' name-value pair arguments, respectively.
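
The following sketch shows one way to set up a piecewise schedule. The initial rate, drop factor, and drop period are hypothetical values. The last line estimates the learning rate at a given epoch, assuming that the rate drops once after each completed drop period.

% Hypothetical schedule values, chosen only for illustration
initialLearnRate = 0.01;
dropFactor = 0.2;
dropPeriod = 5;
options = trainingOptions('sgdm', ...
    'InitialLearnRate',initialLearnRate, ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',dropFactor, ...
    'LearnRateDropPeriod',dropPeriod);
% Estimated learning rate at epoch 12 (assumes drops after epochs 5 and 10)
epoch = 12;
learnRate = initialLearnRate*dropFactor^floor((epoch-1)/dropPeriod);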

Specify Validation Data

To perform network validation during training, specify validation data using the 'ValidationData' name-value pair argument of trainingOptions. By default, trainNetwork validates the network every 50 iterations by predicting the response of the validation data and calculating the validation loss and accuracy (root mean squared error for regression networks). You can change the validation frequency using the 'ValidationFrequency' name-value pair argument. If your network has layers that behave differently during prediction than during training (for example, dropout layers), then the validation accuracy can be higher than the training (mini-batch) accuracy. You can also use the validation data to stop training automatically when the validation loss stops decreasing. To turn on automatic validation stopping, use the 'ValidationPatience' name-value pair argument.

Performing validation at regular intervals during training helps you to determine if your network is overfitting to the training data. A common problem is that the network simply "memorizes" the training data, rather than learning general features that enable the network to make accurate predictions for new data. To check if your network is overfitting, compare the training loss and accuracy to the corresponding validation metrics. If the training loss is significantly lower than the validation loss, or the training accuracy is significantly higher than the validation accuracy, then your network is overfitting.
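
As an illustration of the validation options above, this sketch passes held-out data to trainingOptions and turns on automatic validation stopping. The variables XValidation and YValidation are hypothetical placeholders for your own validation predictors and responses, and the frequency and patience values are arbitrary.

% XValidation and YValidation are assumed to exist in the workspace
options = trainingOptions('adam', ...
    'ValidationData',{XValidation,YValidation}, ...
    'ValidationFrequency',30, ...
    'ValidationPatience',5);

You can then compare the training and validation metrics reported during training to check for overfitting.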

To reduce overfitting, you can try adding data augmentation. Use an augmentedImageDatastore to perform random transformations on your input images. This helps to prevent the network from memorizing the exact position and orientation of objects. You can also try increasing the L2 regularization using the 'L2Regularization' name-value pair argument, using batch normalization layers after convolutional layers, and adding dropout layers.
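
The following sketch combines random image augmentation with a larger L2 regularization value. The variable imds is a hypothetical ImageDatastore of 28-by-28 training images, and the rotation range, translation range, and regularization value are arbitrary.

% imds is assumed to be an existing ImageDatastore of training images
augmenter = imageDataAugmenter( ...
    'RandRotation',[-10 10], ...
    'RandXTranslation',[-3 3], ...
    'RandYTranslation',[-3 3]);
augimds = augmentedImageDatastore([28 28],imds,'DataAugmentation',augmenter);
% Illustrative regularization value, larger than the default
options = trainingOptions('adam','L2Regularization',0.0005);

You would then pass augimds to the training function in place of the original datastore.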

Select Hardware Resource

If a GPU is available, then trainNetwork uses it for training, by default. Otherwise, trainNetwork uses a CPU. Alternatively, you can specify the execution environment you want using the 'ExecutionEnvironment' name-value pair argument. You can specify a single CPU ('cpu'), a single GPU ('gpu'), multiple GPUs ('multi-gpu'), or a local parallel pool or compute cluster ('parallel'). All options other than 'cpu' require Parallel Computing Toolbox™. Training on a GPU requires a supported GPU device. For information on supported devices, see GPU Computing Requirements (Parallel Computing Toolbox).
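
If you prefer to choose the hardware explicitly, one possible approach is sketched below. It assumes that the canUseGPU function is available in your MATLAB release. The variable env is a hypothetical name.

% Select the execution environment explicitly (env is a hypothetical name)
if canUseGPU
    env = 'gpu';
else
    env = 'cpu';
end
options = trainingOptions('adam','ExecutionEnvironment',env);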

Save Checkpoint Networks and Resume Training

Deep Learning Toolbox™ enables you to save neural networks as .mat files during training. This periodic saving is especially useful when you have a large neural network or a large data set, and training takes a long time. If the training is interrupted for some reason, you can resume training from the last saved checkpoint neural network. If you want the trainnet and trainNetwork functions to save checkpoint neural networks, then you must specify the name of the path by using the CheckpointPath option of trainingOptions. If the path that you specify does not exist, then trainingOptions returns an error.
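
Because the checkpoint folder must already exist, you can create it before calling trainingOptions. This sketch uses a hypothetical folder under the temporary directory. Substitute a location that suits your project.

% checkpointDir is a hypothetical location, chosen only for illustration
checkpointDir = fullfile(tempdir,'checkpoints');
if ~isfolder(checkpointDir)
    mkdir(checkpointDir);
end
options = trainingOptions('adam','CheckpointPath',checkpointDir);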

The software automatically assigns unique names to checkpoint neural network files. In the example name, net_checkpoint__351__2018_04_12__18_09_52.mat, 351 is the iteration number, 2018_04_12 is the date, and 18_09_52 is the time at which the software saves the neural network. You can load a checkpoint neural network file by double-clicking it or using the load command at the command line. For example:

load net_checkpoint__351__2018_04_12__18_09_52.mat

You can then resume training by using the layers of the neural network as an input argument to trainnet or trainNetwork.

You must manually specify the training options and the input data, because the checkpoint neural network does not contain this information. For an example, see Resume Training from Checkpoint Network.
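
A minimal sketch of resuming training with trainNetwork follows. It assumes that the checkpoint file loads a variable named net, that the network is a simple series network whose Layers property can be passed directly, and that data holds your original training data.

% Assumes the MAT file contains a variable named net
load net_checkpoint__351__2018_04_12__18_09_52.mat
% Respecify the training options, because the checkpoint does not store them
options = trainingOptions('adam');
trainedNet = trainNetwork(data,net.Layers,options);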

Set Up Parameters in Convolutional and Fully Connected Layers

You can set the learning parameters to be different from the global values specified by trainingOptions in layers with learnable parameters, such as convolutional and fully connected layers. For example, to adjust the learning rate for the biases or weights, you can specify a value for the BiasLearnRateFactor or WeightLearnRateFactor properties of the layer, respectively. The trainNetwork function multiplies the learning rate that you specify by using trainingOptions with these factors. Similarly, you can also specify the L2 regularization factors for the weights and biases in these layers by specifying the BiasL2Factor and WeightL2Factor properties, respectively. trainNetwork then multiplies the L2 regularization factors that you specify by using trainingOptions with these factors.
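
For example, the following sketch doubles the learning rate of one convolutional layer relative to the global rate and turns off L2 regularization for its biases. The filter size, number of filters, and factor values are hypothetical.

% Hypothetical layer and factor values, chosen only for illustration
layer = convolution2dLayer(5,20);
layer.WeightLearnRateFactor = 2;
layer.BiasLearnRateFactor = 2;
layer.WeightL2Factor = 1;
layer.BiasL2Factor = 0;
% With a global 'InitialLearnRate' of 0.01, this layer trains at 0.01*2 = 0.02
options = trainingOptions('adam','InitialLearnRate',0.01);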

Initialize Weights in Convolutional and Fully Connected Layers

The layer weights are learnable parameters. You can specify the initial value of the weights directly using the Weights property of the layer. When you train a network, if the Weights property of the layer is nonempty, then the trainnet and trainNetwork functions use the Weights property as the initial value. If the Weights property is empty, then the software uses the initializer specified by the WeightsInitializer property of the layer.
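
The two approaches can be sketched as follows. The first layer leaves Weights empty and names an initializer. The second layer sets Weights directly. The output size of 10 and the input size of 50 are hypothetical, as are the variable names.

% Leave Weights empty and let the software use the named initializer
layerA = fullyConnectedLayer(10,'WeightsInitializer','he');
% Set the initial weights directly (assumes 50 inputs to this layer)
layerB = fullyConnectedLayer(10);
layerB.Weights = 0.01*randn(10,50);

For a fully connected layer, the Weights array is the output size by the input size, so a value set directly must match the size of the data that reaches the layer.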

Train Your Network

After you specify the layers of your network and the training parameters, you can train the network using the training data. The data, layers, and training options are all input arguments of the trainNetwork function, as in this example.

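% The layers after imageInputLayer are illustrative assumptions; substitute your own architecture
layers = [imageInputLayer([28 28 1]); convolution2dLayer(5,20); reluLayer; ...
          fullyConnectedLayer(10); softmaxLayer; classificationLayer];
options = trainingOptions('adam');
convnet = trainNetwork(data,layers,options);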

Training data can be an array, a table, or an ImageDatastore object. For more information, see the trainNetwork function reference page.
