
Options for training a neural network

`options = trainingOptions(solverName)`

`options = trainingOptions(solverName,Name,Value)`

`options = trainingOptions(solverName)` returns a set of training options for the solver specified by `solverName`.

`options = trainingOptions(solverName,Name,Value)` returns a set of training options with additional options specified by one or more `Name,Value` pair arguments.

Create a set of options for training a network using stochastic gradient descent with momentum. Reduce the learning rate by a factor of 0.2 every 5 epochs. Set the maximum number of epochs for training at 20, and use a mini-batch with 300 observations at each iteration. Specify a path for saving checkpoint networks after every epoch.

options = trainingOptions('sgdm', ...
    'LearnRateSchedule','piecewise', ...
    'LearnRateDropFactor',0.2, ...
    'LearnRateDropPeriod',5, ...
    'MaxEpochs',20, ...
    'MiniBatchSize',300, ...
    'CheckpointPath','C:\TEMP\checkpoint');

Plot the training accuracy at each iteration of the training process.

First, load the sample data.

[XTrain,YTrain] = digitTrain4DArrayData;

Construct a simple network to classify the digit image data.

layers = [ ...
    imageInputLayer([28 28 1],'Normalization','none')
    convolution2dLayer(6,20)
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

Save the function `plotTrainingAccuracy` on the MATLAB® path. This function plots training accuracy against the current iteration and is defined at the end of this example.

Specify the training options. Set `'OutputFcn'` to be the `plotTrainingAccuracy` function. For quick training, set `'MaxEpochs'` to 5 and `'InitialLearnRate'` to 0.1. Train the network using `trainNetwork`.

options = trainingOptions('sgdm','Verbose',false, ...
    'MaxEpochs',5, ...
    'InitialLearnRate',0.1, ...
    'OutputFcn',@plotTrainingAccuracy);
net = trainNetwork(XTrain,YTrain,layers,options);

The custom function `plotTrainingAccuracy` plots `info.TrainingAccuracy` against `info.Iteration` at each function call.

function plotTrainingAccuracy(info)

persistent plotObj

if info.State == "start"
    plotObj = animatedline;
    xlabel("Iteration")
    ylabel("Training Accuracy")
elseif info.State == "iteration"
    addpoints(plotObj,info.Iteration,info.TrainingAccuracy)
    drawnow limitrate nocallbacks
end

end

Plot the training accuracy at each iteration, and if the mean accuracy of the previous 50 iterations reaches 95%, then stop training early.

Load sample data.

[XTrain,YTrain] = digitTrain4DArrayData;

Construct a simple network to classify the digit image data.

layers = [ ...
    imageInputLayer([28 28 1],'Normalization','none')
    convolution2dLayer(6,20)
    reluLayer
    maxPooling2dLayer(2,'Stride',2)
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

Save the custom output functions `plotTrainingAccuracy` and `stopTrainingAtThreshold` on the MATLAB® path. `plotTrainingAccuracy` plots training progress, and `stopTrainingAtThreshold` stops training early if the mean accuracy of the previous 50 iterations reaches 95%. These functions are defined at the end of this example.

Specify the custom output functions as a cell array of function handles. Set the output functions to be `plotTrainingAccuracy` and `stopTrainingAtThreshold` with a 95% threshold.

functions = { ...
    @plotTrainingAccuracy, ...
    @(info) stopTrainingAtThreshold(info,95)};

Specify the training options. Set `'OutputFcn'` to be the cell array of function handles `functions`. Train the network using `trainNetwork`.

options = trainingOptions('sgdm','Verbose',false, ...
    'InitialLearnRate',0.1, ...
    'OutputFcn',functions);
net = trainNetwork(XTrain,YTrain,layers,options);

At each iteration, `trainNetwork` calls the output functions `plotTrainingAccuracy` and `stopTrainingAtThreshold`. The custom function `plotTrainingAccuracy` plots `info.TrainingAccuracy` against `info.Iteration`. The function `stopTrainingAtThreshold(info,thr)` stops training if the mean accuracy of the previous 50 iterations is greater than `thr`.

function plotTrainingAccuracy(info)

persistent plotObj

if info.State == "start"
    plotObj = animatedline;
    xlabel("Iteration")
    ylabel("Training Accuracy")
elseif info.State == "iteration"
    addpoints(plotObj,info.Iteration,info.TrainingAccuracy)
    drawnow limitrate nocallbacks
end

end

function stop = stopTrainingAtThreshold(info,thr)

stop = false;
if info.State ~= "iteration"
    return
end

persistent iterationAccuracy

% Append accuracy for this iteration
iterationAccuracy = [iterationAccuracy info.TrainingAccuracy];

% Evaluate mean of iteration accuracy and remove oldest entry
if numel(iterationAccuracy) == 50
    stop = mean(iterationAccuracy) > thr;
    iterationAccuracy(1) = [];
end

end

`solverName` — Solver to use for training the network
`'sgdm'`

Solver to use for training the network. You must specify `'sgdm'` (stochastic gradient descent with momentum).

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside single quotes (`' '`). You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

**Example: **`'InitialLearnRate',0.03,'L2Regularization',0.0005,'LearnRateSchedule','piecewise'` specifies the initial learning rate as 0.03 and the L2 regularization factor as 0.0005, and instructs the software to drop the learning rate every given number of epochs by multiplying with a set factor.

`'CheckpointPath'` — Path for saving checkpoint networks
`''` (default) | character vector

Path for saving the checkpoint networks, specified as the comma-separated pair consisting of `'CheckpointPath'` and a character vector.

- If you do not specify a path (that is, `''`), then the software does not save any checkpoint networks.
- If you specify a path, then `trainNetwork` saves checkpoint networks to this path after every epoch and gives each network a unique name. You can then load any of these networks and resume training from that network.
- If the directory does not already exist, you must create it before specifying the path for saving the checkpoint networks. If the path you specify is wrong, then `trainingOptions` returns an error.
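Because `trainingOptions` errors when the checkpoint directory does not exist, one way to guard against this is to create the folder first. This is a sketch; the folder location below is an arbitrary example.

```matlab
% Sketch: create the checkpoint folder before passing it to trainingOptions.
checkpointDir = fullfile(tempdir,'checkpoint');   % example location
if ~exist(checkpointDir,'dir')
    mkdir(checkpointDir)
end
options = trainingOptions('sgdm','CheckpointPath',checkpointDir);
```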

**Example: **`'CheckpointPath','C:\Temp\checkpoint'`

**Data Types: **`char`

`'ExecutionEnvironment'` — Hardware resource for `trainNetwork`
`'auto'` (default) | `'cpu'` | `'gpu'` | `'multi-gpu'` | `'parallel'`

Hardware resource for `trainNetwork` to train the network, specified as the comma-separated pair consisting of `'ExecutionEnvironment'` and one of the following:

- `'auto'` — Use a GPU if one is available; otherwise, use the CPU.
- `'cpu'` — Use the CPU.
- `'gpu'` — Use the GPU.
- `'multi-gpu'` — Use multiple GPUs on one machine, using a local parallel pool. If no pool is already open, `trainNetwork` opens one with one worker per supported GPU device.
- `'parallel'` — Use a local parallel pool or compute cluster. If no pool is already open, `trainNetwork` opens one using the default cluster profile. If the pool has access to GPUs, then `trainNetwork` uses them and leaves excess workers idle. If the pool does not have GPUs, then training takes place on all cluster CPUs.

The `'gpu'`, `'multi-gpu'`, and `'parallel'` options require Parallel Computing Toolbox™. Additionally, to use a GPU, you must have a CUDA®-enabled NVIDIA® GPU with compute capability 3.0 or higher. If you choose one of these options and Parallel Computing Toolbox or a suitable GPU is not available, then `trainNetwork` returns an error.

To see an improvement in performance when training in parallel, you might need to increase `MiniBatchSize` to offset the communication overhead.

**Example: **`'ExecutionEnvironment','cpu'`

**Data Types: **`char`

`'InitialLearnRate'` — Initial learning rate
0.01 (default) | positive scalar value

Initial learning rate used for training, specified as the comma-separated pair consisting of `'InitialLearnRate'` and a positive scalar value. If the learning rate is too low, training takes a long time; if it is too high, training might reach a suboptimal result.

**Example: **`'InitialLearnRate',0.03`

**Data Types: **`single`

| `double`

`'LearnRateSchedule'` — Option for dropping learning rate during training
`'none'` (default) | `'piecewise'`

Option for dropping the learning rate during training, specified as the comma-separated pair consisting of `'LearnRateSchedule'` and one of the following:

- `'none'` — The learning rate remains constant throughout training.
- `'piecewise'` — The software updates the learning rate every certain number of epochs by multiplying with a factor. Use the `LearnRateDropFactor` name-value pair argument to specify the value of this factor. Use the `LearnRateDropPeriod` name-value pair argument to specify the number of epochs between multiplications.

**Example: **`'LearnRateSchedule','piecewise'`
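As an illustration of the `'piecewise'` schedule, the per-epoch learning rate can be written out directly. The values and the closed-form expression below are a sketch of the behavior described above, not product code.

```matlab
% Sketch: learning rate under a 'piecewise' schedule with illustrative values.
initialLearnRate = 0.1;    % 'InitialLearnRate'
dropFactor      = 0.5;     % 'LearnRateDropFactor'
dropPeriod      = 10;      % 'LearnRateDropPeriod'

epochs = 1:30;
learnRate = initialLearnRate * dropFactor.^floor((epochs-1)/dropPeriod);
% Epochs 1-10 use 0.1, epochs 11-20 use 0.05, epochs 21-30 use 0.025.
```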

`'LearnRateDropFactor'` — Factor for dropping the learning rate
0.1 (default) | scalar value from 0 to 1

Factor for dropping the learning rate, specified as the comma-separated pair consisting of `'LearnRateDropFactor'` and a scalar value from 0 to 1. This option is valid only when the value of `LearnRateSchedule` is `'piecewise'`.

`LearnRateDropFactor` is a multiplicative factor to apply to the learning rate every time a certain number of epochs has passed. You can specify the number of epochs using the `LearnRateDropPeriod` name-value pair argument.

**Example: **`'LearnRateDropFactor',0.02`

**Data Types: **`single`

| `double`

`'LearnRateDropPeriod'` — Number of epochs for dropping the learning rate
10 (default) | integer value

Number of epochs for dropping the learning rate, specified as the comma-separated pair consisting of `'LearnRateDropPeriod'` and an integer value. This option is valid only when the value of `LearnRateSchedule` is `'piecewise'`.

The software multiplies the global learning rate with the drop factor every time this number of epochs passes. The drop factor is specified by the `LearnRateDropFactor` name-value pair argument.

**Example: **`'LearnRateDropPeriod',3`

**Data Types: **`single`

| `double`

`'L2Regularization'` — Factor for L2 regularizer
0.0001 (default) | positive scalar value

Factor for the L2 regularizer (weight decay), specified as the comma-separated pair consisting of `'L2Regularization'` and a positive scalar value.

You can specify a multiplier for this L2 regularizer when creating the convolutional layer and fully connected layer.

**Example: **`'L2Regularization',0.0005`

**Data Types: **`single`

| `double`

`'MaxEpochs'` — Maximum number of epochs
30 (default) | integer value

Maximum number of epochs to use for training, specified as the comma-separated pair consisting of `'MaxEpochs'` and an integer value.

An iteration is one step taken in the gradient descent algorithm towards minimizing the loss function using a mini-batch. An epoch is a full pass of the training algorithm over the entire training set.

**Example: **`'MaxEpochs',20`

**Data Types: **`single`

| `double`

`'MiniBatchSize'` — Size of mini-batch
128 (default) | integer value

Size of the mini-batch to use for each training iteration, specified as the comma-separated pair consisting of `'MiniBatchSize'` and an integer value. A mini-batch is a subset of the training set that is used to evaluate the gradient of the loss function and update the weights. See Stochastic Gradient Descent with Momentum.

**Example: **`'MiniBatchSize',256`

**Data Types: **`single`

| `double`

`'Momentum'` — Contribution of the previous gradient step
0.9 (default) | scalar value from 0 to 1

Contribution of the gradient step from the previous iteration to the current iteration of the training, specified as the comma-separated pair consisting of `'Momentum'` and a scalar value from 0 to 1. A value of 0 means no contribution from the previous step, whereas a value of 1 means maximal contribution from the previous step.

**Example: **`'Momentum',0.8`

**Data Types: **`single`

| `double`

`'Shuffle'` — Indicator for data shuffle
`'once'` (default) | `'never'`

Indicator for data shuffle, specified as the comma-separated pair consisting of `'Shuffle'` and one of the following:

- `'once'` — The software shuffles the data once before training.
- `'never'` — The software does not shuffle the data.

**Example: **`'Shuffle','never'`

`'Verbose'` — Indicator to display training progress information
`1` (default) | `0`

Indicator to display information about the training progress in the command window, specified as the comma-separated pair consisting of `'Verbose'` and either `1` (`true`) or `0` (`false`).

The displayed information includes the number of epochs, number of iterations, time elapsed, mini-batch accuracy, and base learning rate. When training a regression network, RMSE is shown instead of accuracy.

**Example: **`'Verbose',0`

**Data Types: **`logical`

`'VerboseFrequency'` — Frequency of verbose printing
50 (default) | integer value

Number of iterations between printing to the command window, specified as the comma-separated pair consisting of `'VerboseFrequency'` and an integer value. This option has an effect only when `'Verbose'` is set to `true`.

**Data Types: **`single`

| `double`

`'WorkerLoad'` — Relative division of load between workers
evenly divided (default) | numeric vector

Relative division of the load between GPU or CPU workers for the `'ExecutionEnvironment','multi-gpu'` or `'ExecutionEnvironment','parallel'` options, specified as a numeric vector. This vector must contain one value per worker in the parallel pool. For a vector $$w$$, each worker gets $${w}_{i}/{\displaystyle \sum _{i}{w}_{i}}$$ of the work. Use this option to balance the workload between unevenly performing hardware.

**Data Types: **`double`
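The fraction of work each worker receives follows directly from the formula above. The load vector below is a made-up example for a three-worker pool.

```matlab
% Sketch: fraction of work per worker for an example WorkerLoad vector.
w = [2 1 1];             % relative loads, e.g. one fast GPU and two slower ones
fractions = w / sum(w);  % worker i gets w(i)/sum(w) of the work
% fractions is [0.5 0.25 0.25]
```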

`'OutputFcn'` — Custom output functions
function handle | cell array of function handles

Custom output functions to call during training, specified as a function handle or a cell array of function handles. After each iteration, `trainNetwork` calls the specified functions and passes a struct containing information from the current iteration via the following fields.

| Field | Description |
|---|---|
| `Epoch` | Current epoch number |
| `Iteration` | Current iteration number |
| `TimeSinceStart` | Time in seconds since the start of training |
| `TrainingLoss` | Current mini-batch loss |
| `BaseLearnRate` | Current base learning rate |
| `TrainingAccuracy` | Accuracy of the current mini-batch (classification networks) |
| `TrainingRMSE` | RMSE of the current mini-batch (regression networks) |
| `State` | Current training state: `"start"`, `"iteration"`, or `"done"` |

You can use custom output functions to display or plot progress information, or to stop training early. To stop training early, the function must return `true`. For an example showing how to plot training accuracy during training, see Plot Training Accuracy During Network Training. For an example showing how to stop training early, see Plot Progress and Stop Training at Specified Accuracy.

**Data Types: **`function_handle`

| `cell`

`options` — Training options
object

Training options, returned as an object.

For the `sgdm` training solver, `options` is a `TrainingOptionsSGDM` object.

The default for the initial weights is a Gaussian distribution with a mean of 0 and a standard deviation of 0.01. The default for the initial bias value is 0. You can manually change the initialization for the weights and biases. See Specify Initial Weight and Biases in Convolutional Layer and Specify Initial Weight and Biases in Fully Connected Layer.

The gradient descent algorithm updates the parameters (weights and biases) so as to minimize the error function by taking small steps in the direction of the negative gradient of the loss function [1]:

$${\theta}_{\ell +1}={\theta}_{\ell}-\alpha \nabla E\left({\theta}_{\ell}\right),$$

where $$\ell $$ stands for the iteration number, $$\alpha >0$$ is the learning rate, $$\theta $$ is the parameter vector, and $$E\left(\theta \right)$$ is the loss function. The gradient of the loss function, $$\nabla E\left(\theta \right)$$, is evaluated using the entire training set, and the standard gradient descent algorithm uses the entire data set at once. The stochastic gradient descent algorithm evaluates the gradient, hence updates the parameters, using a subset of the training set. This subset is called a mini-batch.

Each evaluation of the gradient using the mini-batch is an iteration. At each iteration, the algorithm takes one step towards minimizing the loss function. The full pass of the training algorithm over the entire training set using mini-batches is an epoch. You can specify the mini-batch size and the maximum number of epochs using the `MiniBatchSize` and `MaxEpochs` name-value pair arguments, respectively.
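For intuition, the number of iterations in one epoch follows from the training-set size and the mini-batch size. The sizes below are illustrative, and the calculation assumes every observation is visited exactly once per epoch.

```matlab
% Sketch: iterations per epoch for illustrative sizes.
numObservations = 6400;                                % training-set size
miniBatchSize   = 128;                                 % 'MiniBatchSize'
iterationsPerEpoch = numObservations / miniBatchSize;  % 50 iterations per epoch
maxEpochs = 30;                                        % 'MaxEpochs'
totalIterations = iterationsPerEpoch * maxEpochs;      % 1500 iterations in total
```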

The gradient descent algorithm might oscillate along the steepest descent path to the optimum. Adding a momentum term to the parameter update is one way to prevent this oscillation [2]. The SGD update with momentum is

$${\theta}_{\ell +1}={\theta}_{\ell}-\alpha \nabla E\left({\theta}_{\ell}\right)+\gamma \left({\theta}_{\ell}-{\theta}_{\ell -1}\right),$$

where $$\gamma $$ determines the contribution of the previous gradient step to the current iteration. You can specify this value using the `Momentum` name-value pair argument.
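The momentum update above can be sketched on a toy one-dimensional loss $$E\left(\theta \right)={\theta}^{2}$$. All values are illustrative; this demonstrates only the update formula, not the product's implementation.

```matlab
% Sketch: SGD with momentum minimizing E(theta) = theta^2.
alpha = 0.1;               % learning rate
gamma = 0.9;               % momentum ('Momentum')
theta = 5; thetaPrev = 5;  % start with no previous step
for k = 1:200
    grad = 2*theta;        % gradient of theta^2
    thetaNext = theta - alpha*grad + gamma*(theta - thetaPrev);
    thetaPrev = theta;
    theta = thetaNext;
end
% theta approaches the minimizer 0
```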

By default, the software shuffles the data once before training. You can change this setting using the `Shuffle` name-value pair argument.

Adding a regularization term for the weights to the loss function $$E\left(\theta \right)$$ is one way to reduce overfitting by penalizing the complexity of the neural network [1], [2]. The regularization term is also called *weight decay*. The loss function with the regularization term takes the form

$${E}_{R}\left(\theta \right)=E\left(\theta \right)+\lambda \Omega \left(w\right),$$

where $$w$$ is the weight vector, $$\lambda $$ is the regularization factor (coefficient), and the regularization function, $$\Omega \left(w\right)$$, is:

$$\Omega \left(w\right)=\frac{1}{2}{w}^{T}w.$$

Note that the biases are not regularized [2]. You can specify the regularization factor, $$\lambda $$, using the `L2Regularization` name-value pair argument.
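The regularized loss above can be computed directly. The weight vector and the base loss value below are made up for illustration.

```matlab
% Sketch: L2-regularized loss E_R = E + lambda * (1/2) * w'*w.
lambda = 0.0005;          % 'L2Regularization'
w = [1; -2; 0.5];         % example weight vector
Omega = 0.5*(w.'*w);      % regularization function: 0.5 * 5.25 = 2.625
E = 1.2;                  % example unregularized loss
ER = E + lambda*Omega;    % 1.2 + 0.0005*2.625 = 1.2013125
```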

`trainNetwork` enables you to save checkpoint networks as .mat files during training. You can then resume training from any of these checkpoint networks. If you want `trainNetwork` to save checkpoint networks, then you must specify the name of the path using the `CheckpointPath` name-value pair argument in the call to `trainingOptions`. If the path you specify is wrong, then `trainingOptions` returns an error.

`trainNetwork` automatically assigns unique names to these checkpoint network files. For example, in `convnet_checkpoint__351__2016_11_09__12_04_23.mat`, 351 is the iteration number, 2016_11_09 is the date, and 12_04_23 is the time at which `trainNetwork` saved the network. You can load any of these files by double-clicking them or typing, for example,

load convnet_checkpoint__351__2016_11_09__12_04_23.mat

You can then resume training by using the layers of the checkpoint network as an input argument to `trainNetwork`, for example,

trainNetwork(XTrain,YTrain,net.Layers,options)

[1] Bishop, C. M. *Pattern Recognition
and Machine Learning*. Springer, New York, NY, 2006.

[2] Murphy, K. P. *Machine Learning:
A Probabilistic Perspective*. The MIT Press, Cambridge,
Massachusetts, 2012.

`convolution2dLayer` | `fullyConnectedLayer` | `TrainingOptionsSGDM` | `trainNetwork`

- Create Simple Deep Learning Network for Classification
- Transfer Learning and Fine-Tuning of Convolutional Neural Networks
- Resume Training from a Checkpoint Network
- Deep Learning with Big Data on GPUs and in Parallel
- Introduction to Convolutional Neural Networks
- Specify Layers of Convolutional Neural Network
- Set Up Parameters and Train Convolutional Neural Network
