TrainingOptionsSGDM class

Training options for stochastic gradient descent with momentum

Description

Class comprising training options, such as learning rate information, L2 regularization factor, and mini-batch size, for stochastic gradient descent with momentum.

Construction

options = trainingOptions(solverName) returns a set of training options for the solver specified by solverName.

options = trainingOptions(solverName,Name,Value) returns a set of training options, with additional options specified by one or more Name,Value pair arguments.

For the full list of name-value pair arguments, see trainingOptions.

Input Arguments

solverName: Solver to use for training the network. You must specify 'sgdm' (stochastic gradient descent with momentum).

Properties

Momentum: Contribution of the gradient step from the previous iteration to the current training iteration, stored as a scalar value from 0 to 1. A value of 0 means no contribution; a value of 1 means maximal contribution.

Data Types: double
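
To make the role of the momentum value concrete, the following is a minimal sketch of a momentum update on a toy one-dimensional loss. It assumes the standard form in which the new step is the negative scaled gradient plus the momentum value times the previous step. All variable names are illustrative and are not part of the toolbox API, and the exact update used by the software may differ.

```matlab
% Toy loss E(theta) = 0.5*theta^2, whose gradient is simply theta.
% Names below are hypothetical; this is not the toolbox implementation.
learnRate = 0.1;   % step size
momentum  = 0.9;   % contribution of the previous step (0 = none)
theta     = 1;     % parameter being optimized
prevStep  = 0;     % previous update; zero before the first iteration

for iter = 1:200
    grad     = theta;                                    % gradient of the toy loss
    step     = -learnRate * grad + momentum * prevStep;  % gradient step plus momentum term
    theta    = theta + step;                             % apply the update
    prevStep = step;                                     % remember it for the next iteration
end

disp(theta)   % close to the minimizer at 0
```

Setting momentum to 0 in this sketch reduces it to plain gradient descent.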

InitialLearnRate: Initial learning rate used for training, stored as a scalar value. If the learning rate is too low, training takes a long time; if it is too high, training might reach a suboptimal result.

Data Types: double

Settings for the learning rate schedule, specified by the user and stored as a structure. LearnRateScheduleSettings always has the following field:

  • Method — Name of the method for adjusting the learning rate. Possible names are:

    • 'fixed' — the software does not alter the learning rate during training.

    • 'piecewise' — the learning rate drops periodically during training.

If Method is 'piecewise', then LearnRateScheduleSettings contains two more fields:

  • DropRateFactor — The multiplicative factor by which to drop the learning rate during training.

  • DropPeriod — The number of epochs that should pass between adjustments to the learning rate during training.

Data Types: struct
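
The 'piecewise' schedule described above can be sketched as a simple calculation. This assumes the rate is multiplied by the drop factor once every DropPeriod epochs, with the first drop taking effect in the epoch after the first full period. The initial rate of 0.01 is an arbitrary value chosen for illustration, and the variable names are hypothetical.

```matlab
% Hypothetical values; only the arithmetic of the schedule is shown.
initialLearnRate = 0.01;   % assumed starting rate
dropFactor       = 0.2;    % multiplicative drop factor
dropPeriod       = 5;      % epochs between drops
epochs           = 1:20;

% Number of drops applied so far is floor((epoch-1)/dropPeriod).
learnRate = initialLearnRate .* dropFactor .^ floor((epochs - 1) ./ dropPeriod);

disp([epochs(:) learnRate(:)])   % one row per epoch: epoch number, learning rate
```

With the 'fixed' method, the corresponding vector would simply hold the initial rate for every epoch.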

L2Regularization: Factor for the L2 regularizer, stored as a scalar value. Each set of parameters in a layer can specify a multiplier for the L2 regularizer.

Data Types: double
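
As a hedged illustration of what this factor does, a commonly used form of L2 regularization (weight decay) adds a penalty on the weights to the loss. The symbols below are notation assumed for this sketch, and the exact form used by the software may differ.

```latex
E_R(\theta) = E(\theta) + \lambda\,\Omega(w),
\qquad
\Omega(w) = \tfrac{1}{2}\, w^{\mathsf{T}} w
```

Here E is the unregularized loss, w is the weight vector, and \lambda plays the role of the L2 regularization factor. A per-layer multiplier would scale \lambda for that layer's parameters.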

MaxEpochs: Maximum number of epochs to use for training, stored as an integer value.

Data Types: double

MiniBatchSize: Size of the mini-batch to use for each training iteration, stored as an integer value.

Data Types: double
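
The maximum number of epochs and the mini-batch size together determine how many iterations training runs for. The sketch below assumes a hypothetical training set of 5000 observations and assumes that observations that do not fill a complete final mini-batch are discarded in each epoch. Both are assumptions for illustration, not statements about a particular release.

```matlab
numObservations = 5000;   % hypothetical training-set size
miniBatchSize   = 300;    % observations per iteration
maxEpochs       = 20;     % full passes over the data

% Only complete mini-batches are counted (assumption stated above).
iterationsPerEpoch = floor(numObservations / miniBatchSize);
totalIterations    = iterationsPerEpoch * maxEpochs;

fprintf('%d iterations per epoch, %d in total\n', iterationsPerEpoch, totalIterations);
```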

Verbose: Indicator to display training progress information in the Command Window, stored as either 1 (true) or 0 (false).

The displayed information includes the number of epochs, number of iterations, time elapsed, mini-batch accuracy, and base learning rate.

Data Types: logical

CheckpointPath: Path where checkpoint networks are saved, stored as a character vector.

Data Types: char

ExecutionEnvironment: Hardware resource to use for training the network, stored as a character vector.

Data Types: char

WorkerLoad: Relative division of load between parallel workers on different hardware, stored as a numeric vector.

Data Types: double
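
The display, checkpoint, and hardware settings above are set through name-value pair arguments of trainingOptions. The following configuration sketch assumes the argument names 'Verbose', 'CheckpointPath', and 'ExecutionEnvironment', and that 'cpu' is an accepted hardware value. Availability of individual arguments may depend on the release, so check trainingOptions before relying on them.

```matlab
% Configuration sketch; requires the toolbox that provides trainingOptions.
% Argument names and the 'cpu' value are assumptions to verify for your release.
options = trainingOptions('sgdm',...
      'Verbose',false,...                % suppress progress output
      'CheckpointPath',tempdir,...       % save checkpoint networks to the temp folder
      'ExecutionEnvironment','cpu');     % train on the CPU
```

The folder passed as the checkpoint path is expected to exist already; tempdir is used here only because it is always present.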

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects (MATLAB) in the MATLAB® documentation.

Examples

Create a set of options for training with stochastic gradient descent with momentum. The learning rate will be reduced by a factor of 0.2 every 5 epochs. The training will last for 20 epochs, and each iteration will use a mini-batch with 300 observations.

options = trainingOptions('sgdm',...
      'LearnRateSchedule','piecewise',...
      'LearnRateDropFactor',0.2,... 
      'LearnRateDropPeriod',5,... 
      'MaxEpochs',20,... 
      'MiniBatchSize',300);

Introduced in R2016a
