This topic explains the architecture of deep learning layers and how to define custom layers to use for your problems. For a list of built-in layers in Deep Learning Toolbox™, see List of Deep Learning Layers.
Type  Description 

Layer  Define a custom deep learning layer and specify optional learnable parameters. For an example showing how to define a custom layer with learnable parameters, see Define Custom Deep Learning Layer with Learnable Parameters. For an example showing how to define a custom layer with multiple inputs, see Define Custom Deep Learning Layer with Multiple Inputs. 
Classification Output Layer  Define a custom classification output layer and specify a loss function. For an example showing how to define a custom classification output layer and specify a loss function, see Define Custom Classification Output Layer. 
Regression Output Layer  Define a custom regression output layer and specify a loss function. For an example showing how to define a custom regression output layer and specify a loss function, see Define Custom Regression Output Layer. 
You can use the following templates to define new layers.
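As a rough illustration of the shape such a template takes, the following is a minimal sketch of an intermediate layer class, assuming it is saved as myLayer.m. The class name myLayer and the pass-through operation in predict are placeholders chosen for this sketch, not part of the toolbox.

```matlab
classdef myLayer < nnet.layer.Layer
    % Sketch of an intermediate layer; myLayer is a placeholder name.

    properties
        % (Optional) Layer properties go here.
    end

    properties (Learnable)
        % (Optional) Layer learnable parameters go here.
    end

    methods
        function layer = myLayer(name)
            % Constructor: set the layer name.
            layer.Name = name;
        end

        function Z = predict(layer, X)
            % Forward input data through the layer at prediction time.
            % Placeholder operation: return the input unchanged.
            Z = X;
        end
    end
end
```

Because forward and backward are omitted in this sketch, the layer would use predict at training time and rely on automatic differentiation for the backward pass.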
During training, the software iteratively performs forward and backward passes through the network.
When making a forward pass through the network, each layer takes the outputs of the previous layers, applies a function, and then outputs (forward propagates) the results to the next layers.
Layers can have multiple inputs or outputs. For example, a layer can take X_{1}, …, X_{n} from multiple previous layers and forward propagate the outputs Z_{1}, …, Z_{m} to the next layers.
At the end of a forward pass of the network, the output layer calculates the loss L between the predictions Y and the true targets T.
During the backward pass of a network, each layer takes the derivatives of the loss with respect to the outputs of the layer, computes the derivatives of the loss L with respect to the inputs, and then backward propagates the results. If the layer has learnable parameters, then the layer also computes the derivatives of the loss with respect to the layer weights (learnable parameters). The layer uses these derivatives to update the learnable parameters.
The following figure describes the flow of data through a deep neural network and highlights the data flow through a layer with a single input X, a single output Z, and a learnable parameter W.
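The forward and backward passes for a single layer can be traced with a small numeric sketch. It assumes a hypothetical layer that computes Z = W.*X with a scalar weight W, a half-sum-of-squares loss, and a hand-picked learning rate. These choices are for illustration only and are not taken from the toolbox.

```matlab
% Minimal sketch (assumed layer: Z = W.*X, assumed loss: 0.5*sum((Z-T).^2)).
X = [1 2 3];          % layer input
T = [2 4 6];          % true targets
W = 0.5;              % learnable parameter (scalar weight)
learnRate = 0.1;      % hypothetical learning rate

% Forward pass: the layer applies its function to the input.
Z = W .* X;
L = 0.5 * sum((Z - T).^2);

% Backward pass: start from the derivative of the loss with respect
% to the layer output.
dLdZ = Z - T;
dLdX = dLdZ .* W;         % propagated back to the previous layer
dLdW = sum(dLdZ .* X);    % derivative with respect to the weight

% Gradient descent update of the learnable parameter.
W = W - learnRate * dLdW;
```

With these values the loss is 15.75, dLdW is -21, and the updated weight is 2.6, which moves W toward the value 2 that fits the targets exactly.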
Declare the layer properties in the properties section of the class definition.
By default, custom intermediate layers have these properties:
Property  Description

Name  Layer name, specified as a character vector or a string scalar. To include a layer in a layer graph, you must specify a nonempty unique layer name. If you train a series network with the layer and Name is set to '', then the software automatically assigns a name to the layer at training time.
Description  One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a Layer array.
Type  Type of the layer, specified as a character vector or a string scalar. The value of Type appears when the layer is displayed in a Layer array. If you do not specify a layer type, then the software displays the layer class name.
NumInputs  Number of inputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumInputs to the number of names in InputNames. The default value is 1.
InputNames  Input names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumInputs is greater than 1, then the software automatically sets InputNames to {'in1',...,'inN'}, where N is equal to NumInputs. The default value is {'in'}.
NumOutputs  Number of outputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumOutputs to the number of names in OutputNames. The default value is 1.
OutputNames  Output names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumOutputs is greater than 1, then the software automatically sets OutputNames to {'out1',...,'outM'}, where M is equal to NumOutputs. The default value is {'out'}.
If the layer has no other properties, then you can omit the properties section.
If you are creating a layer with multiple inputs, then you must set either the NumInputs or InputNames property in the layer constructor. If you are creating a layer with multiple outputs, then you must set either the NumOutputs or OutputNames property in the layer constructor. For an example, see Define Custom Deep Learning Layer with Multiple Inputs.
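A minimal sketch of a constructor that declares multiple inputs, assuming the class is saved as twoInputSumLayer.m. The class name and the element-wise sum are hypothetical and chosen only to keep the example short.

```matlab
classdef twoInputSumLayer < nnet.layer.Layer
    % Hypothetical layer that adds two inputs element-wise.

    methods
        function layer = twoInputSumLayer(name)
            % Set the layer name.
            layer.Name = name;

            % A layer with multiple inputs must set NumInputs (or
            % InputNames) in the constructor.
            layer.NumInputs = 2;
        end

        function Z = predict(layer, X1, X2)
            % The number of data inputs matches NumInputs.
            Z = X1 + X2;
        end
    end
end
```

Because InputNames is not set here, the software should assign the default names 'in1' and 'in2' described in the properties table.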
Declare the layer learnable parameters in the properties (Learnable) section of the class definition. If the layer has no learnable parameters, then you can omit the properties (Learnable) section.
Optionally, you can specify the learning rate factor and the L2 factor of the learnable parameters. By default, each learnable parameter has its learning rate factor and L2 factor set to 1.
For both built-in and custom layers, you can set and get the learn rate factors and L2 regularization factors using the following functions.
Function  Description 

setLearnRateFactor  Set the learn rate factor of a learnable parameter. 
setL2Factor  Set the L2 regularization factor of a learnable parameter. 
getLearnRateFactor  Get the learn rate factor of a learnable parameter. 
getL2Factor  Get the L2 regularization factor of a learnable parameter. 
To specify the learning rate factor and the L2 factor of a learnable parameter, use the syntaxes layer = setLearnRateFactor(layer,'MyParameterName',value) and layer = setL2Factor(layer,'MyParameterName',value), respectively.
To get the value of the learning rate factor and the L2 factor of a learnable parameter, use the syntaxes getLearnRateFactor(layer,'MyParameterName') and getL2Factor(layer,'MyParameterName'), respectively.
For example, this syntax sets the learn rate factor of the learnable parameter with the name 'Alpha' to 0.1.
layer = setLearnRateFactor(layer,'Alpha',0.1);
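The same functions apply to built-in layers. The following sketch sets and then reads back both factors for the 'Weights' parameter of a convolutional layer. The filter size, number of filters, and factor values are arbitrary choices for illustration.

```matlab
% Create a built-in layer, then change and read back the factors.
layer = convolution2dLayer(5,20);
layer = setLearnRateFactor(layer,'Weights',2);
layer = setL2Factor(layer,'Weights',0.5);

% Query the factors that were just set.
weightsLearnRate = getLearnRateFactor(layer,'Weights');
weightsL2 = getL2Factor(layer,'Weights');
```

Note that the set functions return a modified copy of the layer, so the result must be assigned back to a variable.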
A layer uses one of two functions to perform a forward pass: predict or forward. If the forward pass is at prediction time, then the layer uses the predict function. If the forward pass is at training time, then the layer uses the forward function. If you do not require two different functions for prediction time and training time, then you can omit the forward function. In this case, the layer uses predict at training time.
If you define the forward function and a custom backward function, then you must assign a value to the memory argument, which you can use during backward propagation.
The syntax for predict is
[Z1,…,Zm] = predict(layer,X1,…,Xn)
X1,…,Xn are the n layer inputs and Z1,…,Zm are the m layer outputs. The values n and m must correspond to the NumInputs and NumOutputs properties of the layer.
If the number of inputs to predict can vary, then use varargin instead of X1,…,Xn. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi. If the number of outputs can vary, then use varargout instead of Z1,…,Zm. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Zj.
The syntax for forward is
[Z1,…,Zm,memory] = forward(layer,X1,…,Xn)
X1,…,Xn are the n layer inputs, Z1,…,Zm are the m layer outputs, and memory is the memory of the layer.
If the number of inputs to forward can vary, then use varargin instead of X1,…,Xn. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi. If the number of outputs can vary, then use varargout instead of Z1,…,Zm. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Zj for j=1,…,NumOutputs and varargout{NumOutputs+1} corresponds to memory.
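To illustrate how predict and forward can differ, the following is a sketch of a dropout-style layer, assuming it is saved as exampleDropoutLayer.m. The class name, the Probability property, and the masking scheme are hypothetical. The sketch also defines a custom backward function so that the memory output of forward has a use.

```matlab
classdef exampleDropoutLayer < nnet.layer.Layer
    % Hypothetical layer whose training-time behavior differs from
    % its prediction-time behavior.

    properties
        Probability   % probability of zeroing an input element
    end

    methods
        function layer = exampleDropoutLayer(probability, name)
            layer.Name = name;
            layer.Probability = probability;
        end

        function Z = predict(layer, X)
            % Prediction time: pass the input through unchanged.
            Z = X;
        end

        function [Z, memory] = forward(layer, X)
            % Training time: zero random elements and rescale the rest.
            keep = rand(size(X), 'like', X) > layer.Probability;
            scale = 1 / (1 - layer.Probability);
            Z = X .* keep .* scale;

            % Store the scaled mask for use in backward.
            memory = keep .* scale;
        end

        function dLdX = backward(layer, ~, ~, dLdZ, memory)
            % X and Z are not needed, so they are replaced with ~.
            dLdX = dLdZ .* memory;
        end
    end
end
```

The layer has no learnable parameters, so backward returns only the derivative of the loss with respect to the input.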
The dimensions of the inputs depend on the type of data and the output of the connected layers:
Layer Input  Input Size  Observation Dimension

2-D images  h-by-w-by-c-by-N, where h, w, and c correspond to the height, width, and number of channels of the images respectively, and N is the number of observations.  4
3-D images  h-by-w-by-d-by-c-by-N, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images respectively, and N is the number of observations.  5
Vector sequences  c-by-N-by-S, where c is the number of features of the sequences, N is the number of observations, and S is the sequence length.  2
2-D image sequences  h-by-w-by-c-by-N-by-S, where h, w, and c correspond to the height, width, and number of channels of the images respectively, N is the number of observations, and S is the sequence length.  4
3-D image sequences  h-by-w-by-d-by-c-by-N-by-S, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images respectively, N is the number of observations, and S is the sequence length.  5
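As a quick check of the layouts in the table, this sketch builds random arrays with the 2-D image and vector sequence layouts and reads the number of observations from the listed dimension. All sizes are arbitrary values picked for illustration.

```matlab
% A mini-batch of 16 three-channel images of size 28-by-28.
X = rand(28,28,3,16,'single');      % h-by-w-by-c-by-N
[h,w,c,N] = size(X);

% The observation dimension for 2-D image input is 4.
numImages = size(X,4);

% Vector sequences with 12 features, 16 observations, 100 time steps.
S = rand(12,16,100,'single');       % c-by-N-by-S

% The observation dimension for vector sequences is 2.
numSequences = size(S,2);
```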
The layer backward function computes the derivatives of the loss with respect to the input data and then outputs (backward propagates) results to the previous layer. If the layer has learnable parameters (for example, layer weights), then backward also computes the derivatives of the learnable parameters. When using the trainNetwork function, the layer automatically updates the learnable parameters using these derivatives during the backward pass.
Defining the backward function is optional. If you do not specify a backward function, and the layer forward functions support dlarray objects, then the software automatically determines the backward function using automatic differentiation. For a list of functions that support dlarray objects, see List of Functions with dlarray Support. Define a custom backward function when you want to:
Use a specific algorithm to compute the derivatives.
Use operations in the forward functions that do not support dlarray objects.
To define a custom backward function, create a function named backward.
The syntax for backward is
[dLdX1,…,dLdXn,dLdW1,…,dLdWk] = backward(layer,X1,…,Xn,Z1,…,Zm,dLdZ1,…,dLdZm,memory)
X1,…,Xn are the n layer inputs, Z1,…,Zm are the m outputs of forward, dLdZ1,…,dLdZm are the gradients backward propagated from the next layer, and memory is the memory output of forward. For the outputs, dLdX1,…,dLdXn are the derivatives of the loss with respect to the layer inputs and dLdW1,…,dLdWk are the derivatives of the loss with respect to the k learnable parameters. To reduce memory usage by preventing unused variables from being saved between the forward and backward passes, replace the corresponding input arguments with ~.
If the number of inputs to backward can vary, then use varargin instead of the input arguments after layer. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi for i=1,…,NumInputs, varargin{NumInputs+j} and varargin{NumInputs+NumOutputs+j} correspond to Zj and dLdZj, respectively, for j=1,…,NumOutputs, and varargin{end} corresponds to memory.
If the number of outputs can vary, then use varargout instead of the output arguments. In this case, varargout is a cell array of the outputs, where varargout{i} corresponds to dLdXi for i=1,…,NumInputs and varargout{NumInputs+t} corresponds to dLdWt for t=1,…,k, where k is the number of learnable parameters.
The values of X1,…,Xn and Z1,…,Zm are the same as in the forward functions. The dimensions of dLdZ1,…,dLdZm are the same as the dimensions of Z1,…,Zm, respectively.
The dimensions and data types of dLdX1,…,dLdXn are the same as the dimensions and data types of X1,…,Xn, respectively. The dimensions and data types of dLdW1,…,dLdWk are the same as the dimensions and data types of W1,…,Wk, respectively.
To calculate the derivatives of the loss, you can use the chain rule:
$$\frac{\partial L}{\partial X^{(i)}}=\sum_{j}\frac{\partial L}{\partial Z_{j}}\frac{\partial Z_{j}}{\partial X^{(i)}}$$
$$\frac{\partial L}{\partial W_{i}}=\sum_{j}\frac{\partial L}{\partial Z_{j}}\frac{\partial Z_{j}}{\partial W_{i}}$$
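As a worked instance of the chain rule, the following sketch computes the derivatives for a PReLU operation on plain numeric arrays and compares one parameter derivative against a finite-difference estimate. The input values, the per-channel parameter alpha, and the loss (the sum of all outputs) are assumptions made for this illustration.

```matlab
% PReLU operation: Z = max(X,0) + alpha.*min(0,X), one alpha per channel.
prelu = @(X,alpha) max(X,0) + alpha .* min(0,X);

X = reshape(linspace(-1,1,24), [2 2 3 2]);   % h-by-w-by-c-by-N input
alpha = reshape([0.1 0.2 0.3], [1 1 3]);     % learnable parameter
Z = prelu(X,alpha);

% Assumed loss: L = sum of all outputs, so dL/dZ is all ones.
lossFcn = @(a) sum(reshape(prelu(X,a), [], 1));
dLdZ = ones(size(Z));

% Chain rule for the input: dZ/dX is 1 where X > 0 and alpha elsewhere.
dLdX = alpha .* dLdZ;
dLdX(X > 0) = dLdZ(X > 0);

% Chain rule for the parameter: dZ/dalpha is min(0,X), summed over the
% dimensions that share each alpha (height, width, and observations).
dLdAlpha = sum(sum(sum(min(0,X) .* dLdZ, 1), 2), 4);

% Finite-difference estimate of dL/dalpha for the second channel.
step = 1e-6;
alphaStep = alpha;
alphaStep(2) = alphaStep(2) + step;
numericGrad = (lossFcn(alphaStep) - lossFcn(alpha)) / step;
gradError = abs(numericGrad - dLdAlpha(2));
```

The derivatives have the same sizes as X and alpha, which matches the size requirements on the outputs of backward. A small gradError indicates that the analytic and numeric derivatives agree.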
When using the trainNetwork function, the layer automatically updates the learnable parameters using the derivatives dLdW1,…,dLdWk during the backward pass.
If the layer forward functions fully support dlarray objects, then the layer is GPU compatible. Otherwise, to be GPU compatible, the layer functions must support inputs and return outputs of type gpuArray.
Many MATLAB® built-in functions support gpuArray and dlarray input arguments. For a list of functions that support dlarray objects, see List of Functions with dlarray Support. For a list of functions that execute on a GPU, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a CUDA® enabled NVIDIA® GPU with compute capability 3.0 or higher. For more information on working with GPUs in MATLAB, see GPU Computing in MATLAB (Parallel Computing Toolbox).
If you create a custom deep learning layer, then you can use the checkLayer function to check that the layer is valid. The function checks layers for validity, GPU compatibility, and correctly defined gradients. To check that a layer is valid, run the following command:
checkLayer(layer,validInputSize,'ObservationDimension',dim)
layer is an instance of the layer, validInputSize is a vector or cell array specifying the valid input sizes to the layer, and dim specifies the dimension of the observations in the layer input data. For large input sizes, the gradient checks take longer to run. To speed up the tests, specify a smaller valid input size. For more information, see Check Custom Layer Validity.
Check the validity of the custom layer preluLayer.
Define a custom PReLU layer. To create this layer, save the file preluLayer.m in the current folder.
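A minimal sketch of what preluLayer.m might contain, assuming a constructor that takes the number of channels and a layer name (consistent with the call preluLayer(20,'prelu') used in this topic). The random initialization of Alpha and the description text are assumptions for illustration.

```matlab
classdef preluLayer < nnet.layer.Layer
    % Sketch of a custom PReLU layer.

    properties (Learnable)
        % Scaling coefficient, one value per channel.
        Alpha
    end

    methods
        function layer = preluLayer(numChannels, name)
            % Create a PReLU layer for 2-D image input with
            % numChannels channels and the specified name.

            % Set layer name.
            layer.Name = name;

            % Set layer description.
            layer.Description = "PReLU with " + numChannels + " channels";

            % Initialize the scaling coefficient (assumed initialization).
            layer.Alpha = rand([1 1 numChannels]);
        end

        function Z = predict(layer, X)
            % Forward the input data X through the layer.
            Z = max(X,0) + layer.Alpha .* min(0,X);
        end
    end
end
```

The sketch defines neither forward nor backward, so the layer would use predict at training time and rely on automatic differentiation for the derivatives.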
Create an instance of the layer and check its validity using checkLayer. Specify the valid input size to be the size of a single observation of typical input to the layer. The layer expects 4-D array inputs, where the first three dimensions correspond to the height, width, and number of channels of the previous layer output, and the fourth dimension corresponds to the observations.
Specify the typical size of the input of an observation and set 'ObservationDimension' to 4.
layer = preluLayer(20,'prelu');
validInputSize = [24 24 20];
checkLayer(layer,validInputSize,'ObservationDimension',4)

Skipping GPU tests. No compatible GPU device found.

Running nnet.checklayer.TestLayerWithoutBackward
.......... ...
Done nnet.checklayer.TestLayerWithoutBackward
__________

Test Summary:
	 13 Passed, 0 Failed, 0 Incomplete, 4 Skipped.
	 Time elapsed: 1.047 seconds.
Here, the function does not detect any issues with the layer.
You can use a custom layer in the same way as any other layer in Deep Learning Toolbox.
Define a custom PReLU layer. To create this layer, save the file preluLayer.m in the current folder.
Create a layer array that includes the custom layer preluLayer.
layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
batchNormalizationLayer
preluLayer(20,'prelu')
fullyConnectedLayer(10)
softmaxLayer
classificationLayer];
At the end of a forward pass at training time, an output layer takes the predictions (outputs) Y of the previous layer and calculates the loss L between these predictions and the training targets. The output layer computes the derivatives of the loss L with respect to the predictions Y and outputs (backward propagates) results to the previous layer.
The following figure describes the flow of data through a convolutional neural network and an output layer.
Declare the layer properties in the properties section of the class definition.
By default, custom output layers have the following properties:
Name – Layer name, specified as a character vector or a string scalar. To include a layer in a layer graph, you must specify a nonempty unique layer name. If you train a series network with the layer and Name is set to '', then the software automatically assigns a name to the layer at training time.
Description – One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a Layer array. If you do not specify a layer description, then the software displays "Classification Output" or "Regression Output".
Type – Type of the layer, specified as a character vector or a string scalar. The value of Type appears when the layer is displayed in a Layer array. If you do not specify a layer type, then the software displays the layer class name.
Custom classification layers also have the following property:
Classes – Classes of the output layer, specified as a categorical vector, string array, cell array of character vectors, or 'auto'. If Classes is 'auto', then the software automatically sets the classes at training time. If you specify the string array or cell array of character vectors str, then the software sets the classes of the output layer to categorical(str,str). The default value is 'auto'.
Custom regression layers also have the following property:
ResponseNames – Names of the responses, specified as a cell array of character vectors or a string array. At training time, the software automatically sets the response names according to the training data. The default is {}.
If the layer has no other properties, then you can omit the properties section.
The output layer computes the loss L between predictions and targets using the forward loss function and computes the derivatives of the loss with respect to the predictions using the backward loss function.
The syntax for forwardLoss is loss = forwardLoss(layer, Y, T). The input Y corresponds to the predictions made by the network. These predictions are the output of the previous layer. The input T corresponds to the training targets. The output loss is the loss between Y and T according to the specified loss function. The output loss must be scalar.
If the layer forward loss function supports dlarray objects, then the software automatically determines the backward loss function. For a list of functions that support dlarray objects, see List of Functions with dlarray Support. Alternatively, to define a custom backward loss function, create a function named backwardLoss.
The syntax for backwardLoss is dLdY = backwardLoss(layer, Y, T). The input Y contains the predictions made by the network and T contains the training targets. The output dLdY is the derivative of the loss with respect to the predictions Y. The output dLdY must be the same size as the layer input Y.
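The relationship between the two loss functions can be sketched on plain numeric arrays. This example uses mean absolute error as an illustrative loss, with made-up predictions and targets of size 1-by-1-by-R-by-N. The variable names mirror the arguments of forwardLoss and backwardLoss, but the code runs outside of any layer class.

```matlab
% Predictions Y and targets T of size 1-by-1-by-R-by-N (R = 3, N = 2).
Y = reshape([1 2 3 4 5 6], [1 1 3 2]);
T = reshape([1.5 1.5 4 3 7 4], [1 1 3 2]);
R = size(Y,3);   % number of responses
N = size(Y,4);   % number of observations

% Forward loss: mean absolute error, averaged over the responses and
% the observations. The result is a scalar.
loss = sum(abs(Y(:) - T(:))) / (R*N);

% Backward loss: derivative of the loss with respect to each prediction.
% The result is the same size as Y.
dLdY = sign(Y - T) / (R*N);
```

Here the absolute errors sum to 7 over 6 elements, so the loss is 7/6, and each element of dLdY is either 1/6 or -1/6.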
For classification problems, the dimensions of T depend on the type of problem.
Classification Task  Input Size  Observation Dimension

2-D image classification  1-by-1-by-K-by-N, where K is the number of classes and N is the number of observations.  4
3-D image classification  1-by-1-by-1-by-K-by-N, where K is the number of classes and N is the number of observations.  5
Sequence-to-label classification  K-by-N, where K is the number of classes and N is the number of observations.  2
Sequence-to-sequence classification  K-by-N-by-S, where K is the number of classes, N is the number of observations, and S is the sequence length.  2
The size of Y depends on the output of the previous layer. To ensure that Y is the same size as T, you must include a layer that outputs the correct size before the output layer. For example, to ensure that Y is a 4-D array of prediction scores for K classes, you can include a fully connected layer of size K followed by a softmax layer before the output layer.
For regression problems, the dimensions of T also depend on the type of problem.
Regression Task  Input Size  Observation Dimension

2-D image regression  1-by-1-by-R-by-N, where R is the number of responses and N is the number of observations.  4
2-D image-to-image regression  h-by-w-by-c-by-N, where h, w, and c are the height, width, and number of channels of the output respectively, and N is the number of observations.  4
3-D image regression  1-by-1-by-1-by-R-by-N, where R is the number of responses and N is the number of observations.  5
3-D image-to-image regression  h-by-w-by-d-by-c-by-N, where h, w, d, and c are the height, width, depth, and number of channels of the output respectively, and N is the number of observations.  5
Sequence-to-one regression  R-by-N, where R is the number of responses and N is the number of observations.  2
Sequence-to-sequence regression  R-by-N-by-S, where R is the number of responses, N is the number of observations, and S is the sequence length.  2
For example, if the network defines an image regression network with one response and has mini-batches of size 50, then T is a 4-D array of size 1-by-1-by-1-by-50.
The size of Y depends on the output of the previous layer. To ensure that Y is the same size as T, you must include a layer that outputs the correct size before the output layer. For example, for image regression with R responses, to ensure that Y is a 4-D array of the correct size, you can include a fully connected layer of size R before the output layer.
The forwardLoss and backwardLoss functions have the following output arguments.
Function  Output Argument  Description

forwardLoss  loss  Calculated loss between the predictions Y and the true target T.
backwardLoss  dLdY  Derivative of the loss with respect to the predictions Y.
The backwardLoss function must output dLdY with the size expected by the previous layer, and dLdY must be the same size as Y.
If the layer forward functions fully support dlarray objects, then the layer is GPU compatible. Otherwise, to be GPU compatible, the layer functions must support inputs and return outputs of type gpuArray.
Many MATLAB built-in functions support gpuArray and dlarray input arguments. For a list of functions that support dlarray objects, see List of Functions with dlarray Support. For a list of functions that execute on a GPU, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox). To use a GPU for deep learning, you must also have a CUDA enabled NVIDIA GPU with compute capability 3.0 or higher. For more information on working with GPUs in MATLAB, see GPU Computing in MATLAB (Parallel Computing Toolbox).
You can use a custom output layer in the same way as any other output layer in Deep Learning Toolbox. This section shows how to create and train a network for regression using a custom output layer.
The example constructs a convolutional neural network architecture, trains a network, and uses the trained network to predict angles of rotated, handwritten digits. These predictions are useful for optical character recognition.
Define a custom mean absolute error regression layer. To create this layer, save the file maeRegressionLayer.m in the current folder.
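A minimal sketch of what maeRegressionLayer.m might contain, assuming a constructor that takes only the layer name (consistent with the call maeRegressionLayer('mae') used in this example). The sketch defines only forwardLoss and leaves the backward loss to automatic differentiation, which assumes that the operations used support dlarray objects.

```matlab
classdef maeRegressionLayer < nnet.layer.RegressionLayer
    % Sketch of a custom regression layer with mean absolute error loss.

    methods
        function layer = maeRegressionLayer(name)
            % Set layer name.
            layer.Name = name;

            % Set layer description.
            layer.Description = 'Mean absolute error';
        end

        function loss = forwardLoss(layer, Y, T)
            % Return the MAE loss between the predictions Y and the
            % training targets T, both of size 1-by-1-by-R-by-N.

            % Mean absolute error over the R responses.
            R = size(Y,3);
            meanAbsoluteError = sum(abs(Y-T),3)/R;

            % Mean over the N observations in the mini-batch.
            N = size(Y,4);
            loss = sum(meanAbsoluteError)/N;
        end
    end
end
```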
Load the example training data.
[XTrain,~,YTrain] = digitTrain4DArrayData;
Create a layer array and include the custom regression output layer maeRegressionLayer.
layers = [
imageInputLayer([28 28 1])
convolution2dLayer(5,20)
batchNormalizationLayer
reluLayer
fullyConnectedLayer(1)
maeRegressionLayer('mae')]
layers =
  6x1 Layer array with layers:

     1   ''      Image Input           28x28x1 images with 'zerocenter' normalization
     2   ''      Convolution           20 5x5 convolutions with stride [1 1] and padding [0 0 0 0]
     3   ''      Batch Normalization   Batch normalization
     4   ''      ReLU                  ReLU
     5   ''      Fully Connected       1 fully connected layer
     6   'mae'   Regression Output     Mean absolute error
Set the training options and train the network.
options = trainingOptions('sgdm','Verbose',false);
net = trainNetwork(XTrain,YTrain,layers,options);
Evaluate the network performance by calculating the prediction error between the predicted and actual angles of rotation.
[XTest,~,YTest] = digitTest4DArrayData;
YPred = predict(net,XTest);
predictionError = YTest - YPred;
Calculate the number of predictions within an acceptable error margin from the true angles. Set the threshold to 10 degrees and calculate the percentage of predictions within this threshold.
thr = 10;
numCorrect = sum(abs(predictionError) < thr);
numTestImages = size(XTest,4);
accuracy = numCorrect/numTestImages
accuracy = 0.7524
See also: assembleNetwork | checkLayer | getL2Factor | getLearnRateFactor | setL2Factor | setLearnRateFactor