Define Custom Deep Learning Layer with Multiple Inputs

If Deep Learning Toolbox™ does not provide the layer you require for your classification or regression problem, then you can define your own custom layer using this example as a guide. For a list of built-in layers, see List of Deep Learning Layers.

To define a custom deep learning layer, you can use the template provided in this example, which takes you through the following steps:

  1. Name the layer – give the layer a name so that it can be used in MATLAB®.

  2. Declare the layer properties – specify the properties of the layer and which parameters are learned during training.

  3. Create a constructor function (optional) – specify how to construct the layer and initialize its properties. If you do not specify a constructor function, then at creation, the software initializes the Name, Description, and Type properties with [] and sets the number of layer inputs and outputs to 1.

  4. Create forward functions – specify how data passes forward through the layer (forward propagation) at prediction time and at training time.

  5. Create a backward function – specify the derivatives of the loss with respect to the input data and the learnable parameters (backward propagation).

This example shows how to create a weighted addition layer, which is a layer with multiple inputs and a learnable parameter, and use it in a convolutional neural network. A weighted addition layer scales and adds inputs from multiple neural network layers element-wise.

Layer with Learnable Parameters Template

Copy the layer with learnable parameters template into a new file in MATLAB. This template outlines the structure of a layer with learnable parameters and includes the functions that define the layer behavior.

classdef myLayer < nnet.layer.Layer

    properties
        % (Optional) Layer properties.

        % Layer properties go here.
    end

    properties (Learnable)
        % (Optional) Layer learnable parameters.

        % Layer learnable parameters go here.
    end
    
    methods
        function layer = myLayer()
            % (Optional) Create a myLayer.
            % This function must have the same name as the class.

            % Layer constructor function goes here.
        end
        
        function [Z1, …, Zm] = predict(layer, X1, …, Xn)
            % Forward input data through the layer at prediction time and
            % output the result.
            %
            % Inputs:
            %         layer       - Layer to forward propagate through
            %         X1, ..., Xn - Input data
            % Outputs:
            %         Z1, ..., Zm - Outputs of layer forward function
            
            % Layer forward function for prediction goes here.
        end

        function [Z1, …, Zm, memory] = forward(layer, X1, …, Xn)
            % (Optional) Forward input data through the layer at training
            % time and output the result and a memory value.
            %
            % Inputs:
            %         layer       - Layer to forward propagate through
            %         X1, ..., Xn - Input data
            % Outputs:
            %         Z1, ..., Zm - Outputs of layer forward function
            %         memory      - Memory value for backward propagation

            % Layer forward function for training goes here.
        end

        function [dLdX1, …, dLdXn, dLdW1, …, dLdWk] = ...
                backward(layer, X1, …, Xn, Z1, …, Zm, dLdZ1, …, dLdZm, memory)
            % Backward propagate the derivative of the loss function through 
            % the layer.
            %
            % Inputs:
            %         layer             - Layer to backward propagate through
            %         X1, ..., Xn       - Input data
            %         Z1, ..., Zm       - Outputs of layer forward function            
            %         dLdZ1, ..., dLdZm - Gradients propagated from the next layers
            %         memory            - Memory value from forward function
            % Outputs:
            %         dLdX1, ..., dLdXn - Derivatives of the loss with respect to the
            %                             inputs
            %         dLdW1, ..., dLdWk - Derivatives of the loss with respect to each
            %                             learnable parameter
            
            % Layer backward function goes here.
        end
    end
end

Name the Layer

First, give the layer a name. In the first line of the class file, replace the existing name myLayer with weightedAdditionLayer.

classdef weightedAdditionLayer < nnet.layer.Layer
    ...
end

Next, rename the myLayer constructor function (the first function in the methods section) so that it has the same name as the layer.

    methods
        function layer = weightedAdditionLayer()           
            ...
        end

        ...
    end

Save the Layer

Save the layer class file in a new file named weightedAdditionLayer.m. The file name must match the layer name. To use the layer, you must save the file in the current folder or in a folder on the MATLAB path.

Declare Properties and Learnable Parameters

Declare the layer properties in the properties section and declare learnable parameters by listing them in the properties (Learnable) section.

By default, custom intermediate layers have these properties:

| Property | Description |
|---|---|
| Name | Layer name, specified as a character vector or a string scalar. To include a layer in a layer graph, you must specify a nonempty, unique layer name. If you train a series network with the layer and Name is set to '', then the software automatically assigns a name to the layer at training time. |
| Description | One-line description of the layer, specified as a character vector or a string scalar. This description appears when the layer is displayed in a Layer array. If you do not specify a layer description, then the software displays the layer class name. |
| Type | Type of the layer, specified as a character vector or a string scalar. The value of Type appears when the layer is displayed in a Layer array. If you do not specify a layer type, then the software displays the layer class name. |
| NumInputs | Number of inputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumInputs to the number of names in InputNames. The default value is 1. |
| InputNames | Input names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumInputs is greater than 1, then the software automatically sets InputNames to {'in1',...,'inN'}, where N is equal to NumInputs. The default value is {'in'}. |
| NumOutputs | Number of outputs of the layer, specified as a positive integer. If you do not specify this value, then the software automatically sets NumOutputs to the number of names in OutputNames. The default value is 1. |
| OutputNames | Output names of the layer, specified as a cell array of character vectors. If you do not specify this value and NumOutputs is greater than 1, then the software automatically sets OutputNames to {'out1',...,'outM'}, where M is equal to NumOutputs. The default value is {'out'}. |
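To make the default naming rule concrete, the following plain MATLAB sketch builds the names that the table says the software assigns when NumInputs is 3. It only reproduces the documented pattern and does not call any Deep Learning Toolbox function; N = 3 is an arbitrary example value.

```matlab
% Sketch: reproduce the default InputNames pattern {'in1',...,'inN'}
% for a layer with N inputs. N = 3 is an arbitrary example value.
N = 3;
inputNames = arrayfun(@(k) sprintf('in%d', k), 1:N, 'UniformOutput', false);
disp(inputNames)
```

These names are how individual inputs are addressed when connecting layers, for example 'add/in2' in the network built later in this example.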

If the layer has no other properties, then you can omit the properties section.

Tip

If you are creating a layer with multiple inputs, then you must set either the NumInputs or the InputNames property in the layer constructor. If you are creating a layer with multiple outputs, then you must set either the NumOutputs or the OutputNames property in the layer constructor.

A weighted addition layer does not require any additional properties, so you can remove the properties section.

A weighted addition layer has only one learnable parameter, the weights. Declare this learnable parameter in the properties (Learnable) section and call the parameter Weights.

    properties (Learnable)
        % Layer learnable parameters
            
        % Scaling coefficients
        Weights
    end

Create Constructor Function

Create the function that constructs the layer and initializes the layer properties. Specify any variables required to create the layer as inputs to the constructor function.

The weighted addition layer constructor function requires two inputs: the number of inputs to the layer and the layer name. The number of inputs determines the size of the learnable parameter Weights. Specify two input arguments named numInputs and name in the weightedAdditionLayer function. Add a comment to the top of the function that explains the syntax of the function.

        function layer = weightedAdditionLayer(numInputs,name)
            % layer = weightedAdditionLayer(numInputs,name) creates a
            % weighted addition layer and specifies the number of inputs
            % and the layer name.
            
            ...
        end

Initialize Layer Properties

Initialize the layer properties, including learnable parameters, in the constructor function. Replace the comment % Layer constructor function goes here with code that initializes the layer properties.

Set the NumInputs property to the input argument numInputs.

            % Set number of inputs.
            layer.NumInputs = numInputs;

Set the Name property to the input argument name.

            % Set layer name.
            layer.Name = name;

Give the layer a one-line description by setting the Description property of the layer. Set the description to describe the type of layer and its size.

            % Set layer description.
            layer.Description = "Weighted addition of " + numInputs + ...
                " inputs";

A weighted addition layer multiplies each layer input by the corresponding coefficient in Weights and adds the resulting values together. Initialize the learnable parameter Weights to be a random vector of size 1-by-numInputs. Weights is a property of the layer object, so you must assign the vector to layer.Weights.

            % Initialize layer weights
            layer.Weights = rand(1,numInputs);

View the completed constructor function.

        function layer = weightedAdditionLayer(numInputs,name) 
            % layer = weightedAdditionLayer(numInputs,name) creates a
            % weighted addition layer and specifies the number of inputs
            % and the layer name.

            % Set number of inputs.
            layer.NumInputs = numInputs;

            % Set layer name.
            layer.Name = name;

            % Set layer description.
            layer.Description = "Weighted addition of " + numInputs +  ... 
                " inputs";
        
            % Initialize layer weights.
            layer.Weights = rand(1,numInputs); 
        end

With this constructor function, the command weightedAdditionLayer(3,'add') creates a weighted addition layer with three inputs and the name 'add'.
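The Description text that this call produces comes from the string expression in the constructor. The sketch below evaluates that same expression on its own for numInputs = 3 to show the resulting text; it relies only on base MATLAB string arithmetic (double-quoted strings, R2017a or later), where adding a number to a string converts the number to text.

```matlab
% Sketch: evaluate the constructor's Description expression for numInputs = 3.
% "text" + number converts the number to text before concatenating.
numInputs = 3;
description = "Weighted addition of " + numInputs + ...
    " inputs";
disp(description)   % Weighted addition of 3 inputs
```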

Create Forward Functions

Create the layer forward functions to use at prediction time and training time.

Create a function named predict that propagates the data forward through the layer at prediction time and outputs the result.

The syntax for predict is

[Z1,…,Zm] = predict(layer,X1,…,Xn)
where X1,…,Xn are the n layer inputs and Z1,…,Zm are the m layer outputs. The values n and m must correspond to the NumInputs and NumOutputs properties of the layer.

Tip

If the number of inputs to predict can vary, then use varargin instead of X1,…,Xn. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi. If the number of outputs can vary, then use varargout instead of Z1,…,Zm. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Zj.

Because a weighted addition layer has only one output and a variable number of inputs, the syntax for predict for a weighted addition layer is Z = predict(layer,varargin), where varargin{i} corresponds to Xi for positive integers i less than or equal to NumInputs.
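Inside predict, varargin is an ordinary cell array, so the loop over inputs is plain cell indexing. The sketch below builds such a cell by hand to mimic a call predict(layer,X1,X2,X3) without Deep Learning Toolbox; the variable names and weight values are illustrative assumptions, and numel stands in for layer.NumInputs.

```matlab
% Sketch: mimic predict(layer,X1,X2,X3) with a hand-built cell array.
W = [0.5 2 -1];                                  % stands in for layer.Weights
inputs = {ones(2,2), 2*ones(2,2), 3*ones(2,2)};  % stands in for varargin
numInputs = numel(inputs);                       % layer.NumInputs in the real layer

% Same accumulation as the predict method: inputs{i} plays the role of Xi.
Z = zeros(size(inputs{1}), 'like', inputs{1});
for i = 1:numInputs
    Z = Z + W(i)*inputs{i};
end
disp(Z)   % every element is 0.5*1 + 2*2 - 1*3 = 1.5
```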

By default, the layer uses predict as the forward function at training time. To use a different forward function at training time, or to retain a value required for the backward function, you must also create a function named forward.

The dimensions of the inputs depend on the type of data and the output of the connected layers:

| Layer Input | Input Size | Observation Dimension |
|---|---|---|
| 2-D images | h-by-w-by-c-by-N, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, and N is the number of observations. | 4 |
| 3-D images | h-by-w-by-d-by-c-by-N, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images, respectively, and N is the number of observations. | 5 |
| Vector sequences | c-by-N-by-S, where c is the number of features of the sequences, N is the number of observations, and S is the sequence length. | 2 |
| 2-D image sequences | h-by-w-by-c-by-N-by-S, where h, w, and c correspond to the height, width, and number of channels of the images, respectively, N is the number of observations, and S is the sequence length. | 4 |
| 3-D image sequences | h-by-w-by-d-by-c-by-N-by-S, where h, w, d, and c correspond to the height, width, depth, and number of channels of the 3-D images, respectively, N is the number of observations, and S is the sequence length. | 5 |
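Because weighted addition is element-wise, the layer does not need to know which of these layouts it receives; it only needs all inputs to have the same size. The sketch below checks this in base MATLAB for a few arbitrary sizes chosen to resemble rows of the table; the sizes and weights are assumptions for illustration.

```matlab
% Sketch: element-wise weighted addition preserves the input size for any
% layout. Sizes are arbitrary examples resembling rows of the table above.
W = [0.3 0.7];
shapes = {[28 28 1 16], ...   % 2-D images: h-by-w-by-c-by-N
          [12 16 5], ...      % vector sequences: c-by-N-by-S
          [8 8 8 3 4]};       % 3-D images: h-by-w-by-d-by-c-by-N
for s = 1:numel(shapes)
    sz = shapes{s};
    X1 = rand(sz);
    X2 = rand(sz);
    Z = W(1)*X1 + W(2)*X2;
    assert(isequal(size(Z), sz))   % output size matches input size
end
```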

The forward function propagates the data forward through the layer at training time and also outputs a memory value.

The syntax for forward is

[Z1,…,Zm,memory] = forward(layer,X1,…,Xn)
where X1,…,Xn are the n layer inputs, Z1,…,Zm are the m layer outputs, and memory is the memory of the layer.

Tip

If the number of inputs to forward can vary, then use varargin instead of X1,…,Xn. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi. If the number of outputs can vary, then use varargout instead of Z1,…,Zm. In this case, varargout is a cell array of the outputs, where varargout{j} corresponds to Zj for j=1,…,NumOutputs and varargout{NumOutputs+1} corresponds to memory.

The forward function of a weighted addition layer is

$$f\left(X^{(1)},\dots,X^{(n)}\right) = \sum_{i=1}^{n} W_i X^{(i)}$$

where $X^{(1)},\dots,X^{(n)}$ correspond to the layer inputs and $W_1,\dots,W_n$ are the layer weights.
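As a small worked instance of this formula with n = 2, the sketch below weights two 2-by-2 inputs by W = [0.5 2] and adds them; the values are arbitrary and chosen so that the result is exact. It also shows an equivalent vectorized form that stacks the inputs along a new third dimension (implicit expansion, R2016b or later).

```matlab
% Worked example of f(X1,X2) = W1*X1 + W2*X2 with arbitrary values.
X1 = [1 2; 3 4];
X2 = [10 20; 30 40];
W  = [0.5 2];
Z = W(1)*X1 + W(2)*X2    % Z = [20.5 41; 61.5 82]

% Equivalent vectorized form: stack inputs along dim 3, scale, and sum.
Zv = sum(cat(3, X1, X2) .* reshape(W, 1, 1, []), 3);
```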

Implement the forward function in predict. In predict, the output Z corresponds to $f\left(X^{(1)},\dots,X^{(n)}\right)$. The weighted addition layer does not require memory or a different forward function for training, so you can remove the forward function from the class file. Add a comment to the top of the function that explains the syntaxes of the function.

Tip

If you preallocate arrays using functions like zeros, then you must ensure that the data types of these arrays are consistent with the layer function inputs. To create an array of zeros with the same data type as another array, use the 'like' option of zeros. For example, to initialize an array of zeros of size sz with the same data type as the array X, use Z = zeros(sz,'like',X).
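The difference the 'like' option makes is easy to see with a single-precision input (the trained weights later in this example are also single). The sketch below, in base MATLAB, compares the class of a default zeros array with one created using 'like'; the input values are arbitrary. The same option also carries over other attributes of the reference array, for example whether it is a gpuArray.

```matlab
% Sketch: zeros(...,'like',X) matches the data type of the reference array X.
X  = single(rand(3,4));
Z1 = zeros(size(X));              % double by default
Z2 = zeros(size(X), 'like', X);   % single, same as X
disp(class(Z1))   % double
disp(class(Z2))   % single
```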

        function Z = predict(layer, varargin)
            % Z = predict(layer, X1, ..., Xn) forwards the input data X1,
            % ..., Xn through the layer and outputs the result Z.
            
            X = varargin;
            W = layer.Weights;
            
            % Initialize output
            X1 = X{1};
            sz = size(X1);
            Z = zeros(sz,'like',X1);
            
            % Weighted addition
            for i = 1:layer.NumInputs
                Z = Z + W(i)*X{i};
            end
        end

Create Backward Function

Implement the derivatives of the loss with respect to the input data and the learnable parameters in the backward function.

The syntax for backward is

[dLdX1,…,dLdXn,dLdW1,…,dLdWk] = backward(layer,X1,…,Xn,Z1,…,Zm,dLdZ1,…,dLdZm,memory)
where X1,…,Xn are the n layer inputs, Z1,…,Zm are the m outputs of forward, dLdZ1,…,dLdZm are the gradients backward propagated from the next layer, and memory is the memory output of forward. For the outputs, dLdX1,…,dLdXn are the derivatives of the loss with respect to the layer inputs and dLdW1,…,dLdWk are the derivatives of the loss with respect to the k learnable parameters. To reduce memory usage by preventing unused variables from being saved between the forward and backward passes, replace the corresponding input arguments with ~.

Tip

If the number of inputs to backward can vary, then use varargin instead of the input arguments after layer. In this case, varargin is a cell array of the inputs, where varargin{i} corresponds to Xi for i=1,…,NumInputs, varargin{NumInputs+j} and varargin{NumInputs+NumOutputs+j} correspond to Zj and dLdZj, respectively, for j=1,…,NumOutputs, and varargin{end} corresponds to memory.

If the number of outputs can vary, then use varargout instead of the output arguments. In this case, varargout is a cell array of the outputs, where varargout{i} corresponds to dLdXi for i=1,…,NumInputs and varargout{NumInputs+t} corresponds to dLdWt for t=1,…,k, where k is the number of learnable parameters.

Because a weighted addition layer has one output, one learnable parameter, and a variable number of inputs, the syntax for backward for a weighted addition layer is varargout = backward(layer, varargin). In this case, varargin{i} corresponds to Xi for positive integers i less than or equal to NumInputs, varargin{NumInputs+1} corresponds to Z, and varargin{NumInputs+2} corresponds to dLdZ. For the outputs, varargout{i} corresponds to dLdXi for positive integers i less than or equal to NumInputs and varargout{NumInputs+1} corresponds to dLdW.

The dimensions of X1,...,Xn and Z are the same as in the forward functions. The dimensions of dLdZ are the same as the dimensions of Z. The dimensions and data types of dLdX1,...,dLdXn are the same as the dimensions and data type of X1,...,Xn. The dimension and data type of dLdW is the same as the dimension and data type of the learnable parameter W.
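The index mapping and the size and type rules can be checked without Deep Learning Toolbox by building the argument list by hand. The sketch below assumes numInputs = 2, uses random single-precision data, and places [] in the slot for memory; it repeats the unpacking and gradient lines of the backward method and then inspects the results.

```matlab
% Sketch: emulate the varargin layout of backward for numInputs = 2 and
% check the size/type rules. All values are arbitrary; [] stands in for memory.
numInputs = 2;
W  = single([0.4 1.5]);                 % stands in for layer.Weights
X1 = single(rand(4,4,3,2));
X2 = single(rand(4,4,3,2));
Z  = W(1)*X1 + W(2)*X2;
G  = single(rand(size(Z)));             % stands in for the upstream gradient
args = {X1, X2, Z, G, []};              % X1, X2, Z, dLdZ, memory

% Unpack exactly as the backward method does.
X = args(1:numInputs);
dLdZ = args{numInputs+2};

% Same derivative computations, packed like varargout.
out = cell(1, numInputs+1);
dLdW = zeros(1, numInputs, 'like', W);
for i = 1:numInputs
    out{i} = dLdZ * W(i);                     % dLdXi
    dLdW(i) = sum(dLdZ .* X{i}, 'all');       % requires R2018b or later
end
out{numInputs+1} = dLdW;
```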

During training, the software automatically updates the learnable parameters using the corresponding derivatives returned by backward.

To include a custom layer in a network, the layer forward functions must accept the outputs of the previous layer and forward propagate arrays with the size expected by the next layer. Similarly, backward must accept inputs with the same size as the corresponding output of the forward function and backward propagate derivatives with the same size.

For a weighted addition layer, the derivatives of the loss with respect to each input are

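$$\frac{\partial L}{\partial X_k^{(i)}} = \sum_j \frac{\partial L}{\partial Z_j}\,\frac{\partial Z_j}{\partial X_k^{(i)}},$$

where $k$ indexes into each $X^{(i)}$ linearly, $Z = f\left(X^{(1)},\dots,X^{(n)}\right)$, $\partial L/\partial Z$ is the gradient propagated from the next layer, $j$ indexes into $Z$ linearly, and the derivative of the activation is

$$\frac{\partial Z}{\partial X^{(i)}} = W_i.$$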
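The step that collapses the sum into the single line used in backward can be worked out from the element-wise form of the forward function, $Z_j = \sum_{i=1}^{n} W_i X_j^{(i)}$ (index notation introduced here for the derivation). Because output element $j$ depends only on element $j$ of each input,

$$\frac{\partial Z_j}{\partial X_k^{(i)}} = W_i\,\delta_{jk}, \qquad \delta_{jk} = \begin{cases} 1 & j = k \\ 0 & j \neq k \end{cases}$$

$$\frac{\partial L}{\partial X_k^{(i)}} = \sum_j \frac{\partial L}{\partial Z_j}\,W_i\,\delta_{jk} = W_i\,\frac{\partial L}{\partial Z_k}.$$

So the whole gradient array for input $i$ is the incoming gradient scaled by $W_i$, which is what the line dLdX{i} = dLdZ * W(i) computes.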

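The derivative of the loss with respect to each learnable parameter $W_i$ is

$$\frac{\partial L}{\partial W_i} = \sum_j \frac{\partial L}{\partial Z_j}\,\frac{\partial Z_j}{\partial W_i},$$

where $j$ indexes the elements of $Z$ linearly and the gradient of the activation is

$$\frac{\partial Z}{\partial W_i} = X^{(i)}.$$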
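One way to gain confidence in these formulas before writing backward is a finite-difference check in base MATLAB. The sketch below assumes a simple stand-in loss, L(Z) = sum(G.*Z,'all'), chosen so that dL/dZ is exactly the random array G; it compares the analytic expressions dLdX{i} = G*W(i) and dLdW(i) = sum(G.*X{i},'all') with central differences for n = 2 inputs.

```matlab
% Sketch: central-difference check of the derivative formulas for n = 2.
% The loss L(Z) = sum(G.*Z,'all') is an assumption chosen so that dL/dZ = G.
rng(0)
n = 2;
W = rand(1, n);
X = {rand(3,3,2), rand(3,3,2)};
G = rand(3,3,2);                            % plays the role of dL/dZ
f = @(W, X) W(1)*X{1} + W(2)*X{2};          % forward function for n = 2
L = @(W, X) sum(G .* f(W, X), 'all');       % stand-in loss

% Analytic derivatives from the formulas above.
dLdX1 = G * W(1);
dLdW  = [sum(G .* X{1}, 'all'), sum(G .* X{2}, 'all')];

% Central differences with respect to each weight.
h = 1e-6;
numW = zeros(1, n);
for i = 1:n
    e = zeros(1, n);
    e(i) = h;
    numW(i) = (L(W + e, X) - L(W - e, X)) / (2*h);
end

% Central difference with respect to one arbitrary element of X{1}.
k = 5;
Xp = X; Xp{1}(k) = Xp{1}(k) + h;
Xm = X; Xm{1}(k) = Xm{1}(k) - h;
numX1k = (L(W, Xp) - L(W, Xm)) / (2*h);

fprintf('max |dL/dW error|  = %.2e\n', max(abs(numW - dLdW)))
fprintf('|dL/dX1(k) error| = %.2e\n', abs(numX1k - dLdX1(k)))
```

Because the stand-in loss is linear in both W and X, the remaining differences come only from floating-point rounding and should be tiny.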

Implement the backward function in backward and add a comment to the top of the function that explains the syntaxes of the function.

        function varargout = backward(layer, varargin)
            % [dLdX1,…,dLdXn,dLdW] = backward(layer,X1,…,Xn,Z,dLdZ,~)
            % backward propagates the derivative of the loss function
            % through the layer.
            
            numInputs = layer.NumInputs;
            W = layer.Weights;
            X = varargin(1:numInputs);
            dLdZ = varargin{numInputs+2};
            
            % Calculate derivatives
            dLdX = cell(1,numInputs);
            dLdW = zeros(1,numInputs,'like',W);
            for i = 1:numInputs                
                dLdX{i} = dLdZ * W(i);
                dLdW(i) = sum(dLdZ .* X{i},'all');
            end
            
            % Pack output arguments.
            varargout(1:numInputs) = dLdX;
            varargout{numInputs+1} = dLdW;
        end

Completed Layer

View the completed layer class file.

classdef weightedAdditionLayer < nnet.layer.Layer
    % Example custom weighted addition layer.

    properties (Learnable)
        % Layer learnable parameters
            
        % Scaling coefficients
        Weights
    end
    
    methods
        function layer = weightedAdditionLayer(numInputs,name) 
            % layer = weightedAdditionLayer(numInputs,name) creates a
            % weighted addition layer and specifies the number of inputs
            % and the layer name.

            % Set number of inputs.
            layer.NumInputs = numInputs;

            % Set layer name.
            layer.Name = name;

            % Set layer description.
            layer.Description = "Weighted addition of " + numInputs +  ... 
                " inputs";
        
            % Initialize layer weights.
            layer.Weights = rand(1,numInputs); 
        end
        
        function Z = predict(layer, varargin)
            % Z = predict(layer, X1, ..., Xn) forwards the input data X1,
            % ..., Xn through the layer and outputs the result Z.
            
            X = varargin;
            W = layer.Weights;
            
            % Initialize output
            X1 = X{1};
            sz = size(X1);
            Z = zeros(sz,'like',X1);
            
            % Weighted addition
            for i = 1:layer.NumInputs
                Z = Z + W(i)*X{i};
            end
        end
        
        function varargout = backward(layer, varargin)
            % [dLdX1,…,dLdXn,dLdW] = backward(layer,X1,…,Xn,Z,dLdZ,~)
            % backward propagates the derivative of the loss function
            % through the layer.
            
            numInputs = layer.NumInputs;
            W = layer.Weights;
            X = varargin(1:numInputs);
            dLdZ = varargin{numInputs+2};
            
            % Calculate derivatives
            dLdX = cell(1,numInputs);
            dLdW = zeros(1,numInputs,'like',W);
            for i = 1:numInputs                
                dLdX{i} = dLdZ * W(i);
                dLdW(i) = sum(dLdZ .* X{i},'all');
            end
            
            % Pack output arguments.
            varargout(1:numInputs) = dLdX;
            varargout{numInputs+1} = dLdW;
        end
    end
end

Check Validity of Layer with Multiple Inputs

Check the validity of the custom layer weightedAdditionLayer.

Define a custom weighted addition layer. To create this layer, save the file weightedAdditionLayer.m in the current folder.

Create an instance of the layer and check its validity using checkLayer. Specify the valid input sizes to be the typical sizes of a single observation for each input to the layer. The layer expects 4-D array inputs, where the first three dimensions correspond to the height, width, and number of channels of the previous layer output, and the fourth dimension corresponds to the observations.

Specify the typical size of the input of an observation and set 'ObservationDimension' to 4.

layer = weightedAdditionLayer(2,'add');
validInputSize = {[24 24 20],[24 24 20]};
checkLayer(layer,validInputSize,'ObservationDimension',4)
Skipping GPU tests. No compatible GPU device found.
 
Running nnet.checklayer.TestCase
.......... ........
Done nnet.checklayer.TestCase
__________

Test Summary:
	 18 Passed, 0 Failed, 0 Incomplete, 6 Skipped.
	 Time elapsed: 139.3855 seconds.

Here, the function does not detect any issues with the layer.

Use Custom Weighted Addition Layer in Network

You can use a custom layer in the same way as any other layer in Deep Learning Toolbox. This section shows how to create and train a network for digit classification using the weighted addition layer you created earlier.

Load the example training data.

[XTrain,YTrain] = digitTrain4DArrayData;

Define a custom weighted addition layer. To create this layer, save the file weightedAdditionLayer.m in the current folder.

Create a layer graph including the custom layer weightedAdditionLayer.

layers = [
    imageInputLayer([28 28 1],'Name','in')
    convolution2dLayer(5,20,'Name','conv1')
    reluLayer('Name','relu1')
    convolution2dLayer(3,20,'Padding',1,'Name','conv2')
    reluLayer('Name','relu2')
    convolution2dLayer(3,20,'Padding',1,'Name','conv3')
    reluLayer('Name','relu3')
    weightedAdditionLayer(2,'add')
    fullyConnectedLayer(10,'Name','fc')
    softmaxLayer('Name','softmax')
    classificationLayer('Name','classoutput')];

lgraph = layerGraph(layers);
lgraph = connectLayers(lgraph, 'relu1', 'add/in2');
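The connection to 'add/in2' works because both inputs of the addition layer, relu3 (through the layer array) and relu1 (through connectLayers), have the same size; element-wise addition requires that. The sketch below checks this with the standard output-size arithmetic for a 2-D convolution, assuming stride 1 (the default); every convolution in this network has 20 filters, so the channel counts match as well.

```matlab
% Sketch: spatial output size of a stride-1 2-D convolution is
% inputSize - filterSize + 2*padding + 1. Stride 1 is assumed.
convOut = @(inSize, filterSize, padding) inSize - filterSize + 2*padding + 1;
h1 = convOut(28, 5, 0);   % conv1 (5x5, no padding)  -> relu1
h2 = convOut(h1, 3, 1);   % conv2 (3x3, padding 1)   -> relu2
h3 = convOut(h2, 3, 1);   % conv3 (3x3, padding 1)   -> relu3
fprintf('relu1: %d-by-%d-by-20, relu3: %d-by-%d-by-20\n', h1, h1, h3, h3)
% relu1: 24-by-24-by-20, relu3: 24-by-24-by-20
```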

Set the training options and train the network.

options = trainingOptions('adam','MaxEpochs',10);
net = trainNetwork(XTrain,YTrain,lgraph,options);
Training on single CPU.
Initializing input data normalization.
|========================================================================================|
|  Epoch  |  Iteration  |  Time Elapsed  |  Mini-batch  |  Mini-batch  |  Base Learning  |
|         |             |   (hh:mm:ss)   |   Accuracy   |     Loss     |      Rate       |
|========================================================================================|
|       1 |           1 |       00:00:00 |        7.81% |       2.3117 |          0.0010 |
|       2 |          50 |       00:00:20 |       78.91% |       0.6958 |          0.0010 |
|       3 |         100 |       00:00:33 |       91.41% |       0.2488 |          0.0010 |
|       4 |         150 |       00:00:49 |       96.09% |       0.0999 |          0.0010 |
|       6 |         200 |       00:01:11 |       99.22% |       0.0305 |          0.0010 |
|       7 |         250 |       00:01:31 |       98.44% |       0.0585 |          0.0010 |
|       8 |         300 |       00:01:53 |      100.00% |       0.0218 |          0.0010 |
|       9 |         350 |       00:02:09 |       99.22% |       0.0161 |          0.0010 |
|      10 |         390 |       00:02:24 |      100.00% |       0.0090 |          0.0010 |
|========================================================================================|

View the weights learned by the weighted addition layer.

net.Layers(8).Weights
ans = 1x2 single row vector

    1.0095    0.9917

Evaluate the network performance by predicting on new data and calculating the accuracy.

[XTest,YTest] = digitTest4DArrayData;
YPred = classify(net,XTest);
accuracy = sum(YTest==YPred)/numel(YTest)
accuracy = 0.9882
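The accuracy expression is the fraction of matching labels, which is the same as the mean of the logical comparison. The sketch below applies it to four made-up categorical labels in base MATLAB; the label values are arbitrary and only illustrate the arithmetic.

```matlab
% Sketch: the accuracy expression on made-up labels (3 of 4 match).
YTest = categorical([1 2 3 1], [1 2 3]);
YPred = categorical([1 2 1 1], [1 2 3]);
accuracy = sum(YTest == YPred) / numel(YTest)   % accuracy = 0.7500
accuracyMean = mean(YTest == YPred);            % equivalent formulation
```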
