Create and Train Custom Neural Network Architectures

Neural Network Toolbox™ software provides a flexible network object type that allows many kinds of networks to be created and then used with functions such as init, sim, and train.

Type the following to see all the network creation functions in the toolbox.

help nnnetwork

This flexibility is possible because networks have an object-oriented representation. The representation allows you to define various architectures and assign various algorithms to those architectures.

To create custom networks, start with an empty network (obtained with the network function) and set its properties as desired.

net = network

The network object consists of many properties that you can set to specify the structure and behavior of your network.

The following sections show how to create a custom network by using these properties.

Custom Network

Before you can build a network you need to know what it looks like. For dramatic purposes (and to give the toolbox a workout) this section leads you through the creation of the wild and complicated network shown below.

Each of the two elements of the first network input is to accept values ranging between 0 and 10. Each of the five elements of the second network input ranges from −2 to 2.

Before you can complete your design of this network, the algorithms it employs for initialization and training must be specified.

Each layer's weights and biases are initialized with the Nguyen-Widrow layer initialization method (initnw). The network is trained with Levenberg-Marquardt backpropagation (trainlm), so that, given example input vectors, the outputs of the third layer learn to match the associated target vectors with minimal mean squared error (mse).

Network Definition

The first step is to create a new network. Type the following code to create a network and view its many properties:

net = network

Architecture Properties

The first group of properties displayed is labeled architecture properties. These properties allow you to select the number of inputs and layers and their connections.

Number of Inputs and Layers.  The first two properties displayed in the dimensions group are numInputs and numLayers. These properties allow you to select how many inputs and layers you want the network to have.

net =

         numInputs: 0
         numLayers: 0

Note that the network has no inputs or layers at this time.

Change that by setting these properties to the number of inputs and number of layers in the custom network diagram.

net.numInputs = 2;
net.numLayers = 3;

net.numInputs is the number of input sources, not the number of elements in an input vector (net.inputs{i}.size).
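The distinction between the number of input sources and the size of each input vector can be checked directly. The following is a minimal sketch; it assumes the input sizes from the custom network description (two and five elements) and sets them by hand rather than through example data.

```matlab
% Minimal sketch: number of input sources versus elements per input.
net = network;
net.numInputs = 2;          % two input sources
net.numLayers = 3;

net.inputs{1}.size = 2;     % first source carries 2-element vectors
net.inputs{2}.size = 5;     % second source carries 5-element vectors

numSources = net.numInputs
sizes = [net.inputs{1}.size, net.inputs{2}.size]
```

Here numSources counts the sources, while sizes holds the vector length of each source.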

Bias Connections.  Type net and press Enter to view its properties again. The network now has two inputs and three layers.

net =
    Neural Network:
         numInputs: 2
         numLayers: 3

Examine the next four properties in the connections group:

       biasConnect: [0; 0; 0]
      inputConnect: [0 0; 0 0; 0 0]
      layerConnect: [0 0 0; 0 0 0; 0 0 0]
     outputConnect: [0 0 0]

These matrices of 1s and 0s represent the presence and absence of bias, input weight, layer weight, and output connections. They are currently all zeros, indicating that the network does not have any such connections.

The bias connection matrix is a 3-by-1 vector. To create a bias connection to the ith layer you can set net.biasConnect(i) to 1. Specify that the first and third layers are to have bias connections, as the diagram indicates, by typing the following code:

net.biasConnect(1) = 1;
net.biasConnect(3) = 1;

You could also define those connections with a single line of code.

net.biasConnect = [1; 0; 1];

Input and Layer Weight Connections.  The input connection matrix is 3-by-2, representing the presence of connections from two sources (the two inputs) to three destinations (the three layers). Thus, net.inputConnect(i,j) represents the presence of an input weight connection going to the ith layer from the jth input.

To connect the first input to the first and second layers, and the second input to the second layer (as indicated by the custom network diagram), type

net.inputConnect(1,1) = 1;
net.inputConnect(2,1) = 1;
net.inputConnect(2,2) = 1;

or this single line of code:

net.inputConnect = [1 0; 1 1; 0 0];

Similarly, net.layerConnect(i,j) represents the presence of a layer-weight connection going to the ith layer from the jth layer. Connect layers 1, 2, and 3 to layer 3 as follows:

net.layerConnect = [0 0 0; 0 0 0; 1 1 1];

Output Connections.  The output connections are a 1-by-3 matrix, indicating that they connect to one destination (the external world) from three sources (the three layers).

To connect layers 2 and 3 to the network output, type

net.outputConnect = [0 1 1];

Number of Outputs

Type net and press Enter to view the updated properties. The final three architecture properties are read-only, which means their values are determined by the choices made for other properties. The first read-only property in the dimensions group is the number of outputs:

numOutputs: 2

By defining output connections from layers 2 and 3, you specified that the network has two outputs.
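The connection matrices can also be set in one pass, after which the read-only count of outputs can be compared with the number of 1s in the output connection matrix. This is a sketch of that check, using only the properties described above; the variable name derivedOutputs is illustrative.

```matlab
% Sketch: define all connections, then confirm the derived output count.
net = network;
net.numInputs = 2;
net.numLayers = 3;

net.biasConnect   = [1; 0; 1];                  % biases on layers 1 and 3
net.inputConnect  = [1 0; 1 1; 0 0];            % rows: layers, columns: inputs
net.layerConnect  = [0 0 0; 0 0 0; 1 1 1];      % layers 1, 2, 3 feed layer 3
net.outputConnect = [0 1 1];                    % layers 2 and 3 are outputs

% numOutputs is read-only; compare it with the number of 1s in outputConnect.
derivedOutputs = nnz(net.outputConnect)
net.numOutputs
```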

Subobject Properties

The next group of properties in the output display is subobjects:

            inputs: {2x1 cell array of 2 inputs}
            layers: {3x1 cell array of 3 layers}
           outputs: {1x3 cell array of 2 outputs}
            biases: {3x1 cell array of 2 biases}
      inputWeights: {3x2 cell array of 3 weights}
      layerWeights: {3x3 cell array of 3 weights}


When you set the number of inputs (net.numInputs) to 2, the inputs property becomes a cell array of two input structures. Each ith input structure (net.inputs{i}) contains additional properties associated with the ith input.

To see how the input structures are arranged, type

net.inputs

ans = 
    [1x1 nnetInput]
    [1x1 nnetInput]

To see the properties associated with the first input, type

net.inputs{1}

The properties appear as follows:

ans = 
              name: 'Input'
    feedbackOutput: []
       processFcns: {}
     processParams: {1x0 cell array of 0 params}
   processSettings: {0x0 cell array of 0 settings}
    processedRange: []
     processedSize: 0
             range: []
              size: 0
          userdata: (your custom info)

If you set the exampleInput property, the range, size, processedSize, and processedRange properties will automatically be updated to match the properties of the value of exampleInput.

Set the exampleInput property as follows:

net.inputs{1}.exampleInput = [0 10 5; 0 3 10];

If you examine the structure of the first input again, you see that it now has new values.
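As a quick check of what changes, the size and range of the first input can be read back after assigning the example data. This sketch assumes only the properties named above.

```matlab
% Sketch: assign example data to input 1 and read back the derived properties.
net = network;
net.numInputs = 2;

net.inputs{1}.exampleInput = [0 10 5; 0 3 10];  % 2 elements, 3 sample columns

inputSize  = net.inputs{1}.size     % number of rows in the example data
inputRange = net.inputs{1}.range    % one [min max] row per input element
```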

The property processFcns can be set to one or more processing functions. Type help nnprocess to see a list of these functions.

Set the processing functions of the first input to removeconstantrows and mapminmax as follows:

net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};

View the new input properties. You will see that processParams, processSettings, processedRange, and processedSize have all been updated to reflect that inputs will be processed using removeconstantrows and mapminmax before being given to the network when the network is simulated or trained. The property processParams contains the default parameters for each processing function. You can alter these values if you like. See the reference page for each processing function to learn more about its parameters.
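To see what these two processing functions do to data, they can be called on their own, outside any network. The sketch below is illustrative only: the matrix x reuses the example input values and adds a hypothetical constant third row so that removeconstantrows has something to remove.

```matlab
% Sketch: apply the two processing functions directly to a data matrix.
x = [0 10 5; 0 3 10; 7 7 7];          % third row is constant (hypothetical)

[x1, ps1] = removeconstantrows(x);    % drops rows whose values never change
[x2, ps2] = mapminmax(x1);            % rescales each remaining row to [-1, 1]

% The returned settings allow the mapping to be undone.
xBack = mapminmax('reverse', x2, ps2);
```

The settings structures ps1 and ps2 play the same role as the entries the network stores in processSettings.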

You can set the size of an input directly when no processing functions are used. Set the second input to have five elements:

net.inputs{2}.size = 5;

Layers.  When you set the number of layers (net.numLayers) to 3, the layers property becomes a cell array of three layer structures. Type the following line of code to see the properties associated with the first layer.

net.layers{1}

ans = 
    Neural Network Layer
              name: 'Layer'
        dimensions: 0
       distanceFcn: (none)
     distanceParam: (none)
         distances: []
           initFcn: 'initwb'
       netInputFcn: 'netsum'
     netInputParam: (none)
         positions: []
             range: []
              size: 0
       topologyFcn: (none)
       transferFcn: 'purelin'
     transferParam: (none)
          userdata: (your custom info)

Type the following three lines of code to change the first layer's size to 4 neurons, its transfer function to tansig, and its initialization function to the Nguyen-Widrow function, as required for the custom network diagram.

net.layers{1}.size = 4;
net.layers{1}.transferFcn = 'tansig';
net.layers{1}.initFcn = 'initnw';

The second layer is to have three neurons, the logsig transfer function, and be initialized with initnw. Set the second layer's properties to the desired values as follows:

net.layers{2}.size = 3;
net.layers{2}.transferFcn = 'logsig';
net.layers{2}.initFcn = 'initnw';

The third layer's size and transfer function properties don't need to be changed, because the defaults match those shown in the network diagram. You need to set only its initialization function, as follows:

net.layers{3}.initFcn = 'initnw';
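The three transfer functions used by the layers can be evaluated directly to see how they differ. This is a small sketch with arbitrary sample values for the net input n.

```matlab
% Sketch: evaluate the three transfer functions on sample net inputs.
n = [-2 0 2];

a1 = tansig(n)    % hyperbolic tangent sigmoid, outputs lie in (-1, 1)
a2 = logsig(n)    % log-sigmoid, outputs lie in (0, 1)
a3 = purelin(n)   % linear, output equals the net input
```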

Outputs.  Use this line of code to see how the outputs property is arranged:

net.outputs

ans = 
    []    [1x1 nnetOutput]    [1x1 nnetOutput]

Note that outputs contains two output structures, one for layer 2 and one for layer 3. This arrangement occurs automatically when net.outputConnect is set to [0 1 1].

View the second layer's output structure with the following expression:

net.outputs{2}

ans = 
    Neural Network Output

              name: 'Output'
     feedbackInput: []
     feedbackDelay: 0
      feedbackMode: 'none'
       processFcns: {}
     processParams: {1x0 cell array of 0 params}
   processSettings: {0x0 cell array of 0 settings}
    processedRange: [3x2 double]
     processedSize: 3
             range: [3x2 double]
              size: 3
          userdata: (your custom info)

The size is automatically set to 3 when the second layer's size (net.layers{2}.size) is set to that value. Look at the third layer's output structure if you want to verify that it also has the correct size.

Outputs have processing properties that are automatically applied to target values before they are used by the network during training. The same processing settings are applied in reverse on layer output values before they are returned as network output values during network simulation or training.

As with the input-processing properties, setting the exampleOutput property automatically causes size, range, processedSize, and processedRange to be updated. Setting processFcns to a cell array of processing function names causes processParams, processSettings, and processedRange to be updated. You can then alter the processParams values if you want to.

Biases, Input Weights, and Layer Weights.  Enter the following commands to see how bias and weight structures are arranged:

net.biases
net.inputWeights
net.layerWeights

Here are the results of typing net.biases:

ans = 
    [1x1 nnetBias]
    []
    [1x1 nnetBias]

Each of these cell arrays contains a structure wherever the corresponding connection matrix (net.biasConnect, net.inputConnect, or net.layerConnect) contains a 1.

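Look at their structures with these lines of code:

net.biases{1}
net.biases{3}
net.inputWeights{1,1}
net.inputWeights{2,1}
net.inputWeights{2,2}
net.layerWeights{3,1}
net.layerWeights{3,2}
net.layerWeights{3,3}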

For example, typing net.biases{1} results in the following output:

    initFcn: (none)
      learn: true
   learnFcn: (none)
 learnParam: (none)
       size: 4
   userdata: (your custom info)

Specify the weights' tap delay lines in accordance with the network diagram by setting each weight's delays property:

net.inputWeights{2,1}.delays = [0 1];
net.inputWeights{2,2}.delays = 1;
net.layerWeights{3,3}.delays = 1;
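Setting the delays also updates the read-only properties numInputDelays and numLayerDelays, which report the largest input and layer delay in the network. The sketch below sets only the connections needed for the delayed weights and then reads both values back; the variable names are illustrative.

```matlab
% Sketch: tap delays on weights and the read-only delay counts they imply.
net = network;
net.numInputs = 2;
net.numLayers = 3;
net.inputConnect = [1 0; 1 1; 0 0];
net.layerConnect = [0 0 0; 0 0 0; 1 1 1];

net.inputWeights{2,1}.delays = [0 1];   % current and previous input value
net.inputWeights{2,2}.delays = 1;       % previous input value only
net.layerWeights{3,3}.delays = 1;       % layer 3 feeds back with one delay

maxInputDelay = net.numInputDelays
maxLayerDelay = net.numLayerDelays
```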

Network Functions

Type net and press Enter again to see the next set of properties.

      adaptFcn: (none)
    adaptParam: (none)
      derivFcn: 'defaultderiv'
     divideFcn: (none)
   divideParam: (none)
    divideMode: 'sample'
       initFcn: 'initlay'
    performFcn: 'mse'
  performParam: .regularization, .normalization
      plotFcns: {}
    plotParams: {1x0 cell array of 0 params}
      trainFcn: (none)
    trainParam: (none)

Each of these properties defines a function for a basic network operation.

Set the initialization function to initlay so the network initializes itself according to the layer initialization functions already set to initnw, the Nguyen-Widrow initialization function.

net.initFcn = 'initlay';

This meets the initialization requirement of the network.

Set the performance function to mse (mean squared error) and the training function to trainlm (Levenberg-Marquardt backpropagation) to meet the final requirement of the custom network.

net.performFcn = 'mse';
net.trainFcn = 'trainlm';

Set the divide function to dividerand (divide training data randomly).

net.divideFcn = 'dividerand';

During supervised training, the input and target data are randomly divided into training, test, and validation data sets. The network is trained on the training data until its performance begins to decrease on the validation data, which signals that generalization has peaked. The test data provides a completely independent test of network generalization.
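The division itself can be tried on its own by calling dividerand with a sample count and three ratios. This is a sketch only: the sample count Q and the ratios are illustrative values, not settings taken from this network.

```matlab
% Sketch: randomly split Q sample indices into three disjoint sets.
Q = 20;                                   % hypothetical number of samples
[trainInd, valInd, testInd] = dividerand(Q, 0.7, 0.15, 0.15);

% Every index from 1 to Q should appear in exactly one of the three sets.
allInd = sort([trainInd(:); valInd(:); testInd(:)])';
```

Because the split is random, the indices in each set change from run to run.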

Set the plot functions to plotperform (plot training, validation and test performance) and plottrainstate (plot the state of the training algorithm with respect to epochs).

net.plotFcns = {'plotperform','plottrainstate'};

Weight and Bias Values

Before initializing and training the network, type net and press Enter, then look at the weight and bias group of network properties.

weight and bias values:
           IW: {3x2 cell} containing 3 input weight matrices
           LW: {3x3 cell} containing 3 layer weight matrices
            b: {3x1 cell} containing 2 bias vectors

These cell arrays contain weight matrices and bias vectors in the same positions that the connection properties (net.inputConnect, net.layerConnect, net.biasConnect) contain 1s and the subobject properties (net.inputWeights, net.layerWeights, net.biases) contain structures.

Evaluating each of the following lines of code reveals that all the bias vectors and weight matrices are set to zeros.

net.IW{1,1}, net.IW{2,1}, net.IW{2,2}
net.LW{3,1}, net.LW{3,2}, net.LW{3,3}
net.b{1}, net.b{3}

Each input weight net.IW{i,j}, layer weight net.LW{i,j}, and bias vector net.b{i} has as many rows as the size of the ith layer (net.layers{i}.size).

Each input weight net.IW{i,j} has as many columns as the size of the jth input (net.inputs{j}.size) multiplied by the number of its delay values (length(net.inputWeights{i,j}.delays)).

Likewise, each layer weight has as many columns as the size of the jth layer (net.layers{j}.size) multiplied by the number of its delay values (length(net.layerWeights{i,j}.delays)).
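These dimension rules can be verified on the custom network. The sketch below rebuilds just enough of the network to size the weights. It sets the input sizes directly and sets the third layer's size to 1 explicitly, which is an assumption made here so that the result does not depend on the default layer size.

```matlab
% Sketch: check weight and bias dimensions against the sizing rules.
net = network;
net.numInputs = 2;
net.numLayers = 3;
net.biasConnect  = [1; 0; 1];
net.inputConnect = [1 0; 1 1; 0 0];
net.layerConnect = [0 0 0; 0 0 0; 1 1 1];

net.inputs{1}.size = 2;
net.inputs{2}.size = 5;
net.layers{1}.size = 4;
net.layers{2}.size = 3;
net.layers{3}.size = 1;                  % set explicitly (assumption)

net.inputWeights{2,1}.delays = [0 1];
net.inputWeights{2,2}.delays = 1;
net.layerWeights{3,3}.delays = 1;

% Rows: size of destination layer 2. Columns: input size times delay count.
expectedSize = [net.layers{2}.size, ...
    net.inputs{1}.size * length(net.inputWeights{2,1}.delays)]
actualSize = size(net.IW{2,1})
```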

Network Behavior


Initialize your network with the following line of code:

net = init(net);

Check the network's biases and weights again to see how they have changed:

net.IW{1,1}, net.IW{2,1}, net.IW{2,2}
net.LW{3,1}, net.LW{3,2}, net.LW{3,3}
net.b{1}, net.b{3}

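For example, net.IW{1,1} might now contain values such as the following (your values will differ because initialization is random):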

ans =
   -0.3040    0.4703
   -0.5423   -0.1395
    0.5567    0.0604
    0.2667    0.4924


Define the following cell array of two input vectors (one with two elements, one with five) for two time steps (i.e., two columns).

X = {[0; 0] [2; 0.5]; [2; -2; 1; 0; 1] [-1; -1; 1; 0; 1]};

You want the network to respond with the following target sequences for the second layer, which has three neurons, and the third layer with one neuron:

T = {[1; 1; 1] [0; 0; 0]; 1 -1};

Before training, you can simulate the network to see whether the initial network's response Y is close to the target T.

Y = sim(net,X)
Y = 
     [3x1 double]    [3x1 double]
     [      1.7148]    [      2.2726]

The cell array Y is the output sequence of the network, which is also the output sequence of the second and third layers. The values you got for the second row can differ from those shown because of different initial weights and biases. However, they will almost certainly not be equal to targets T, which is also true of the values shown.
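To put a number on how far the untrained response is from the targets, the mean squared error can be computed by hand. This sketch covers the third layer only and uses the two output values displayed above, so it is not the full network performance, which also includes the second layer's outputs. The variable names are illustrative.

```matlab
% Sketch: hand-computed mean squared error for the third layer's outputs.
t3 = [1 -1];              % third-layer targets (second row of T)
y3 = [1.7148 2.2726];     % third-layer outputs shown above (yours may differ)

e = t3 - y3;              % error at each time step
perf = mean(e.^2)         % mean of the squared errors
```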

The next task is optional. On some occasions you may wish to alter the training parameters before training. The following line of code displays the default Levenberg-Marquardt training parameters (defined when you set net.trainFcn to trainlm).

net.trainParam

The following properties should be displayed.

ans = 
    Show Training Window Feedback   showWindow: true
    Show Command Line Feedback showCommandLine: false
    Command Line Frequency                show: 25
    Maximum Epochs                      epochs: 1000
    Maximum Training Time                 time: Inf
    Performance Goal                      goal: 0
    Minimum Gradient                  min_grad: 1e-07
    Maximum Validation Checks         max_fail: 6
    Mu                                      mu: 0.001
    Mu Decrease Ratio                   mu_dec: 0.1
    Mu Increase Ratio                   mu_inc: 10
    Maximum mu                          mu_max: 10000000000

You will not often need to modify these values. See the documentation for the training function for information about what each of these means. They have been initialized with default values that work well for a large range of problems, so there is no need to change them here.
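Should a different setting be needed, individual fields of net.trainParam can be assigned directly. The following is a sketch; the value 500 is a hypothetical choice, not a recommendation.

```matlab
% Sketch: read a default training parameter, then override two of them.
net = network;
net.trainFcn = 'trainlm';                % populates net.trainParam defaults

defaultEpochs = net.trainParam.epochs    % default listed above

net.trainParam.epochs = 500;             % hypothetical lower epoch limit
net.trainParam.showWindow = false;       % suppress the training window
```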

Next, train the network with the following call:

net = train(net,X,T);

Training launches the neural network training window. To open the performance and training state plots, click the plot buttons.

After training, you can simulate the network to see if it has learned to respond correctly:

Y = sim(net,X)
Y = 
     [3x1 double]    [3x1 double]
     [      1.0000]    [     -1.0000]

The second network output (i.e., the second row of the cell array Y), which is also the third layer's output, matches the target sequence T.
