This topic presents part of a typical multilayer network workflow. For more information and other steps, see Multilayer Neural Networks and Backpropagation Training.
Neural network training can be more efficient if you perform certain preprocessing steps on the network inputs and targets. This section describes several preprocessing routines that you can use. (The most common of these are provided automatically when you create a network, and they become part of the network object, so that whenever the network is used, the data coming into the network is preprocessed in the same way.)
For example, in multilayer networks, sigmoid transfer functions are generally used in the hidden layers. These functions become essentially saturated when the net input is greater than three (exp (−3) ≅ 0.05). If this happens at the beginning of the training process, the gradients will be very small, and the network training will be very slow. In the first layer of the network, the net input is a product of the input times the weight plus the bias. If the input is very large, then the weight must be very small in order to prevent the transfer function from becoming saturated. It is standard practice to normalize the inputs before applying them to the network.
Generally, the normalization step is applied to both the input vectors and the target vectors in the data set. In this way, the network output always falls into a normalized range. The network output can then be reverse transformed back into the units of the original target data when the network is put to use in the field.
It is easiest to think of the neural network as having a preprocessing block that appears between the input and the first layer of the network and a postprocessing block that appears between the last layer of the network and the output, as shown in the following figure.
Most of the network creation functions in the toolbox, including the multilayer network creation functions, such as feedforwardnet, automatically assign processing functions to your network inputs and outputs. These functions transform the input and target values you provide into values that are better suited for network training.
You can override the default input and output processing functions by adjusting network properties after you create the network.
To see a cell array list of processing functions assigned to the input of a network, access this property:
where the index 1 refers to the first input vector. (There is only one input vector for the feedforward network.) To view the processing functions returned by the output of a two-layer network, access this network property:
where the index 2 refers to the output vector coming from the second layer. (For the feedforward network, there is only one output vector, and it comes from the final layer.) You can use these properties to change the processing functions that you want your network to apply to the inputs and outputs. However, the defaults usually provide excellent performance.
Several processing functions have parameters that customize their operation. You can access or change the parameters of the ith input processing function for the network input as follows:
You can access or change the parameters of the ith output processing function for the network output associated with the second layer, as follows:
For multilayer network creation functions, such as feedforwardnet, the default input processing functions are removeconstantrows and mapminmax. For outputs, the default processing functions are also removeconstantrows and mapminmax.
The following table lists the most common preprocessing and postprocessing functions. In most cases, you will not need to use them directly, since the preprocessing steps become part of the network object. When you simulate or train the network, the preprocessing and postprocessing will be done automatically.
Normalize inputs/targets to fall in the range [−1, 1]
Normalize inputs/targets to have zero mean and unity variance
Extract principal components from the input vector
Process unknown inputs
Remove inputs/targets that are constant
Unknown or "don't care" targets can be represented with NaN values. We do not want unknown target values to have an impact on training, but if a network has several outputs, some elements of any target vector may be known while others are unknown. One solution would be to remove the partially unknown target vector and its associated input vector from the training set, but that involves the loss of the good target values. A better solution is to represent those unknown targets with NaN values. All the performance functions of the toolbox will ignore those targets for purposes of calculating performance and derivatives of performance.