% This is a demo of the FITNET function, which should be
% used for nonlinear regression and curve fitting. It calls
% the generic function FEEDFORWARDNET, which therefore
% never has to be called explicitly. FITNET replaces the
% obsolete NEWFIT, which called the obsolete NEWFF.
%
% The demo illustrates a simple, but useful, approach
% to dealing with the age-old questions
%
% 1. How many hidden layers?
% 2. How many hidden nodes per layer?
% 3. How much training data?
%
% A recommended approach:
%
% 1. Always begin with 1 hidden layer. The Multilayer
% Perceptron (MLP) with a single hidden layer of H
% sigmoidal transfer functions, with H sufficiently
% large, is a universal approximator. On rare occasions
% it is useful to add a second hidden layer to reduce
% the necessary number of hidden nodes (H1+H2 < H).
% 2. Estimate, by trial and error, the minimum number
% of hidden nodes necessary for successfully
% approximating the underlying input-output
% transformation. For a smooth function with Nlocmax
% local maxima ( endpoint maxima only count 1/2 ) a
% reasonable lower bound is H >= 2*Nlocmax. The
% addition of real-world noise and measurement error
% will not change that minimum number. However, the
% contamination may make it difficult to identify the
% significant error-free maxima.
% 3. The minimum number of training input/target pairs
% needed to adequately estimate the resulting number
% of weights, Nw, tends to vary linearly with H. If the
% output target vectors are O-dimensional, the Ntrn
% training pairs yield Ntrneq = Ntrn*O training
% equations for estimating Nw unknown weights. If the
% input vectors are I-dimensional, the number of
% weights for a static MLP is given by
%
% Nw = (I+1)*H+(H+1)*O = O+(I+O+1)*H
%
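A quick numerical check of the weight-count formula, as a sketch; the values I = 1, O = 1 match the SIMPLEFIT data used below, and H = 5 is the lower bound derived in point 2:

```matlab
% Worked example of the weight count for a single-hidden-layer MLP.
% Assumes I = 1 input dimension, O = 1 output dimension, H = 5 hidden
% nodes (the SIMPLEFIT values used below).
I = 1; O = 1; H = 5;
Nw  = (I+1)*H + (H+1)*O    % 16 weights and biases
Nw2 = O + (I+O+1)*H        % 16: same count in factored form
```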
% 4. The number of estimation degrees of freedom ( see
% Wikipedia ) is Ndof = Ntrneq - Nw. When there are
% more unknown weights than training equations (i.e.,
% Nw > Ntrneq and Ndof < 0) the net is said to be
% OVERFIT with too many weights because an exact
% training data solution can be obtained with
% ~ abs(Ndof) weights fixed to any arbitrary finite value.
% This tends to prevent the net from performing
% adequately on non-training data.
%
% 5. There are several methods used to train overfit nets
% ( see the comp.ai.neural-nets FAQ ). VALIDATION
% SET STOPPING and REGULARIZATION are two
% methods that are readily available with the MATLAB
% NNTBX. However, they will not be addressed here.
%
% 6. The training technique used below simply avoids
% overfitting by limiting the number of hidden nodes
% so that the number of unknown weights is smaller
% than the number of training equations and the
% resulting number of estimation degrees of freedom
% is positive.
%
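Solving Ntrneq > Nw = O+(I+O+1)*H for H gives an upper bound, Hub, on the number of hidden nodes. A sketch of that bound, assuming the SIMPLEFIT sizes used below (Ntrneq = 94, I = O = 1):

```matlab
% Upper bound on H from Ntrneq > Nw = O + (I+O+1)*H
%   ==>  H < (Ntrneq - O)/(I+O+1)
I = 1; O = 1; Ntrneq = 94;                % 94 scalar training targets
Hub = -1 + ceil( (Ntrneq-O)/(I+O+1) )     % 30: largest H with Ndof > 0
Ndofmin = Ntrneq - ( O + (I+O+1)*Hub )    % 3: still positive at H = Hub
```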
% 7. The success of the error minimization algorithm
% depends on a fortuitous choice of initial weight values.
% Therefore, if the specified training goal is not achieved
% initially, multiple random weight initialization trials
% should be implemented. Given H, Ntrials = 10 is
% usually sufficient.
%
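A minimal sketch of such a multiple-initialization loop; this is hypothetical (the demo below loops over H instead, with one initialization per H), and it assumes x, t and a training goal are already defined:

```matlab
% Hypothetical sketch: Ntrials random weight initializations for a
% fixed H, keeping the best-performing net. Assumes x, t exist.
Ntrials = 10; H = 5;
goal = 0.01*mean(var(t',1));      % e.g., 1% of the target variance
bestperf = Inf;
for trial = 1:Ntrials
    net = fitnet(H);
    net = configure(net, x, t);   % draw new random initial weights
    [ net, tr ] = train(net, x, t);
    if tr.best_perf < bestperf
        bestperf = tr.best_perf;
        bestnet  = net;
    end
    if bestperf <= goal, break, end
end
```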
% 8. If the training data is resubstituted into the net to get
% an estimate of the generalization performance (i.e.,
% the performance on nondesign data ) the estimate
% will obviously be biased. However, the bias can be
% somewhat mitigated by dividing the sum of absolute
% or squared errors by the estimation degrees of
% freedom, Ndof, instead of the number of training
% equations, Ntrneq. If there is a significant difference
% between the biased (e.g., MSE, NMSE or R^2) and
% adjusted (MSEa, NMSEa and Ra^2) performance
% estimates, another method of estimation should be
% used. The obvious choice is to use a sufficiently
% large holdout set of nondesign test data. If that is
% not possible, averaging over multiple random
% design/test data divisions and averaging over multiple
% random weight initializations are two of many
% alternatives. Although the better known stratified
% cross-validation option is available via the CROSSVAL
% function in the STATS TBX, it is more difficult to
% implement.
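The degrees-of-freedom adjustment in point 8 amounts to the following; the numbers are illustrative only, not taken from the run below:

```matlab
% Adjusted MSE: divide the SSE by Ndof instead of Ntrneq.
% Illustrative numbers (not from the demo run).
Ntrneq = 94; Nw = 16;
Ndof = Ntrneq - Nw;       % 78 estimation degrees of freedom
SSE  = 0.5;               % hypothetical sum of squared training errors
MSE  = SSE/Ntrneq;        % biased (resubstitution) estimate
MSEa = SSE/Ndof;          % degrees-of-freedom-adjusted estimate
% Equivalently MSEa = Ntrneq*MSE/Ndof, as used in the demo loop below.
```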
close all, clear all, clc, plt = 0;
tic
[ x, t ] = simplefit_dataset;
[ I N ] = size(x) % [ 1 94 ]
[ O N ] = size(t) % [ 1 94 ]
Neq = prod(size(t)) % 94
% MSE normalization references
MSE00 = mean(var(t',1)) % 8.3378
MSE00a = mean(var(t')) % 8.4274
plt=plt+1, figure(plt) % figure 1
plot( x, t, 'LineWidth', 2)
title( ' SIMPLEFIT DATASET ')
Nlocmax = 2.5 % 2.5 local maxima
xt = [ x; t ];
rangext = minmax(xt)
% rangext = 0 9.9763
% 0 10
% No need to standardize or normalize
% H >= 2*Nlocmax = 5
% Nw = (I+1)*H+(H+1)*O;
% Neq > Nw ==> H <= Hub
Hub = -1+ceil( (Neq-O) / (I+O+1)) % 30
Hmax = 2*Nlocmax+1 % 6
dH = 1
Hmin =0
j=0
rng(0)
for h = Hmin:dH:Hmax
j=j+1;
h=h
if h==0
net = fitnet([]);
Nw = (I+1)*O
else
net = fitnet(h);
Nw = (I+1)*h+(h+1)*O
end
Ndof = Neq-Nw
net.divideFcn = ''; % No nontraining data
[ net tr y ] = train(net,x,t);
plt = plt+1,figure(plt)
hold on
plot( x, t, '.', 'LineWidth', 2 )
plot( x, y, 'ro', 'LineWidth', 2 )
legend( 'TARGET', 'OUTPUT' )
title( [ ' No. HIDDEN NODES = ', ...
num2str(h) ] )
stopcrit{j,1} = tr.stop;
numepochs(j,1) = tr.num_epochs;
bestepoch(j,1) = tr.best_epoch;
MSE(j,1) = tr.perf(tr.best_epoch+1);
MSEa(j,1) = Neq*MSE(j)/Ndof;
end
stopcrit = stopcrit
% stopcrit = 'Minimum gradient reached.'
% 'Minimum gradient reached.'
% 'Maximum epoch reached.'
% 'Minimum gradient reached.'
% 'Minimum gradient reached.'
% 'Minimum gradient reached.'
% 'Minimum gradient reached.'
H= (Hmin:dH:Hmax)';
R2 = 1 - MSE/MSE00;
R2a = 1 - MSEa/MSE00a;
format short g
summary = [ H bestepoch R2 R2a ]
toc % Elapsed time ~20 sec
% summary =
% H bestepoch R2 R2a
% 0 2 0.54902 0.54412
% 1 25 0.83429 0.82876
% 2 1000 0.87641 0.86789
% 3 83 0.87641 0.86317
% 4 27 0.99430 0.99345
% 5 81 0.99999 0.99998
% 6 425 0.99999 0.99998
Hope this helps.
Greg
