Neural Network NAR-based time-series prediction starts failing after several timesteps

I am starting to experiment with NAR-based time-series prediction. I've followed several tutorials and written a small, simple script to predict a simple cos(t) signal. The resulting prediction is quite good (as expected) at the beginning, but as time progresses the network starts failing catastrophically. Is there anything I am doing wrong?
Here is the code I am using:
DELAY=1:100;                 % feedback delays
HIDDEN=[10];                 % one hidden layer with 10 nodes
t=linspace(1,100,1000);
prueba=cos(t);               % target signal ("prueba" = test)
datos=prueba;                % "datos" = data
net = narnet(DELAY,HIDDEN);
[Xs,Xi,Ai,Ts] = preparets(net,{},{},num2cell(datos));
net = train(net,Xs,Ts,Xi,Ai);
net = closeloop(net);        % closed loop for multistep prediction
[Xs,Xi,Ai,Ts] = preparets(net,{},{},num2cell(prueba));
y = net(Xs,Xi,Ai);
plot(prueba(DELAY(end)+1:end),'k')
hold on
plot(cell2mat(y),'r')
And the results I am getting are shown in the figure below (target: black; prediction: red).

 Accepted Answer

After a not so quick look I have the following comments:
1. cos(t) has a period of 2*pi and satisfies a 2nd-order homogeneous difference equation. Therefore
a. Only two feedback delays are strictly necessary; eight to sixteen delays per period is more than enough.
b. No hidden layer is necessary. One hidden layer with one hidden node is rarely better; one hidden layer with H = 2 hidden nodes should be more than sufficient.
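A quick check of the 2nd-order claim (plain MATLAB, no toolbox needed): cos(n*dt) satisfies the recursion y(n) = 2*cos(dt)*y(n-1) - y(n-2), so two feedback delays determine the signal exactly.

```matlab
dt = 0.1;
y  = cos(dt*(0:99));
% Predict y(3:end) from the two previous samples via the recursion
yhat = 2*cos(dt)*y(2:end-1) - y(1:end-2);
err  = max(abs(yhat - y(3:end)))   % on the order of machine precision
```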
2. The autocorrelation function of cos(t) with N = 1000 and dt = 0.1 has 859 positive lag points with significant correlations that have absolute values greater than 0.046.
3. The default divideFcn of narnet is 'dividerand'. However, random sampling of a uniformly sampled time series destroys the beneficial effect of correlations. 'divideblock' and 'divideind' or even 'divideint' with dt = 0.3, should work much better.
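To see the difference concretely, here is a small sketch comparing the index patterns the two division functions produce (toolbox functions; the tiny Q = 10 is just for illustration):

```matlab
Q = 10;
% 'dividerand' scatters the training indices across the whole series,
% breaking the temporal correlation structure.
[trnR,valR,tstR] = dividerand(Q,0.7,0.15,0.15)    % scattered indices
% 'divideblock' keeps each subset contiguous in time.
[trnB,valB,tstB] = divideblock(Q,0.7,0.15,0.15)   % contiguous leading block
```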
4. With I = O =1, N=1000, Ntrn = 700, Nval = 150, Ntst = 150, NFD = 100 and H = 10, there are
Nw = (NFD+1)*H+(H+1)*O = 1021 unknown weights
Ntrneq = Ntrn*O = 700 training equations.
Therefore the net is severely overfit, and overtraining mitigation via a large validation set and/or mse regularization should be instituted.
5. Run the original data through the closed loop configuration and separately tabulate trn, val, and tst performance before testing on new data.
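For reference, the weight and equation counts above can be reproduced directly (values taken from the settings in the original post):

```matlab
NFD = 100; H = 10; O = 1; Ntrn = 700;
Nw     = (NFD+1)*H + (H+1)*O   % 1021 unknown weights
Ntrneq = Ntrn*O                % 700 training equations -> Nw > Ntrneq: overfit
```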
Hope this helps.
Thank you for formally accepting my answer
Greg

2 Comments

Thank you very much for your detailed answer and the time you've put into it. I have modified the code according to your suggestions, as follows:
DELAY=1:12; % <---
HIDDEN=[2]; % <---
t=linspace(1,100,3000); % <---
prueba=cos(t);
datos=prueba;
net = narnet(DELAY,HIDDEN);
net.divideFcn='divideblock'; % <---
net.divideParam.trainRatio=0.40; % <---
net.divideParam.valRatio=0.30; % <---
net.divideParam.testRatio=0.30; % <---
[Xs,Xi,Ai,Ts] = preparets(net,{},{},num2cell(datos));
net = train(net,Xs,Ts,Xi,Ai);
net = closeloop(net);
[Xs,Xi,Ai,Ts] = preparets(net,{},{},num2cell(prueba));
y = net(Xs,Xi,Ai);
plot(prueba(DELAY(end)+1:end),'k')
hold on
plot(cell2mat(y),'r')
But now I'm getting even worse results.
With 2-neuron hidden layer:
With no hidden layer:
Actually, I started experimenting with a simpler network, like the one you suggest, but I increased the size of the hidden layer and the number of delays considerably because I found I was getting much better results that way.
I should note that network training always halts on the minimum-gradient criterion; not a single validation check has been triggered in any of the runs I've performed.
DELAY=1:12; % <---
HIDDEN=[2]; % <---
x =linspace(1,100,3000); % <---
% MATLAB convention: use x for input, t for target, y for output
% Now N = 3000. Why start at x=1 ? %prueba=cos(t); %datos=prueba; % test and data in English?
t = cos(x);
T = num2cell(t);
net = narnet(DELAY,HIDDEN);
net.divideFcn='divideblock'; % <---
net.divideParam.trainRatio=0.40; % <---
net.divideParam.valRatio=0.30; % <---
net.divideParam.testRatio=0.30; % <---
% Alternatively, can get the trn/val/tst indices directly. Then assign them
% to the net, directly create the trn/val/tst sets and calc the MSE
% references for the coefficients of determination.
N = length(t);
[trnind,valind,tstind] = divideblock(N,0.4,0.3,0.3);
Ntrn = length(trnind)
net.divideFcn = 'divideind';        % required when assigning indices directly
net.divideParam.trainInd = trnind;  % divideParam, not trainParam
net.divideParam.valInd = valind;
net.divideParam.testInd = tstind;
ttrn = t(trnind);
MSEtrn00 = var(ttrn,1) % biased (1-dim) version
MSEtrn00a = Ntrn*MSEtrn00/(Ntrn-1) % unbiased version
% etc. for val & tst
% Lazy: didn't take the delays into account
[Xs,Xi,Ai,Ts] = preparets(net,{},{},T);
% Train; obtain Ys and overlay on a plot of Ts
[net,tr,Ys] = train(net,Xs,Ts,Xi,Ai);
ts = cell2mat(Ts);
ys = cell2mat(Ys);
% Obtain other pertinent info from tr.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Use subscipt c for closed loop quantities
netc = closeloop(net);
tc = t;
Tc = T;
[Xcs,Xci,Aci,Tcs] = preparets(netc,{},{},Tc);
Ycs = netc(Xcs,Xci,Aci);
ycs = cell2mat(Ycs)
% Plot trn/val/tst in different colors
plot(tc(DELAY(end)+1:end),'k')
hold on
plot(ycs,'r--')
% May need 2 plots on different scales: 1 before and 1 after the predictions blow up
% I don't know how to postpone the prediction blowup without just using trial and error for FD, H and trn/val/tst data ratios.
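A minimal sketch of the "plot trn/val/tst in different colors" step, assuming the divideblock ratios 0.40/0.30/0.30 set above; the section boundaries are approximate, since divideblock indexes the full target series before the delay offset is removed.

```matlab
% ycs is the closed-loop output aligned with tc(DELAY(end)+1:end)
n  = numel(ycs);
i1 = 1:round(0.40*n);                 % (approximate) training section
i2 = round(0.40*n)+1:round(0.70*n);   % (approximate) validation section
i3 = round(0.70*n)+1:n;               % (approximate) test section
plot(i1,ycs(i1),'b'), hold on
plot(i2,ycs(i2),'g')
plot(i3,ycs(i3),'r')
legend('trn','val','tst')
```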


More Answers (1)

I took a look at the openloop problem
N = 1000
dx = 0.1
x = dx*(0:N-1);
t = cos(x);
[ I N ] = size(x)
[O N ] = size(t)
X = con2seq(x);
T = con2seq(t);
ID = [];
FD = 1:63; % One Period
[trnind,valind,tstind] = divideblock( N, 0.7, 0.15, 0.15);
ttrn = t(trnind); tval = t(valind); ttst = t(tstind);
Ntrn = length(ttrn), Nval = length(tval), Ntst = length(ttst)
MSEtrn00 = mean(var(ttrn',1))
MSEtrn00a = mean(var(ttrn',0))
MSEval00 = mean(var(tval',1))
MSEtst00 = mean(var(ttst',1))
Ntrneq = Ntrn*O
NID = length(ID)*I
NFD = length(FD)*O
ND = NID+NFD
LDB = max([ ID, FD ])
if ~isempty(ID) && min(ID)==0 && max(ID)>=max(FD)
LDB = LDB+1
end
%Nw = (ND+1)*H+(H+1)*O
Hub = -1 + ceil((Ntrneq-O)/(ND+O+1)) % 10
Hmax = Hub, dH = 1, Hmin = 0
Ntrials = 10
for H = Hmin:dH:Hmax
for i = 1:Ntrials
-----SNIP
end
end
% H =              0      1      2      3      4      5      6      7      8       9     10
% maxBestepoch =   1     15     17      9     10      7      6      6      7       6      6
% maxR2trn  =  -0.597  0.990  0.992  0.992  0.994  0.993  0.995  0.996  0.997   0.998  0.999
% maxR2trna =  -0.755  0.989  0.990  0.989  0.990  0.987  0.989  0.987  0.9898  0.988  0.989
% maxR2val  =  -0.567  0.989  0.991  0.992  0.993  0.993  0.995  0.995  0.997   0.998  0.999
% maxR2tst  =  -0.951  0.990  0.992  0.993  0.994  0.992  0.995  0.996  0.997   0.998  0.999
Well, the H=0 results are lousy but the H = 1 results are very good.
I am surprised at both results.
Greg
