Neural Network for predictions. How to improve it

Hi everyone,
I am trying to build, with the NN toolbox, a network able to predict the manganese content in a reservoir at time t+2, given 7 other variables as inputs (among them, the Mn content at time t).
While I wait for far more data, which will definitely improve the performance, I am experimenting with the network's many options to improve it. You can find the code below:
inputs = Inputs7old';
targets = Mnt2old';
% Create a Fitting Network
hiddenLayerSize = 10;
FIT1 = fitnet(hiddenLayerSize);
FIT1.numLayers = 3;
FIT1.layers{2}.name='Hidden2';
FIT1.layers{3}.name='Output';
FIT1.layers{2}.dimensions=10;
FIT1.layers{3}.dimensions=1;
FIT1.inputConnect = [1; 0; 0];
FIT1.layerConnect = [0 0 0;1 0 0;1 1 0];
FIT1.outputConnect = [0 0 1];
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
FIT1.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
FIT1.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};
%Initialization Functions
FIT1.layers{1}.initFcn = 'initnw';
FIT1.layers{2}.initFcn = 'initnw';
FIT1.layers{3}.initFcn = 'initnw';
%Net Input Functions
FIT1.layers{1}.netInputFcn = 'netsum';
FIT1.layers{2}.netInputFcn = 'netsum';
FIT1.layers{3}.netInputFcn = 'netsum';
%Transfer functions
FIT1.layers{1}.transferFcn = 'satlin';
FIT1.layers{2}.transferFcn = 'purelin';
FIT1.layers{3}.transferFcn = 'purelin';
%Input Weight delays and learning
%(only inputWeights{1,1} exists, since inputConnect = [1;0;0] connects
%the input to layer 1 only)
FIT1.inputWeights{1,1}.delays=0;
FIT1.inputWeights{1,1}.learn=1;
FIT1.inputWeights{1,1}.learnFcn='learncon';
%Layer Weight Functions (indexed layerWeights{to,from}, matching layerConnect)
FIT1.layerWeights{2,1}.weightFcn='normprod';
FIT1.layerWeights{3,1}.weightFcn='normprod';
FIT1.layerWeights{3,2}.weightFcn='normprod';
%Layer Weight Initialization Functions
%FIT1.layerWeights{2,1}.initFcn='initcon';
%view(FIT1)
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
FIT1.divideFcn = 'dividerand'; % Divide data randomly
FIT1.divideMode = 'sample'; % Divide up every sample
FIT1.divideParam.trainRatio = 64/100;
FIT1.divideParam.testRatio = 16/100;
FIT1.divideParam.valRatio = 20/100;
% For help on the training function 'trainrp' type: help trainrp
% For a list of all training functions type: help nntrain
FIT1.trainFcn = 'trainrp'; % Resilient Backpropagation
%FIT1.trainFcn = 'trainlm'; % Levenberg-Marquardt
%FIT1.trainFcn = 'trainscg'; % Scaled Conjugate Gradient
%FIT1.trainFcn = 'trainbr'; % Bayesian Regularization
%FIT1.trainFcn = 'traingdm'; % Gradient descent with momentum backpropagation
%FIT1.trainFcn = 'trainb'; % Batch training
%FIT1.trainFcn = 'trainbfg'; % BFGS quasi-Newton backpropagation
%FIT1.trainFcn = 'traincgb'; % Conjugate gradient backpropagation with Powell-Beale restarts
%FIT1.trainFcn = 'trainoss'; % One-step secant backpropagation
%FIT1.trainFcn = 'trainr'; % Random order incremental training with learning functions
%FIT1.trainFcn = 'trains'; % Sequential order incremental training with learning functions
%Training Parameters
FIT1.trainParam.epochs=1000;
FIT1.trainParam.time=Inf;
FIT1.trainParam.min_grad=0.00001;
FIT1.trainParam.max_fail=6;
FIT1.trainParam.delta0=0.07;
FIT1.trainParam.delta_inc=1.25;
FIT1.trainParam.delta_dec=0.5;
FIT1.trainParam.deltamax=50;
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
FIT1.performFcn = 'mse'; % Mean squared error
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
FIT1.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
% Train the Network
[FIT1,tr] = train(FIT1,inputs,targets);
% Test the Network
outputs = FIT1(inputs);
%Outputs > 0: clamp negative predictions to a small positive value
for i = 1:length(Mnt2old)
    if outputs(1,i) < 0
        outputs(1,i) = 0.001;
    end
end
errors = gsubtract(targets,outputs);
performance = perform(FIT1,targets,outputs)
Rtr=corrcoef(outputs,Mnt2old);
R2tr=Rtr(1,2)^2
errors2 = errors.^2;                  % elementwise squared errors
RMSEtr = sqrt(mean(errors2));         % root mean squared error
MSEtr = sum(errors2)/length(Mnt2old);
RMSEtr2 = sqrt(MSEtr);                % cross-check of RMSEtr
old = length(Mnt2old);
Mtrain = mean(Mnt2old);
Mquadrerrtrain = RMSEtr/sqrt(length(Mnt2old));
EpctMquad = 100*Mquadrerrtrain/Mtrain
Maritmerrtrain = mean(abs(errors));
EpctMaritm = 100*Maritmerrtrain/Mtrain
EpctRMSE = 100*RMSEtr/Mtrain          % RMSE as a percentage of the mean
PeaksErrratioT = EpctRMSE/EpctMaritm
% Recalculate Training, Validation and Test Performance
trainTargets = targets .* tr.trainMask{1};
%valTargets = targets .* tr.valMask{1};
%testTargets = targets .* tr.testMask{1};
trainPerformance = perform(FIT1,trainTargets,outputs);
%valPerformance = perform(FIT1,valTargets,outputs)
%testPerformance = perform(FIT1,testTargets,outputs)
% View the Network
%view(FIT1)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotfit(FIT1,inputs,targets)
%figure, plotregression(targets,outputs)
%figure, ploterrhist(errors)
%Validation
Outputval=FIT1(Inputs72011');
for i = 1:length(Mnt22011)
    if Outputval(1,i) < 0
        Outputval(1,i) = 0.001;
    end
end
plot (Outputval,'r')
hold on
plot (Mnt22011,'g')
hold off
Rval = corrcoef(Outputval,Mnt22011);
R2val=Rval(1,2)^2
errval=(Outputval-Mnt22011');
errval2 = errval.^2;
RMSEval = sqrt(mean(errval2));
MSEval = sum(errval2)/length(Mnt22011);
new = length(Mnt22011);
Mval = mean(Mnt22011);
Mquadrerrval = RMSEval/sqrt(length(Mnt22011));
EpcvMquad=100*Mquadrerrval/Mval
Maritmerrval=mean(abs(errval));
EpcvMaritm=100*Maritmerrval/Mval
EpcvRMSE=100*RMSEval/Mval
PeaksErrratioV=EpcvRMSE/EpcvMaritm
Basically, at the moment it is just basic code in which I have listed all the available options. The data I used for training are from the period 2000-2008, and I then use a separate validation set covering six months of 2011. In the last part I compute some error indexes.
I have to say that I am very new to the NN world, so I probably still miss some of the theoretical concepts behind it. But, with trial and error, I have already tried to improve it in several ways (as you can see from the options I have listed), with bad results; only by changing the training algorithm have I obtained fairly different results, but they usually vary a lot from training to training; I always get completely different charts. Would it be enough to train it many times and save the weights of the trial that gave the lowest error?
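To illustrate the idea, a minimal sketch (assuming the inputs and targets defined above; tr.best_vperf is the best validation error from the training record):
% Train the same architecture several times from random initial weights
% and keep the net with the lowest validation error.
Ntrials = 10;
bestPerf = Inf;
for k = 1:Ntrials
    net = fitnet(10);       % each new net gets fresh random weights at training time
    [net,tr] = train(net,inputs,targets);
    if tr.best_vperf < bestPerf
        bestPerf = tr.best_vperf;
        bestNet = net;
    end
end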
So, considering that in my validation set, but also in my training set, I have very low performance (say err = +-100%), and I don't think I can get to very low values (say +-5%) with more data alone, what can I do, from a network point of view, to improve it? I also tried a feedforward multilayer network with similar results, and a generalized regression network that gives very flat results (see code below).
GRNN=newgrnn(Inputs7old',Mnt2old')
ytr=GRNN(Inputs7old');
figure1=plot(ytr,'g')
hold on
figure1=plot(Mnt2old,'r')
hold off
Rtr=corrcoef(Mnt2old,ytr);
msetr=mse(GRNN,Mnt2old,ytr');
errors = gsubtract(Mnt2old',ytr);
R2tr=Rtr(1,2)^2
errors2 = errors.^2;
RMSEtr = sqrt(mean(errors2));
MSEtr = sum(errors2)/length(Mnt2old);
old = length(Mnt2old);
Mtrain = mean(Mnt2old);
Mquadrerrtrain = RMSEtr/sqrt(length(Mnt2old));
EpctRMSE=100*RMSEtr/Mtrain
EpctMquad=100*Mquadrerrtrain/Mtrain
Maritmerrt=mean(abs(errors));
EpctMaritm=100*Maritmerrt/Mtrain
PeaksErrRatioT=EpctRMSE/EpctMaritm
yval=GRNN(Inputs72011');
figure2=plot(yval,'b');
hold on
plot(Mnt22011,'g')
hold off
Rval = corrcoef(yval,Mnt22011);
R2val=Rval(1,2)^2
errval=(yval-Mnt22011');
errval2 = errval.^2;
RMSEval = sqrt(mean(errval2));
MSEval = sum(errval2)/length(Mnt22011);
new = length(Mnt22011);
Mval = mean(Mnt22011);
Merrvalquadr = RMSEval/sqrt(length(Mnt22011));
EpcvMquadr=100*Merrvalquadr/Mval;
Merrvalaritm=mean(abs(errval));
EpcvMaritm=100*Merrvalaritm/Mval
EpcvRMSE=100*RMSEval/Mval
PeaksErrRatioV=EpcvRMSE/EpcvMaritm
If anyone could help me I would be very grateful; thank you.

Accepted Answer

Greg Heath on 6 Oct 2012
Edited: Greg Heath on 26 Oct 2016
You seem to be wasting a lot of time and space writing unnecessary code because you do not rely on the defaults. My experience is that 1 hidden layer is sufficient. Given the sizes of the input and target matrices ([I N] and [O N], respectively), the only things that have to be specified are
1. The number of input and/or feedback delays in time-series prediction.
2. The candidates for number of hidden nodes (e.g., H = 0:10)
3. The number of random weight initializations for each H candidate (e.g., Ntrials = 10).
4. A nonzero MSE training goal to mitigate overfitting. I favor
%net.trainParam.goal = 0.01*Ntrneq*var(T,0,2)/(Neq-Nw) ;
CORRECTION:
net.trainParam.goal = 0.01*max(Ndof,0)*var(T,0,2)/Ntrneq
where
Ntrneq = Ntrn*O % No. of training equations
Nw = (I+1)*H+(H+1)*O % No. of unknown weights (desire Nw << Ntrneq)
Ndof = Ntrneq - Nw % No. of degrees of freedom after estimating Nw weights
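For concreteness, a minimal sketch of these quantities in code (illustrative only; it assumes x is the [I N] input matrix, t is the [O N] target matrix, and the default 0.70 training fraction):
% Illustrative sketch of the suggested training goal (assumptions above).
[I,N] = size(x);
[O,~] = size(t);
H = 10;                          % one candidate number of hidden nodes
Ntrn = floor(0.70*N);            % approximate number of training cases
Ntrneq = Ntrn*O;                 % number of training equations
Nw = (I+1)*H + (H+1)*O;          % number of unknown weights
Ndof = Ntrneq - Nw;              % degrees of freedom after estimating Nw weights
net = fitnet(H);
net.trainParam.goal = 0.01*max(Ndof,0)*mean(var(t,0,2))/Ntrneq; % mean() makes the goal scalar when O > 1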
Search the NEWSGROUP and ANSWERS for code examples using the keywords
heath Ntrn Ntrneq Nw Ndof Ntrials
Hope this helps.
Greg

More Answers (4)

Edo on 6 Oct 2012
Edited: Edo on 6 Oct 2012
Thank you Greg, I will do some searching for those examples, and I will try to apply the modifications you suggest.
Yes, you are right about the length of the code; most of it is just a list of options, which makes it easy for me to switch from one to another just by adding/removing the "%".
My idea, regarding the inputs, was to feed the network all the variables I have, one by one, checking how the MSE decreases; when it no longer decreases, I won't include that variable. This list would also include the lagged variables (t-1, t-2, ..., t-n, until the MSE no longer decreases significantly). Would this save me from entering feedback delays in the network? And which network do you think would be more suitable for a prediction problem like this? feedforwardnet or fitnet, as I have already tried, or would you recommend anything else?
Thanks again for the first reply, and I hope you can help me again.
  1 Comment
Greg Heath on 7 Oct 2012
Backward search is typically better for ranking inputs. I have discussed this in a recent NEWSGROUP or ANSWERS post (as well as many others).
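A rough sketch of the idea (my notation; assumes x is the [I N] candidate-input matrix, t the target row, and uses the validation MSE from the training record):
% Backward search: start from all inputs and repeatedly drop the input
% whose removal degrades validation MSE the least.
I = size(x,1);
inUse = true(1,I);
for step = 1:I-1
    vmse = inf(1,I);
    for j = find(inUse)
        trial = inUse; trial(j) = false;      % try dropping input j
        net = fitnet(10);
        [net,tr] = train(net, x(trial,:), t);
        vmse(j) = tr.best_vperf;              % validation MSE without input j
    end
    [bestmse, jdrop] = min(vmse);
    fprintf('dropping input %d, validation MSE %g\n', jdrop, bestmse);
    inUse(jdrop) = false;  % ranking = reverse of the order in which inputs are dropped
end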
No need to ever use feedforwardnet. Use fitnet for regression and curve fitting and patternnet for classification and pattern recognition.
For time-series prediction, use timedelaynet, narnet or narxnet. For your problem, narxnet fills the bill.
To estimate the number of feedback delays, plot the autocorrelation function of the output.
To estimate the number of input delays, plot the cross correlation functions of output and inputs.
Maybe more later ... the wife is dragging me to the gym.



Edo on 8 Oct 2012
Edited: Edo on 8 Oct 2012
Lol good luck with the gym, Greg!
I am going through my narxnet and following your recommendations. Changing the feedback/input delays though i don't get any particular tip about the correct number of delays: both the autocorrelation function and the cross correlation one are very sensible to the initial weights, thus i get every time very different plots. In an average though, i usually get good results for the autocorrelation function (only some values beyond the threshold for lag = +-3 sometimes), while many more issues come from the cross correlation one, that most of the times has almost all the values quite similar and out of the bounds.
Do you think this is a problem related to poor data?
Also, a question about inputs ranking: it is not said, as it is a hydrological problem, that i would need info up to a fixed delay: I would need maybe only lag 1,3,6 for one variable and 1,2,4 for another one etc; once i estabilish which one i need (backward search), how can i put them into the same input matrix? narxnet makes you choose the same number of delays for all the rows of your input matrix, how can i assign different delays to different inputs?
By the way, during the ranking process, changing x(t) every time with different available variables, apparently narxnet got similar results; to me, it seems it relies only in y(t), while it doesn't take into account the info i give to him through x(t)..!!
Final issue: my prediction will have to be 2 time steps ahead. Narxnet apparently calculate y(t+1), once y(t) is given (with the early prediction option); how can I actually make it predict y(t+2)? If there is no solution, I would probably give as input x(t-1) instead of x(t) -->y(t+1)=y(t-1)+x(t-1) etc = 2 time steps.
I have also tried the Bayesian regularization, getting though overfitting despite a low mse.
Thanks again! You have been very helpful so far!

Greg Heath on 9 Oct 2012
% Lol good luck with the gym, Greg! % % I am going through my narxnet and following your recommendations. % Changing the feedback/input delays though i don't get any particular tip % about the correct number of delays: both the autocorrelation function and % the cross correlation one are very sensible
sensitive?
%to the initial weights, thus i get every time very different plots.
You shouldn't.
I think I misled you by using the term "outputs" instead of the correct term, "targets". SORRY ABOUT THAT! Let's try again:
BEFORE creating any nets:
1. Determine a good range for feedback delays by plotting and examining the relative positions of the peaks in the autocorrelation function (symmetric about 0 lag) of the target series, T. What, exactly, do you get for autocorrT(N:N+10)?
2. Determine a good range for input delays by plotting and examining the relative positions of the peaks in the crosscorrelation function (NOT symmetric about 0 lag) of the target series with EACH of the I input series X(i,:), i = 1:I. What, exactly, do you get for crosscorrXiT(N:N+10)? (One way to compute these is sketched below.)
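autocorrT and crosscorrXiT above are just names for the correlation series, not toolbox commands; one way to compute them is this sketch (using xcorr from the Signal Processing Toolbox, with T the 1xN target row and X the I-by-N input matrix):
N = length(T);
Tz = T - mean(T);                    % remove the mean first
autocorrT = xcorr(Tz,'coeff');       % 2N-1 lags, zero lag at index N
autocorrT(N:N+10)                    % autocorrelation at lags 0:10
i = 1;                               % repeat for each input i = 1:I
Xz = X(i,:) - mean(X(i,:));
crosscorrXiT = xcorr(Tz,Xz,'coeff');
crosscorrXiT(N:N+10)                 % crosscorrelation at lags 0:10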
% In an average though, i usually % get good results for the autocorrelation function (only some values % beyond the threshold for lag = +-3 sometimes), while many more issues % come from the cross correlation one, that most of the times has almost % all the values quite similar and out of the bounds.
???
If the bounds you are referring to are confidence intervals, the values outside the bounds are the ones that are significant.
% Do you think this is a problem related to poor data?
Probably not. You seem to be using the output of the net instead of the desired training target.
% Also, a question about inputs ranking: it is not said, as it is a % hydrological problem, that i would need info up to a fixed delay: I would % need maybe only lag 1,3,6 for one variable and 1,2,4 for another one etc;
Unfortunately, the timeseries NN functions require all input variable delays to be equal and, if there are multiple outputs, all output feedback delays to be equal.
% once i estabilish which one i need (backward search), how can i put them % into the same input matrix? narxnet makes you choose the same number of % delays for all the rows of your input matrix, how can i assign different % delays to different inputs?
You cannot.
% By the way, during the ranking process, changing x(t) every time with % different available variables, apparently narxnet got similar results; to % me, it seems it relies only in y(t), while it doesn't take into account % the info i give to him through x(t)..!!
This indicates that the target autocorrelation for positive delays is more significant than the input/target crosscorrelation for nonnegative delays. What, exactly, do you get for lags 0:10 (minmax(lags) = [-(N-1):(N-1)]) for each function?
% Final issue: my prediction will have to be 2 time steps ahead. Narxnet % apparently calculate y(t+1), once y(t) is given (with the early % prediction option); how can I actually make it predict y(t+2)?
Use input delays [0:2] and output layer delays [1:2]
%If there is no solution,
There's always a solution. You mean if the solution is unsatisfactory ?
% I would probably give as input x(t-1) instead of x(t) % -->y(t+1)=y(t-1)+x(t-1) etc = 2 time steps.
No. You don't have to fool around like that anymore. See
help preparets
doc preparets
Include 0 in the input delay but not in the feedback delay
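For example, a sketch (X and T are cell arrays of your input and target series, e.g. from tonndata):
net = narxnet(0:2, 1:2, 10);              % 0 in the input delays, not in the feedback delays
[Xs,Xi,Ai,Ts] = preparets(net, X, {}, T); % shifts the series appropriately for you
[net,tr] = train(net, Xs, Ts, Xi, Ai);
Y = net(Xs, Xi, Ai);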
% I have also tried the Bayesian regularization, getting though overfitting % despite a low mse.
Why do you think you have an overfitting problem?
Is either
1. Ntrneq <~ 2*Nw
Ntrneq = prod(size(target(trainInd))) % Number of training equations
Nw = net.numWeightElements % Number of unknown weights
Or is
tr.vperf(end) and/or tr.tperf(end) >> tr.perf(end) ?
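In code, a sketch (net and tr from [net,tr] = train(net,x,t); ">>" read loosely here as, say, a factor of 2):
Ntrneq = numel(t(:,tr.trainInd));    % number of training equations
Nw = net.numWeightElements;          % number of unknown weights
tooFewEquations = Ntrneq < 2*Nw
overfit = (tr.vperf(end) > 2*tr.perf(end)) || (tr.tperf(end) > 2*tr.perf(end))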
% Thanks again! You have been very helpful so far!
So far? Are you expecting things to start going downhill ?
Hope this helps.
Greg
  2 Comments
Edo on 10 Oct 2012
Edited: Edo on 10 Oct 2012
sensitive? yep sorry bad English
BEFORE creating any nets:
1. Determine a good range for feedback delays by plotting and examining the relative positions of the peaks in the autocorrelation function (symmetric about 0 lag) of the target series, T. What, exactly, do you get for autocorrT(N:N+10)?
I don't actually have that command in my MATLAB; I have used xcorr. Plotting xcorr(Mn,'coeff'), I get a chart that clearly shows the seasonality of that variable: after the peak of 1 at lag 0, there are peaks of 0.6 at around lags 52 and 104 (1 and 2 years), decreasing after that to 0.4 for the peaks at years 3 and 4, and so on. The peaks are pretty narrow, with the correlation values decreasing, for example, from 1 to 0.1 in 3-4 lags.
2. Determine a good range for input delays by plotting and examining the relative positions of the peaks in the crosscorrelation function (NOT symmetric about 0 lag) of the target series with EACH of the I input series X(i,:), i = 1:I. What, exactly, do you get for crosscorrXiT(N:N+10)?
Again with the same command, xcorr: plotting, for example, xcorr(Mn,Mnhyp,'coeff'), for an input I know is quite strongly related, I get a chart with a peak of 0.6 at lag 0, which then shows an oscillatory path again, with slightly higher values on the left side, reaching 0.55 at lags -52 and -104, and 0.5 at lag 52. Unfortunately, for other variables I am sure are related, I got poor correlation and a less clear seasonal variation. Their maximum values (at lag 0) are between 0.4 and 0.5, which I think is very bad, as I tried with a random vector and got 0.4, and they don't show any particular relationship in time. I am sure there is a relationship for some of them (it is evident from the charts of the single inputs compared to the output), but I don't know why the network doesn't notice it.
This indicates that the target autocorrelation for positive delays is more significant than the input/target crosscorrelation for nonnegative delays. What, exactly, do you get for lags 0:10 (minmax(lags) = [-(N-1):(N-1)]) for each function?
See above: as I wrote, there are no particular correlations for lags up to 10; correlations show up at lag 52, but obviously I can't run a program with 52 delays. It is a pity I can't choose the lags, as I was wondering in the previous message. So that's why, I guess, no matter which input I give to the network, the results are based 99% on the previous output values. I am sure there must be a way to take all these (29!) additional variables into account!
Use input delays [0:2] and output layer delays [1:2]
Are you sure about that? I tried with this code (here only the first part):
inputSeries = tonndata(Q,false,false);
targetSeries = tonndata(Mn,false,false);
% Create a Nonlinear Autoregressive Network with External Input
inputDelays = 0:2;
feedbackDelays = 1:2;
hiddenLayerSize = 10;
net = narxnet(inputDelays,feedbackDelays,hiddenLayerSize);
and this error appears:
Error using removedelay (line 57) Removing 1 to input delays would result in a negative input weight delay.
Error in Narxnet2 (line 116) nets = removedelay(net);
So the early-prediction version can't work; apparently it is the one I have to use, because it already predicts one step ahead, so this way it should be 2 steps ahead, but it doesn't work.
So far? Are you expecting things to start going downhill ?
No idea, I tried just out of curiosity to put all the info I have together and give it to the net, but I got a very poor prediction; apparently the best one is without any inputs, just feeding back the output. I need to find a solution! But your tips are definitely essential to me, as I was very new to NNs, and for this kind of very specific issue I would not have found anybody else able and keen to help me. Thanks again.
PS: I don't know how to post my charts here; even writing in this forum looks difficult to me! I wanted to show you the cross-correlation ones. If you want to give me your email I can send them to you; otherwise really don't worry. Have a good day.
Greg Heath on 10 Oct 2012
To get my email address, just click on the link 'Greg Heath' at the top of the post.
I just did some searching on determining significant lags. It appears that PARTIAL CORRELATION FUNCTIONS are the appropriate tool. 'PARTIAL' takes into account lags that were accepted previously (analogous to stepwise regression?!).
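For example, a sketch (parcorr is in the Econometrics Toolbox; t is the target series):
figure, parcorr(t,60)  % partial autocorrelation out to lag 60; bars outside
                       % the confidence bounds mark candidate feedback delays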



farzad on 21 Feb 2015
Hello All
For a case of mine, I was trying to use your code, but I did not succeed, and I got this error:
Error using network/train (line 272) Output data size does not match net.outputs{3}.size.
I have attached the input and the target files, but I used the first code you posted. Do you think your code, the updated one, would also work well for my case?
thank you all
