http://www.mathworks.com/matlabcentral/newsreader/view_thread/328393
MATLAB Central Newsreader - Error in Selection of Optimum Parameters in Neural Network
© 1994-2014 by MathWorks, Inc.

Wed, 17 Apr 2013 20:54:08 +0000
Re: Error in Selection of Optimum Parameters in Neural Network
http://www.mathworks.com/matlabcentral/newsreader/view_thread/328393#902633
Greg Heath
"Subodh Paudel" <subodhpaudel@gmail.com> wrote in message <kklvbo$o7o$1@newscl01ah.mathworks.com>...
> Hello All,
> I have an error in the selection of the optimum configuration from the neural network model. I have the R2 statistic for training, the R2 statistic for training with the degrees-of-freedom adjustment, and I choose the highest (R2 statistic for training with the degrees-of-freedom adjustment + R2 for validation).
>
> Which is the best condition here?
> Case Neurons R2Train R2TrainDOF R2Val  R2Tst MSETrain MSEval R2TrainDOF+R2Val
> 1    9       0.8901  0.8832     0.8799 0.751 0.109    0.119  1.7632
> 2    16      0.8906  0.8777     0.8871 0.785 0.109    0.111  1.7646
> 3    19      0.9005  0.8864     0.8641 0.768 0.099    0.1347 1.7505
>
> Of the three cases, which one is the best?

You omitted:

1. Ntrn/Nval/Ntst. Confidence bounds on estimates vary inversely with size.
2. dividerand or ...? Summary statistics of trn/val/tst should not be significantly different.
3. tr.stop. (Was training terminated because of min(MSEval)? If not, then I would tend to treat MSEval as an unbiased estimate of generalization error.)
4. Whether multiple weight-initialization trials were run for each case and the tabulations are the best of the trials.

Simplify:

H   R2trn  R2trna  R2Val  R2Tst  MSEtrn  MSEval  R2trna+R2Val
 9  0.89   0.88    0.88   0.75   0.11    0.12    1.76
16  0.89   0.88    0.89   0.79   0.11    0.11    1.76
19  0.90   0.89    0.86   0.77   0.10    0.13    1.75

Typically, the MSE is assumed to be chi-square distributed. See Wikipedia for the estimate of the stdv or quantiles.

I've forgotten the equation for the chi-square stdv, but my impression is that the differences in MSEtrn are statistically insignificant. Therefore, I would be inclined to choose case 1, since its Nw is the smallest. You didn't list MSEtrna, but comparing R2trna values reinforces my choice. Ditto for MSEval.
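To put a rough number on that, here is a small Python sketch (my own illustration, not from the thread): if the scaled SSE is chi-square with nu degrees of freedom, the standard deviation of the MSE estimate is approximately MSE*sqrt(2/nu).

```python
import math

def mse_std(mse, nu):
    """Approximate standard deviation of an MSE estimate when the
    scaled SSE is chi-square with nu degrees of freedom:
    std(MSE) ~= MSE * sqrt(2/nu)."""
    return mse * math.sqrt(2.0 / nu)

# With roughly Ntrn ~ 1800 training points and MSEtrn ~ 0.11, the
# spread of the MSEtrn estimate is only a few thousandths:
print(round(mse_std(0.11, 1800), 4))  # 0.0037
```

With a spread that small relative to the tabulated 0.10 vs 0.11, whether the MSEtrn differences are significant hinges on the exact degrees of freedom, which is why Ntrn/Nval/Ntst matter.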

Interesting that you tabulated R2trna+R2val. Recently I have noticed that, when the effects of overtraining an overfit net are negligible, R2trn+R2trna and R2val+R2tst are comparable. For this data you have

H   (R2trn+R2trna)/2  (R2val+R2tst)/2  (R2val+R2trna)/2
 9  0.89              0.82             0.88
16  0.88              0.84             0.88
19  0.89              0.82             0.88

which may indicate that tst is not a typical random draw.

Now, once the choice is made via trna & val, let's see what tst reveals:

1. Cannot tell if d(MSEtst) is significant, because I don't know Ntst.
2. MSEtst seems to be significantly higher than MSEval. However, I don't know tr.stop, Ntst and Nval. This may indicate that there is a statistical difference between tst vs trn & val. Are you using dividerand?
3. To be more confident, I would rerun H = 9 and 16 with Ntrials ~ 10, with different random trn/val/tst divisions AND initial weights.

> 1) In my understanding, case 2 shows better results, since the R2 statistic for training under the degrees-of-freedom adjustment plus the R2 for validation is greater than in the other cases. But there is also a contradiction: R2val (0.8871) is greater than the adjusted training R2 (0.8777), which should not happen.

Invalid conclusion. You have to take error bars into consideration. Even then, 5% of the time values fall outside the 95% confidence interval.

> So, my optimum choice is case 1, which does not violate any rule.

My optimum choice is case 1 because it does not seem to be significantly worse than the others AND it has fewer weights.

> Please, could anyone advise me? Further, Prof. Greg Heath mentioned the degrees-of-freedom adjustment for Hmax and the training equations,
> Hub = 1 + ceil((Ntrneq-O)/(I+O+1));
> Hmax = round(Hub/8);
> Why is this done? Is it to avoid overfitting problems?

Yes. The documentation default always uses H = 10 and relies on val stopping. I would rather use the minimum acceptable Nw, using multiple (numH*Ntrials) designs rather than a formal search over discrete values of H.
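To make the bookkeeping concrete, here is a Python translation of those quantities (the formulas follow the MATLAB expressions above; the example value of I and the helper names are my own assumptions, not from the thread). Nw counts the weights of an I-H-O net, Ntrneq = Ntrn*O is the number of training equations, and Ndof = Ntrneq - Nw.

```python
import math

def nw(I, H, O):
    """Number of weights in an I-H-O feedforward net, biases included."""
    return (I + 1) * H + (H + 1) * O

def hub(Ntrneq, I, O):
    """Upper bound on H: beyond this, Nw exceeds the number of
    training equations Ntrneq and Ndof goes negative."""
    return 1 + math.ceil((Ntrneq - O) / (I + O + 1))

# Example with Ntrn = 1824, O = 1 (so Ntrneq = 1824) and an assumed I = 8:
Ntrneq, I, O = 1824, 8, 1
print(hub(Ntrneq, I, O))       # 184
print(Ntrneq - nw(I, 9, O))    # Ndof at H = 9: 1733
```

Keeping H well below Hub keeps Ndof large, which is exactly the overfitting protection being asked about.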

See Wikipedia about "estimation degrees of freedom".

> Please, could you send the references, so that I can cite this equation in the conference paper I am submitting?

Search the Newsgroup and Answers. I don't think I used it in comp.ai.neural-nets, but you can check. If so, I probably used Hmax instead of Hub. The important point is that Nw > Ntrneq as H > Hub, and overtraining mitigation has to be considered.

Hope this helps.

Greg

Thu, 18 Apr 2013 09:58:08 +0000
Re: Re: Error in Selection of Optimum Parameters in Neural Network
http://www.mathworks.com/matlabcentral/newsreader/view_thread/328393#902670
Subodh Paudel
Thank you very much for the previous mail, but I am interested in how Ntrn, Nval, Ntst, tr.stop and the division of data affect the choice of the approximate configuration, and I would be grateful to know how these parameters affect the best configuration. The target of this model is short-term forecasting.

Here is more information:
1) Ntrn = 1824 (January 28 to February 16, 15-minute sampling interval)
Nval = 384 (Feb 17 to Feb 20)
Ntst = 384 (Feb 21 to Feb 24)
2) The data are periodic, but a static neural network is used (not a dynamic neural network for the periodicity).
3) tr.stop is based on validation failures = 40, MSE goal = 0.01*Ndof*MSEtrn00a/Ntrneq, and other default MATLAB values.

4) Five random weight-initialization trials were used with rng(0).

The simplified configurations of the model, with MSEtrna added, are (after taking the best configuration from each set of trials):

H   R2trn  R2trna  R2val  R2tst  MSEtrn  MSEtrna  MSEval  (R2trna+R2val)/2
 9  0.89   0.88    0.88   0.75   0.11    0.12     0.12    0.88
16  0.89   0.88    0.89   0.79   0.11    0.12     0.11    0.88
19  0.90   0.89    0.86   0.77   0.10    0.11     0.13    0.88

1) Which is the best configuration of these 3 cases? (For me, the 2nd case, since the increase in R2trn means we are increasing the learning rate, and R2val also increases to 0.89.)
2) How should the objective function be defined to find the best configuration over different hidden-layer sizes (to obtain the results programmatically)?
Objective function = a*H + b*(R2trna+R2val)/2
a = 0.5, b = 0.5 (equal preference weights for the hidden-neuron count and (R2trna+R2val)/2)
Is there another way to define it?

3) Similarly, how should the objective function be defined to find the best configuration over different randomization trials at the same hidden-layer size?

4) In any case, I could not increase R2trn above 0.90, even when I use all the data during training. Are there any methods that increase R2trn?
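As an aside, the weighted score in question 2 can be prototyped directly. This Python sketch is one reading of it, not the thread's definition: the post leaves the sign and scaling of the H term unspecified, so here H is penalized and normalized by an assumed Hub = 184 to keep the two terms on comparable scales.

```python
def score(H, r2trna, r2val, Hub=184, a=0.5, b=0.5):
    """Trade-off score: smaller H and larger (R2trna+R2val)/2 both
    raise the score.  Normalizing H by Hub is an assumption; an
    unscaled a*H term would otherwise dominate the R2 term."""
    return -a * H / Hub + b * (r2trna + r2val) / 2.0

# (R2trna, R2val) for the three tabulated cases:
candidates = {9: (0.88, 0.88), 16: (0.88, 0.89), 19: (0.89, 0.86)}
best = max(candidates, key=lambda H: score(H, *candidates[H]))
print(best)  # 9: the smallest net wins under this weighting
```

With the R2 averages essentially tied at 0.88, any score that charges for complexity will favor the smallest net.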

Thank you very much for your answer in advance.

Fri, 19 Apr 2013 02:08:09 +0000
Re: Re: Error in Selection of Optimum Parameters in Neural Network
http://www.mathworks.com/matlabcentral/newsreader/view_thread/328393#902741
Greg Heath
"Subodh Paudel" <subodhpaudel@gmail.com> wrote in message <kkog3g$g72$1@newscl01ah.mathworks.com>...
> Thank you very much for the previous mail, but I am interested in how Ntrn, Nval, Ntst, tr.stop and the division of data affect the choice of the approximate configuration, and I would be grateful to know how these parameters affect the best configuration. The target of this model is short-term forecasting.
>
> Here is more information:
> 1) Ntrn = 1824 (January 28 to February 16, 15-minute sampling interval)
> Nval = 384 (Feb 17 to Feb 20)
> Ntst = 384 (Feb 21 to Feb 24)

>> N = 1824+2*384, tstratio = 384/N

N = 2592
tstratio = 0.1481

> 2) The data are periodic, but a static neural network is used (not a dynamic neural network for the periodicity).

You should still know the statistically significant autocorrelation lags. Otherwise you are just crossing your fingers and whistling in the dark.

> 3) tr.stop is based on validation failures = 40,

40?? ... Might as well use inf. If you don't want validation stopping, don't waste data on a validation set.

> MSEgoal = 0.01*Ndof*MSEtrn00a/Ntrneq,

loses credibility as Ndof -> 0 and is useless for Ndof < 0. The credibility threshold depends on the data. It would be nice to have a smooth transition to something more credible. For now, try

MSEgoal = max(0, 0.01*Ndof*MSEtrn00a/Ntrneq)

and rely on val stopping with the default max_fail.
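A one-line Python sketch of that clamped goal (variable names follow the MATLAB expression; reading MSEtrn00a as the DOF-adjusted MSE of the naive reference model is my interpretation of the notation):

```python
def mse_goal(Ndof, MSEtrn00a, Ntrneq):
    """Training goal: 1% of the adjusted naive-model MSE, scaled by the
    remaining degrees of freedom and floored at zero so it stays
    sensible when Ndof <= 0."""
    return max(0.0, 0.01 * Ndof * MSEtrn00a / Ntrneq)

print(mse_goal(1733, 1.0, 1824))   # small positive goal when Ndof > 0
print(mse_goal(-50, 1.0, 1824))    # 0.0: clamped when Ndof < 0
```

The clamp only repairs the Ndof < 0 case; for Ndof near zero the goal is still tiny, which is why validation stopping remains the real safeguard.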

> and other default MATLAB values.
>
> 4) Five random weight-initialization trials were used with rng(0).

Ntrials >= 10 is more convincing (statisticians love Ntrials >= 30! ... eye R jest 'n injunere.)

> The simplified configurations of the model, with MSEtrna added, are (after taking the best configuration from each set of trials):
>
> H   R2trn  R2trna  R2val  R2tst  MSEtrn  MSEtrna  MSEval  (R2trna+R2val)/2
>  9  0.89   0.88    0.88   0.75   0.11    0.12     0.12    0.88
> 16  0.89   0.88    0.89   0.79   0.11    0.12     0.11    0.88
> 19  0.90   0.89    0.86   0.77   0.10    0.11     0.13    0.88
>
> 1) Which is the best configuration of these 3 cases? (For me, the 2nd case, since the increase in R2trn means we are increasing the learning rate, and R2val also increases to 0.89.)

You did not use the chi-square estimate of the standard deviation to show that these differences are even significant. My GUESS is that the test-set result is significantly different from the rest of the data. That seems suspicious if the data were randomly divided AND the initial weights were randomly assigned for each of the 15 cases.

Double-check to make sure you have 15 random data divisions as well as 15 random weight initializations.

Also, try Ntrials >= 10 (I don't even trust Ntrials = 5 for the XOR example).

> 2) How should the objective function be defined to find the best configuration over different hidden-layer sizes (to obtain the results programmatically)?
> Objective function = a*H + b*(R2trna+R2val)/2
> a = 0.5, b = 0.5 (equal preference weights for the hidden-neuron count and (R2trna+R2val)/2)
> Is there another way to define it?

1. Use the obsolete MSEREG.
2. Use the regularization option in MSE.
3. Use the Bayesian regularization function TRAINBR.

Use the commands help, doc and type to understand their differences.

> 3) Similarly, how should the objective function be defined to find the best configuration over different randomization trials at the same hidden-layer size?

In the last 30 years I have only used 2 objective functions: MSE and, on occasion, PctErr for RBF classification. If Ntst is not large enough to give you the confidence levels you desire for generalization estimation, use 10-fold cross-validation. Unfortunately, that is not an NNTBX option. You could try to use the STATTBX functions cvpartition and crossval (I'm not familiar with them), write your own, or just intelligently use large values of Ntrials.
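The index bookkeeping for "write your own" k-fold is simple enough; a generic sketch (Python here purely for illustration, independent of the toolbox):

```python
import random

def kfold_indices(N, k=10, seed=0):
    """Shuffle 0..N-1 and deal the indices into k near-equal folds;
    each fold serves once as the held-out set, the remainder as
    design (train + val) data."""
    idx = list(range(N))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

folds = kfold_indices(2592, k=10)   # N = 1824 + 2*384 from this thread
# Every sample lands in exactly one fold:
assert sorted(i for f in folds for i in f) == list(range(2592))
print(len(folds), min(map(len, folds)), max(map(len, folds)))  # 10 259 260
```

For time-series data like this, note that a shuffled split mixes past and future samples; contiguous (blocked) folds would be the more defensible choice for forecasting, at the cost of more bookkeeping.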

Remember, this is statistics. The goal is to obtain a model that is sufficiently accurate on non-training data. Chasing optimality on a finite set of design data (especially using a hand-crafted objective function) is not recommended.

> 4) In any case, I could not increase R2trn above 0.90, even when I use all the data during training. Are there any methods that increase R2trn?

Yes. Go back to your data and start with the basics:

1. What confidence do you have that the trn/val/tst data have the same summary statistics?
2. What confidence do you have that your inputs are well correlated with your targets?
3. Use the correlation knowledge to choose a reasonable set of inputs, and plot R2trn and R2trna vs H for 0 <= H <= Hub with Ntrn = N.
4. If H << Hub, R2trna may be an acceptable estimate of generalization performance. If not, use data division and reasonably large values of Ntrials and numH.

Hope this helps.

Greg

P.S. ... or you may want to go the regularization route.