Optimal number of hidden nodes

coqui on 11 Jan 2015
Commented: coqui on 9 May 2015
Dear friend,
I want to determine the optimal number of hidden nodes for a narnet that predicts the next day's index. I have one question.
I found two proposals for Hmax:
1) Hmax = Hub
or
2) Hmax = floor(Hub/10), for example, but I do not understand how the number "10" is determined.
What is the difference between these two proposals, and which one is right?
Thanks

Accepted Answer

Greg Heath on 13 Jan 2015
Neither is always the right one. There are many ways to choose a value that works. I typically start by searching, by trial and error, over about 10 values of H in the range 1 <= Hmin <= H <= Hmax <= Hub. The upper bound Hub is chosen so that the number of training equations, Ntrneq, is not less than the number of unknown weights, Nw. For robust designs it is desirable that Hmax << Hub; that is where the empirical factor of 10 comes from. For each value of H in the range I usually design Ntrials = 10 candidates, for a total of about 100 designs. On rare occasions I have used Ntrials = 15 or 20.
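The bound described above can be made concrete with a little arithmetic. The sketch below assumes a single hidden layer with bias terms, so that Nw = (I+1)*H + (H+1)*O, where I is the number of inputs (for a narnet, the number of feedback delays times O) and O is the number of outputs. The data sizes and this weight-count formula are assumptions made for illustration; they are not stated in the answer. Requiring Ntrneq >= Nw then gives Hub = floor((Ntrneq - O)/(I + O + 1)).

```matlab
% Assumed example sizes (hypothetical, not taken from the thread)
N    = 500;            % number of usable time steps
O    = 1;              % output dimension (one index value per day)
NFD  = 2;              % number of feedback delays, e.g. narnet(1:2, H)
I    = NFD*O;          % effective input dimension of the open-loop net
Ntrn = round(0.7*N);   % 70% of the data used for training
Ntrneq = Ntrn*O;       % number of training equations

% Assumed weight count for one hidden layer with biases:
%   Nw = (I+1)*H + (H+1)*O
% Ntrneq >= Nw  <=>  H <= (Ntrneq - O)/(I + O + 1)
Nw   = @(H) (I+1)*H + (H+1)*O;
Hub  = floor((Ntrneq - O)/(I + O + 1));
Hmax = floor(Hub/10);  % empirical factor of 10 so that Hmax << Hub

fprintf('Hub = %d, Hmax = %d, Nw(Hmax) = %d, Ntrneq = %d\n', ...
        Hub, Hmax, Nw(Hmax), Ntrneq);
```

With Hmax chosen this way, the largest candidate network has roughly ten times more training equations than weights, which is the margin the factor of 10 is meant to provide.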
I have explained this logic so many times it is ridiculous for me to say any more than search the NEWSGROUP and/or ANSWERS using any subset of the above variables. Usually
greg Hub Ntrials
is sufficient.
If there is not enough data to provide enough equations so that Ntrneq >> Nw, it is wise to use, or combine with, an alternate approach like validation-set stopping and/or regularization. I tend to use valstop. For regularization, search on
help msereg
doc msereg
help trainbr
doc trainbr
There is recent evidence (Sorry, I lost the reference) that, for difficult designs, combining valstop and regularization can be very effective.
Hope this helps.
Thank you for formally accepting my answer
Greg
  3 Comments
Greg Heath on 14 Jan 2015
No. Typically, designs are chosen based on the validation data performance because the training data performance tends to be optimistically biased.
However, the training bias can be mitigated somewhat by taking into account the corresponding loss in degrees of freedom. Consequently, instead of dividing SSEtrn by Ntrneq = Ntrn*O, it is divided by the number of degrees of freedom (DOF) that remain after the Nw weights are estimated:
Ntrndof = Ntrneq - Nw
SSEtrn = sse(ttrn-ytrn)
MSEtrn = SSEtrn/Ntrneq % = mse(ttrn-ytrn)
MSEtrna = SSEtrn/Ntrndof % = Ntrneq*MSEtrn/Ntrndof
% a ==> 'a'djusted for the loss in DOF
% DOFA ==> Degree-of-Freedom-adjusted
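As a small numerical illustration of the adjustment above, the following sketch uses plain array arithmetic in place of the toolbox functions sse and mse. The target values, output values and the weight count Nw are made up for the example.

```matlab
% Made-up training targets and network outputs (1 output, 8 time steps)
ttrn = [1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0];
ytrn = [1.5 1.5 3.5 3.5 5.5 5.5 7.5 7.5];
Nw   = 4;                         % assumed number of estimated weights

[O, Ntrn] = size(ttrn);
Ntrneq  = Ntrn*O;                 % number of training equations
Ntrndof = Ntrneq - Nw;            % degrees of freedom left after fitting

etrn    = ttrn - ytrn;
SSEtrn  = sum(etrn(:).^2);        % same quantity as sse(ttrn-ytrn)
MSEtrn  = SSEtrn/Ntrneq;          % optimistically biased estimate
MSEtrna = SSEtrn/Ntrndof;         % DOF-adjusted estimate

fprintf('MSEtrn = %g, MSEtrna = %g\n', MSEtrn, MSEtrna);
```

The adjustment is only meaningful when Ntrndof > 0, that is, when Ntrneq > Nw; as Nw approaches Ntrneq, MSEtrna grows without bound even if MSEtrn looks small.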
If you search in the NEWSGROUP and ANSWERS using greg and some of the above terms you will find many, many examples. For example, try subsets of
greg MSEtrna Ntrndof or Ndof
When searching to find the optimum number for H, I sometimes plot
MSEtrn, MSEtrna, MSEval and MSEtst vs H
The choice of H is based on minimizing MSEtrna or MSEval.
Hope this helps.
Greg
P.S. If the training stops because MSEval goes through a minimum, obviously MSEval is also biased. However, I usually find this bias to be negligible. Nevertheless, if you have to be absolutely above board with a client and/or research sponsor, use MSEtst, which is UNBIASED and the legal prediction of performance on nontraining data. Collecting results over multiple trials yields performance summary statistics, e.g., min, median, mean, stddev and max. Typically, I would only use the statistics of the top 10 to 30 designs.
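The bookkeeping over multiple trials might be organised as in the sketch below. It assumes the validation MSE of every design has already been collected into a matrix with one row per trial and one column per candidate H. The numbers are placeholders invented for illustration, not results from any real network, and the variable names Hvals, stats and Hbest are hypothetical.

```matlab
Hvals  = [1 5 10 15];             % candidate hidden-node counts (assumed)
% Placeholder validation MSEs: rows = trials, columns = Hvals
MSEval = [0.40 0.31 0.29 0.35
          0.42 0.28 0.33 0.30
          0.39 0.30 0.27 0.41];

% Summary statistics per H, computed down the trial dimension
% (rows of stats: min, median, mean, stddev, max)
stats = [min(MSEval,[],1); median(MSEval,1); mean(MSEval,1); ...
         std(MSEval,0,1); max(MSEval,[],1)];

% Best single design: smallest validation MSE over all trials and H
[bestMSE, k] = min(MSEval(:));
[bestTrial, bestCol] = ind2sub(size(MSEval), k);
Hbest = Hvals(bestCol);

fprintf('Best design: H = %d, trial = %d, MSEval = %g\n', ...
        Hbest, bestTrial, bestMSE);
```

Selection here uses MSEval only; the MSEtst of the chosen design would still be the figure to report as the estimate of performance on nontraining data.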
coqui on 9 May 2015
Dear Greg,
I split the data into three parts: 70% (training), 10% (validation) and 20% (testing). Using the trial-and-error approach, I found the smallest training MSE (0.53088525) with 15 hidden nodes, but looking at the validation MSE, the smallest value (0.27098756) was achieved with only one node! Does that make sense?
We started with 1 hidden node and added one each time up to 20, with trials = 10.
Is 15 the optimal number of hidden neurons?
Thanks a lot.
