optimal hidden nodes number

Question

coqui on 11 Jan 2015

https://www.mathworks.com/matlabcentral/answers/169573-optimal-hidden-nodes-number

Commented: coqui on 9 May 2015

Dear friend,

I want to determine the optimal number of hidden nodes using narnet in order to predict the next day's index. I have just one question:

I found two propositions about Hmax:

1) Hmax = Hub

or

2) Hmax = floor(Hub/10) % for example

but I do not understand how the number "10" is determined.

What is the difference between these two propositions, and which is the right one?

Thanks

Answer 1

Greg Heath on 13 Jan 2015

https://www.mathworks.com/matlabcentral/answers/169573-optimal-hidden-nodes-number#answer_164793

Neither is always the right one. There are many ways to choose a value that works. I typically start searching with ~10 values in a range 1<= Hmin <= H <= Hmax <= Hub by trial and error. The upper bound Hub is chosen so that the number of training equations, Ntrneq, is not less than the number of unknown weights Nw. For robust designs it is desired that Hmax << Hub. That is where the empirical factor of 10 comes from. For each value in the range I usually design Ntrials = 10 candidates for a total of 100 designs. On rare occasions I have used Ntrials = 15 or 20.
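
The bound on Hub can be written out explicitly. The following is a minimal sketch, assuming a single hidden layer with bias terms, so that Nw = (I+1)*H + (H+1)*O. The sizes I, O and Ntrn are hypothetical values chosen only for illustration; for narnet, I would be the number of feedback delays times O.

```matlab
% Hypothetical sizes, for illustration only
I    = 2;      % inputs seen by the hidden layer (assumed: 2 feedback delays, O = 1)
O    = 1;      % outputs
Ntrn = 700;    % training samples

Ntrneq = Ntrn*O;                            % number of training equations
Nw     = @(H) (I + 1)*H + (H + 1)*O;        % unknown weights, biases included

% Largest H for which Nw <= Ntrneq (solve the inequality for H)
Hub  = floor((Ntrneq - O)/(I + O + 1));

% Empirical factor of 10 so that Hmax << Hub
Hmax = floor(Hub/10);
Hmin = 1;

% Roughly 10 candidate values of H spread over [Hmin, Hmax]
Hvec = unique(round(linspace(Hmin, Hmax, 10)));
```

With these numbers Hub = 174 and Hmax = 17, so the search covers ten values of H between 1 and 17 rather than going anywhere near the point where there are as many weights as equations.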

I have explained this logic so many times it is ridiculous for me to say any more than search the NEWSGROUP and/or ANSWERS using any subset of the above variables. Usually

greg Hub Ntrials

is sufficient.

If there is not enough data to provide enough equations so that Ntrneq >> Nw, it is wise to use or combine an alternate approach like validation-set-stopping and/or regularization. I tend to use valstop. For the latter search on

 help msereg
 doc msereg
 help trainbr
 doc trainbr

There is recent evidence (Sorry, I lost the reference) that, for difficult designs, combining valstop and regularization can be very effective.
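
A minimal sketch of combining the two, assuming the Neural Network Toolbox (now Deep Learning Toolbox) is installed. Validation stopping is controlled through net.trainParam.max_fail, and regularization through the regularization parameter of the mse performance function. The sample dataset, the two feedback delays, H = 5 and the regularization ratio 0.1 are placeholder assumptions, not recommended values.

```matlab
T = simplenar_dataset;                  % toolbox sample series (stand-in data)

net = narnet(1:2, 5);                   % assumed: delays 1:2, H = 5
net.divideFcn = 'divideblock';          % contiguous trn/val/tst blocks
net.trainParam.showWindow = false;      % no training GUI

% Regularization: performance = (1-r)*mse(errors) + r*mse(weights)
net.performFcn = 'mse';
net.performParam.regularization = 0.1;  % assumed ratio r

% Validation stopping: stop after 6 consecutive increases of the val error
net.trainParam.max_fail = 6;

[Xs, Xi, Ai, Ts] = preparets(net, {}, {}, T);
[net, tr] = train(net, Xs, Ts, Xi, Ai);

disp(tr.stop)                           % reason training stopped
```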

Hope this helps.

Thank you for formally accepting my answer

Greg

3 Comments

Greg Heath on 14 Jan 2015

No. Typically, designs are chosen based on the validation data performance because the training data performance tends to be optimistically biased.

However, the training bias can be mitigated somewhat by taking into account the corresponding loss in degrees of freedom. Consequently, instead of dividing SSEtrn by Ntrneq = Ntrn*O, it is divided by the number of degrees of freedom (DOF) that results after the Nw weights are estimated

 Ntrndof = Ntrneq - Nw
 SSEtrn = sse(ttrn-ytrn)
 MSEtrn = SSEtrn/Ntrneq  % = mse(ttrn-ytrn)
 MSEtrna = SSEtrn/Ntrndof % = Ntrneq*MSEtrn/Ntrndof
 % a ==> 'a'djusted for the loss in DOF 
 %  DOFA ==> Degree-of-Freedom-adjusted
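
As a worked illustration of the adjustment, here is a small sketch with made-up targets and outputs. It assumes one hidden layer with biases, so Nw = (I+1)*H + (H+1)*O, and uses plain sums instead of the toolbox functions sse and mse; every number is hypothetical.

```matlab
% Hypothetical data, for illustration only
ttrn = 1:10;                                   % training targets (O = 1)
ytrn = ttrn + 0.5*[1 -1 1 -1 1 -1 1 -1 1 -1];  % network outputs, error +/-0.5

I = 1; O = 1; H = 2;                           % assumed network sizes
Ntrn    = numel(ttrn);
Ntrneq  = Ntrn*O;                              % 10 training equations
Nw      = (I + 1)*H + (H + 1)*O;               % 7 estimated weights
Ntrndof = Ntrneq - Nw;                         % 3 degrees of freedom left

SSEtrn  = sum((ttrn - ytrn).^2);               % 2.5
MSEtrn  = SSEtrn/Ntrneq;                       % 0.25, optimistically biased
MSEtrna = SSEtrn/Ntrndof;                      % about 0.833, DOF-adjusted
```

The adjusted value is more than three times the raw one here because seven weights were fitted to only ten equations; as Ntrneq grows relative to Nw the two estimates converge.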

If you search in the NEWSGROUP and ANSWERS using greg and some of the above terms you will find many, many examples. For example, try subsets of

greg MSEtrna Ntrndof or Ndof

When searching to find the optimum number for H, I sometimes plot

MSEtrn, MSEtrna, MSEval and MSEtst vs H

The choice of H is based on minimizing MSEtrna or MSEval
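
The search described above can be sketched as a double loop over H and over random weight initializations. This is an illustration under stated assumptions, not a definitive implementation: it needs the Neural Network Toolbox, uses the sample series simplenar_dataset as stand-in data, and the delays, the H grid, Ntrials and the 70/10/20 block division are all choices made only for the example.

```matlab
T = simplenar_dataset;            % stand-in for the index series
d = 2;                            % assumed number of feedback delays
Hvec = 1:2:9;                     % candidate hidden-node counts (kept small)
Ntrials = 3;                      % random initializations per H (kept small)
O = 1;                            % one output series

[MSEtrn, MSEtrna, MSEval, MSEtst] = deal(zeros(Ntrials, numel(Hvec)));
rng(0)                            % reproducible initial weights
for j = 1:numel(Hvec)
    for i = 1:Ntrials
        net = narnet(1:d, Hvec(j));
        net.divideFcn = 'divideblock';
        net.divideParam.trainRatio = 0.7;
        net.divideParam.valRatio   = 0.1;
        net.divideParam.testRatio  = 0.2;
        net.trainParam.showWindow  = false;
        [Xs, Xi, Ai, Ts] = preparets(net, {}, {}, T);
        [net, tr] = train(net, Xs, Ts, Xi, Ai);

        Ntrneq  = numel(tr.trainInd)*O;       % training equations
        Ntrndof = Ntrneq - net.numWeightElements;
        MSEtrn(i,j)  = tr.best_perf;          % training MSE at the best epoch
        MSEtrna(i,j) = tr.best_perf*Ntrneq/Ntrndof;
        MSEval(i,j)  = tr.best_vperf;
        MSEtst(i,j)  = tr.best_tperf;
    end
end

% Pick H from the validation error, not from the training error
[~, jbest] = min(min(MSEval, [], 1));
Hbest = Hvec(jbest);
```

Plotting the column minima (or medians) of the four matrices against Hvec gives the curves mentioned above.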

Hope this helps.

Greg

P.S. if the training stops because MSEval goes through a minimum, obviously MSEval is also biased. However, I usually find this bias to be negligible. Nevertheless, if you have to be absolutely above board with client and/or research sponsor, use MSEtst which is UNBIASED and the legal prediction of performance on nontraining data. Summary statistics over multiple trials will yield performance summary statistics e.g., min, median, mean, stddev and max. Typically, I would only use the statistics of the top 10 to 30 designs.
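
A small sketch of such a summary, using made-up error values for ten designs. Ranking the designs by MSEval and then reporting the statistics of MSEtst for the best few is an assumption of this example; Ntop and all numbers are hypothetical.

```matlab
% Hypothetical results of 10 designs, for illustration only
MSEval = [0.30 0.27 0.41 0.29 0.85 0.33 0.28 0.50 0.31 0.36];
MSEtst = [0.34 0.30 0.47 0.33 0.91 0.35 0.29 0.55 0.36 0.40];

Ntop = 5;                          % number of top designs to summarize
[~, order] = sort(MSEval);         % rank designs by validation error
top = order(1:Ntop);               % indices of the Ntop best designs

% Summary statistics of the unbiased test error for the selected designs
stats.min    = min(MSEtst(top));
stats.median = median(MSEtst(top));
stats.mean   = mean(MSEtst(top));
stats.stddev = std(MSEtst(top));
stats.max    = max(MSEtst(top));
disp(stats)
```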

coqui on 9 May 2015

Dear Greg,

I have split the data into three parts: 70% (training), 10% (validation) and 20% (testing). Using the trial-and-error approach, I found the smallest training MSE (0.53088525) with 15 hidden nodes, but the smallest validation MSE (0.27098756) was achieved with only one node! Does that make sense?

We started with 1 hidden node and added one at a time up to 20, with trials = 10.

Is 15 the optimal number of hidden neurons?

Thanks a lot.
