I would like to share how I approached a nonlinear regression problem (two inputs, one output) and get your advice.
After some quick reading, I settled on a network with a single hidden layer using the tansig transfer function, and purelin for the output, as this seems to be the most common approach for such problems.
I used trainbr in order to determine the regularization parameter automatically. However, I could not find out how to determine the number of hidden neurons automatically (which should normally be possible in the Bayesian framework, if I'm not mistaken). So I couldn't merge the training and validation sets; I kept the validation set to evaluate architectures with increasing numbers of neurons.
So, in a single for loop, I trained networks with 1 to 20 neurons in the hidden layer. Then I applied each of them to the validation set and computed the mean squared error.
First question: is this the most appropriate way to do it? Would you have done it differently?
The MSE keeps getting smaller as the number of neurons increases. I stopped at 20, as there seems to be no real benefit in going further. Then I applied the 2-20-1 net to the test set and got a very small MSE of 4e-6, and a correlation of 0.99999 between the test labels and the output of the network.
Second question: isn't it suspicious to get such high performance? What do you think about this?
I'll be looking forward to your responses in order to validate or dismiss my approach.
TRAINBR does not use a validation set. Therefore I am not quite sure what you are doing.
Are you using TRAINBR's default 15% test subset as a holdout validation subset (with NO validation stopping) for choosing the best of multiple designs?
Then, I assume you have a third holdout subset that you use for testing, i.e., you use the "best net" performance on the test set as an unbiased estimate of its performance on non-design operational data.
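The three-subset arrangement assumed above could be sketched as follows. This is a minimal, toolbox-independent Python sketch, not the TRAINBR data-division code; the function name `three_way_split` and the 70/15/15 ratios are assumptions chosen for illustration.

```python
import random

def three_way_split(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Return disjoint (train, val, test) index lists that cover range(n_samples).

    Hypothetical helper: the fractions and the seed are illustrative defaults.
    """
    if val_frac < 0 or test_frac < 0 or val_frac + test_frac >= 1:
        raise ValueError("val_frac and test_frac must be >= 0 and sum to < 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded shuffle, so the split is repeatable
    n_val = round(n_samples * val_frac)
    n_test = round(n_samples * test_frac)
    val = indices[:n_val]
    test = indices[n_val:n_val + n_test]
    train = indices[n_val + n_test:]
    return train, val, test

train_idx, val_idx, test_idx = three_way_split(200)
```

With such a split, the weights are estimated on the training indices only, the validation indices are used only to rank candidate designs, and the test indices are touched once, for the final estimate on the selected net.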
My advice is to use the smallest number of hidden nodes, H, that yields a degree-of-freedom-adjusted coefficient of determination exceeding 99%. Use a double loop with ~10 random weight initialization designs (inner loop) for each value of H (outer loop). Ten values of H should be sufficient.
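A rough sketch of that double loop, in plain Python rather than toolbox code. Several things here are assumptions: the adjusted R^2 uses the usual definition (training SSE divided by the number of training equations minus the number of weights, compared with the N-1 normalized target variance) for a single-output I-H-1 net, and `train_fn` is a hypothetical callable standing in for "train a net with H hidden nodes from a random initialization and return its outputs on the training inputs".

```python
import random

def adjusted_r2(targets, outputs, n_inputs, n_hidden):
    """DOF-adjusted R^2 of a single-output I-H-1 net on its own training data."""
    n = len(targets)
    # Weights and biases: (I+1)*H into the hidden layer, (H+1) into the one output.
    n_weights = (n_inputs + 1) * n_hidden + (n_hidden + 1)
    dof = n - n_weights
    if dof <= 0:
        raise ValueError("need more training points than weights")
    mean_t = sum(targets) / n
    sse = sum((t - y) ** 2 for t, y in zip(targets, outputs))
    sst = sum((t - mean_t) ** 2 for t in targets)
    # Reference: the constant (mean) model, which estimates a single parameter.
    return 1.0 - (sse / dof) / (sst / (n - 1))

def smallest_adequate_h(train_fn, targets, n_inputs,
                        h_values=range(1, 11), n_trials=10, goal=0.99, seed=0):
    """Return (H, best adjusted R^2) for the first H that beats goal, else (None, None)."""
    rng = random.Random(seed)
    for h in h_values:                 # outer loop: candidate H, smallest first
        best = float("-inf")
        for _ in range(n_trials):      # inner loop: random weight initializations
            outputs = train_fn(h, rng.randrange(2 ** 31))  # hypothetical trainer
            best = max(best, adjusted_r2(targets, outputs, n_inputs, h))
        if best > goal:
            return h, best
    return None, None
```

Because the outer loop stops at the first H that reaches the goal, the search returns the smallest adequate network instead of the one with the lowest error, which is the point of adjusting for degrees of freedom.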
I have posted many double loop designs in ANSWERS and NEWSGROUP.
If you have more questions, please include your code with comments.
Hope this helps.
Thank you for formally accepting my answer