Asked by Jérôme
on 20 May 2013

I would like to share how I approached a nonlinear regression problem (two inputs, one output) and get your advice.

After some quick reading I settled on a network with a single hidden layer, using the tansig transfer function for the hidden layer and purelin for the output, as this seems to be the most common approach for such problems.

I used trainbr so that the regularization parameter is determined automatically. However, I did not find a way to determine the number of hidden neurons automatically (which should normally be possible in the Bayesian framework, if I'm not mistaken). So I could not merge the training and validation sets; I kept the validation set to evaluate architectures with an increasing number of neurons.

So, within a single for loop going from 1 to 20, I trained networks with 1 to 20 neurons in the hidden layer. Then I applied each of them to the validation set and computed the mean squared error.

First question: is this the most appropriate way to proceed? Would you have done it differently?

The MSE keeps getting smaller as the number of neurons increases. I stopped at 20, as there seems to be no real benefit in going further. Then I applied the 2-20-1 net to the test set and got a very small MSE of 4e-6 and a correlation of 0.99999 between the test labels and the output of the network.

Second question: isn't it suspicious to get such high performance? What do you think about this?

I'll be looking forward to your responses in order to validate or dismiss my approach.


Answer by Greg Heath
on 20 May 2013

Accepted answer

TRAINBR does not use a validation set. Therefore I am not quite sure what you are doing.

Are you using TRAINBR's default 15% test subset as a holdout (NO validation stopping) validation subset for choosing the best of multiple designs?

Then, I assume you have a third holdout subset that you use for testing, i.e., to use the "best net" performance on the test set as an unbiased estimate of its performance on non-design operational data.

My advice is to use the smallest number of hidden nodes, H, that will yield a degree-of-freedom-adjusted coefficient of determination exceeding 99%. Use a double loop with ~10 random weight initialization designs (inner loop) for each value of H (outer loop). Ten values of H should be sufficient.
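For readers who want to see the criterion spelled out, here is a minimal sketch of a degree-of-freedom-adjusted coefficient of determination for a single-output net. It is written in Python rather than MATLAB to stay toolbox-independent. The function names `num_weights` and `adjusted_r2` are made up for this illustration, and counting every weight and bias as one degree of freedom is an assumption (probably a conservative one for a regularized net), not necessarily the exact formula intended above.

```python
def num_weights(n_in, n_hidden, n_out):
    """Count weights and biases of an I-H-O net with one hidden layer."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out


def adjusted_r2(targets, outputs, n_params):
    """Adjusted R^2 = 1 - (SSE / (N - Nw)) / (SST / (N - 1)).

    targets, outputs: equal-length sequences of floats (training data).
    n_params: number of estimated parameters Nw (assumed: all weights).
    """
    n = len(targets)
    dof = n - n_params
    if dof <= 0:
        raise ValueError("need more training samples than parameters")
    mean_t = sum(targets) / n
    # Sum of squared errors of the net
    sse = sum((t - y) ** 2 for t, y in zip(targets, outputs))
    # Sum of squared errors of the naive constant (mean) model
    sst = sum((t - mean_t) ** 2 for t in targets)
    return 1.0 - (sse / dof) / (sst / (n - 1))
```

For the 2-20-1 net discussed in this thread, `num_weights(2, 20, 1)` gives 81, so the training set should contain many more than 81 samples for the adjusted value to be meaningful.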

I have posted many double loop designs in ANSWERS and NEWSGROUP.

If you have more questions, please include your code with comments.

Hope this helps.

**Thank you for formally accepting my answer**

Greg

Jérôme
on 21 May 2013

Indeed trainbr doesn't use a validation set, but I use one to determine the best number of hidden nodes.

Without trainbr, I guess I would use three loops: an outer loop over values of the regularization ratio; a loop within it over a range of possible numbers of hidden nodes; and an innermost loop that runs, say, 10 times with different initializations of the parameters, to avoid poor local minima.

But thanks to trainbr, the regularization parameter is found automatically, so I'm left with the last two loops. This seems to correspond to what you suggest. One question: for each value of H, among the 10 models you try (each with a different initialization), do you keep the best one or the worst one? Keeping the worst one (the one that was perhaps least lucky at initialization) should be the more conservative approach, while keeping the best one (the luckiest at initialization) is less conservative.

I created the training, validation and test sets myself; I give 100% of the training set to trainbr for training, and then I apply the resulting net to the validation set myself to assess its performance and choose the best design. At the end, I apply the best design to the test set.
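As a rough illustration of such a manual three-way split, here is a small Python sketch (standard library only). The helper name `split_indices`, the 70/15/15 proportions and the fixed seed are all assumptions made for the example; the actual proportions used are not stated in this thread.

```python
import random


def split_indices(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into train/val/test lists.

    The fractions are illustrative defaults, not values from the thread.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)  # seeded so the split is repeatable
    n_val = round(val_frac * n_samples)
    n_test = round(test_frac * n_samples)
    val = idx[:n_val]
    test = idx[n_val:n_val + n_test]
    train = idx[n_val + n_test:]  # everything left goes to training
    return train, val, test
```

Only the `train` indices would be passed to the training routine; `val` is used to compare designs and `test` is touched once, at the very end.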

Greg Heath
on 31 May 2013

1. To get an unbiased estimate of performance on unseen data, use MSEtst from the net with the best MSEval.

2. To choose one net to use on future data, I would choose the net with the smallest number of hidden nodes that has an acceptable performance on all of the data.
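The two rules above can be sketched in a few lines of Python. This assumes each trained net from the double loop has been summarized as a dictionary; the keys `hidden`, `mse_val`, `mse_tst` and `mse_all`, the function names, and the acceptance threshold are all hypothetical names chosen for the illustration.

```python
def best_by_validation(results):
    """Rule 1: pick the net with the lowest validation MSE.

    Its "mse_tst" entry is then the performance estimate to report.
    """
    return min(results, key=lambda r: r["mse_val"])


def smallest_acceptable(results, max_mse_all):
    """Rule 2: smallest hidden layer whose MSE on all data is acceptable.

    Returns None when no net meets the threshold. Ties in size are
    broken by the lower MSE on all data.
    """
    ok = [r for r in results if r["mse_all"] <= max_mse_all]
    if not ok:
        return None
    return min(ok, key=lambda r: (r["hidden"], r["mse_all"]))


# Made-up numbers purely to exercise the functions, not thread results.
example_results = [
    {"hidden": 2, "mse_val": 0.05, "mse_tst": 0.06, "mse_all": 0.05},
    {"hidden": 5, "mse_val": 0.004, "mse_tst": 0.005, "mse_all": 0.004},
    {"hidden": 10, "mse_val": 0.003, "mse_tst": 0.0045, "mse_all": 0.003},
]
```

With these placeholder values, the first rule selects the 10-node net (reporting its test MSE), while the second rule with `max_mse_all=0.01` selects the 5-node net. The two rules answer different questions, so they need not agree.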
