There is an important difference between validation and testing.
In the ideal scenario:
Only 1 subset is needed for both adequate training and unbiased estimation of performance on unseen data. The subset is assumed to be randomly drawn from a general population and to be sufficiently large and diverse that it represents the salient I/O features of that population well. In that case, testing a net on the same data that was used to train it yields a relatively unbiased estimate of performance on unseen data from the same general population.
Typically, however, this requires more data than is available, so the scenario has to be modified so that a good network can be designed and a relatively unbiased estimate of network performance on nondesign data can be obtained.
A common approach is based on a division of the data into 3 separate subsets for training, validation and testing.
Data = Design + Test
Design = Train + Val
The 3 subsets are all assumed to be sufficiently large random draws from the general population. Accurate weight estimation generally requires the training set to be at least 63% (1-1/exp(1)) of the total. The validation and test sets are generally of similar size. (The MTLB default is 0.7/0.15/0.15.)
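As a rough illustration of the division above, here is a minimal sketch in plain Python. It is not the MTLB toolbox code: the function name divide_indices, the sample count of 200 and the seed are all made up for the example. It only shows a random 0.7/0.15/0.15 index split with Data = Design + Test and Design = Train + Val.

```python
import math
import random

def divide_indices(n, ratios=(0.70, 0.15, 0.15), seed=0):
    """Randomly split range(n) into train/val/test index lists (hypothetical helper)."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    rng = random.Random(seed)        # fixed seed so the same split can be reproduced
    idx = list(range(n))
    rng.shuffle(idx)                 # random draw, no ordering effects
    n_val = int(round(ratios[1] * n))
    n_test = int(round(ratios[2] * n))
    n_train = n - n_val - n_test     # the training set takes the remainder
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

train, val, test = divide_indices(200)   # 200 samples is an arbitrary example size
design = train + val                     # Design = Train + Val
print(len(train), len(val), len(test))
print("training fraction:", len(train) / 200)
print("1 - 1/exp(1):", round(1 - 1 / math.exp(1), 3))
```

The three index lists are disjoint and together cover every sample, so the test indices never overlap the design data.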
The test set is used ONCE AND ONLY ONCE to estimate network performance on population nondesign data. If this is unsatisfactory, redesigns require re-randomization of the data using a different initial RNG state.
The training set is used to estimate weights given a set of training parameters.
The training and validation sets are used repetitively to both estimate weights (training set) and determine an adequate set of training parameters (validation set). Very often a set of default values is used for the number of hidden nodes, learning rate, momentum constant, maximum number of epochs, etc., and the emphasis is on determining when to stop training.
This is called validation stopping (AKA "stopped training" and "early stopping"). The validation error is used to stop training when it fails to decrease for 6 (MTLB default) consecutive epochs.
In other words, the training subset is used to obtain a set of weights that EITHER minimizes the validation error, minimizes the training error or achieves a specified training error goal.
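The three outcomes just described can be sketched as follows, in plain Python for illustration only. The function stopping_epoch and its arguments are hypothetical names, and the logic follows the description in this answer rather than the toolbox source. It takes already-recorded, non-empty per-epoch error curves and reports which epoch's weights would be kept and why.

```python
def stopping_epoch(train_err, val_err, max_fail=6, goal=0.0):
    """Return (epoch, reason) for per-epoch error curves; epoch is a 0-based index.

    max_fail: consecutive epochs without a new validation minimum before stopping.
    goal:     training error goal.
    """
    best_val = float("inf")
    best_epoch = 0
    fails = 0
    for epoch, (e_trn, e_val) in enumerate(zip(train_err, val_err)):
        if e_val < best_val:
            # new validation minimum: remember it and reset the failure count
            best_val, best_epoch, fails = e_val, epoch, 0
        else:
            fails += 1
        if e_trn <= goal:
            return epoch, "training goal met"
        if fails >= max_fail:
            # keep the weights from the epoch with the lowest validation error
            return best_epoch, "validation stop"
    # ran out of epochs: fall back to the epoch with the lowest training error
    last_best = min(range(len(train_err)), key=train_err.__getitem__)
    return last_best, "maximum epochs reached"
```

With max_fail=6, a validation curve that reaches its minimum at epoch 2 and then rises for 6 epochs in a row makes the function return epoch 2, even though the training error is still decreasing.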
The training error is always biased.
If the validation error causes training to stop, or is used repetitively to determine training parameters, the validation error is also biased (although usually not nearly as much as the training error).
The test set error is unbiased because it is completely independent of design (training and validation).
The test set error is used to estimate performance on the rest of the unseen data in the population.
When the data set is not large enough to achieve reasonable sizes for the 3 subsets, I suggest first using all of the data, without division, to design ~100 networks: 10 values for the number of hidden nodes (e.g., 0:9), each with 10 different random weight initializations. This typically takes a minute or two at most. Use the 10x10 performance tabulations to guide further designs.
I have posted several examples in the NEWSGROUP and ANSWERS.
Regarding your proposal: Random data division and weight initialization are defaults in MTLB feedforward nets. Therefore all you have to do is set
1. The initial RNG state (e.g., rng(0)) so you can duplicate experiments
2. The range of hidden nodes (e.g., j = 1:10)
3. The number of weight initialization trials (e.g., i = 1:10)
4. The data division ratio
5. The MSEgoal
and use a double loop over H and weight initialization to obtain the 3 error estimates from the training record, tr, and store them in 10x10 matrices.
From the matrices you can easily see the smallest number of hidden nodes that yields good performances most of the time.
You can either average over those good performances or make more runs for that particular value of hidden nodes to get a final estimate of mean error and standard deviation.
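The double loop and the final summary can be sketched like this, again in plain Python for illustration rather than as toolbox code. Everything named here is hypothetical: train_fn stands in for whatever trains one net with h hidden nodes and returns its (training, validation, test) errors, and the threshold and the "more than half of the trials" rule are assumptions used to make "good performance most of the time" concrete. The errors are stored one row per value of H instead of in a literal 10x10 matrix.

```python
import random
import statistics

def tabulate(train_fn, hidden_range, n_trials, seed=0):
    """Call train_fn(h, rng) for every (h, trial) pair.

    train_fn is a caller-supplied placeholder returning
    (train_err, val_err, test_err) for one trained net.
    Returns three dicts mapping h -> list of n_trials errors.
    """
    rng = random.Random(seed)   # one seeded generator so experiments can be duplicated
    trn, val, tst = {}, {}, {}
    for h in hidden_range:                  # outer loop: number of hidden nodes
        trn[h], val[h], tst[h] = [], [], []
        for _ in range(n_trials):           # inner loop: weight initialization trials
            e_trn, e_val, e_tst = train_fn(h, rng)
            trn[h].append(e_trn)
            val[h].append(e_val)
            tst[h].append(e_tst)
    return trn, val, tst

def smallest_good_h(errors, threshold, min_fraction=0.5):
    """Smallest h with more than min_fraction of its trials at or below threshold, else None."""
    for h in sorted(errors):
        n_good = sum(1 for e in errors[h] if e <= threshold)
        if n_good > min_fraction * len(errors[h]):
            return h
    return None

def summarize(values):
    """Mean and sample standard deviation of a list of at least 2 errors."""
    return statistics.mean(values), statistics.stdev(values)
```

One way to use it, under the same assumptions: call tabulate(train_fn, range(1, 11), 10), pick H with smallest_good_h on the validation errors, and then apply summarize to the test errors for that H.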
Hope this helps.
Thank you for formally accepting my answer.