NN validation and data partition

laplace laplace on 14 Jan 2013
• 1) I have 50 data points; I use 35 to train the network and 15* to test.
• 2) I take these 35 data points and split them into 7 folds of 5 each.
• For i = 1,...,n = 7 I pick fold i (5 data points) to test/validate and use the rest (30 data points) to train the network.
• 3) So now I have created 8 networks: the original one and another 7 due to the data partition I made.
• 3a) I want to save these 7 networks and be able to manipulate them.
• 3b) I want to take the original 15 data points* and run them through the 7 networks as test/validation data.
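One hedged way to do 3a/3b is to store the 7 fold-trained networks in a cell array and then run the held-out test cases through each one. This is a sketch, not the thread's prescribed method; the names xdesign/tdesign (the 35 design cases) and xtest (the 15 held-out cases), and the choice of 10 hidden nodes, are assumptions for illustration.

```matlab
% Keep the 7 fold-trained networks in a cell array (assumed variable
% names: xdesign/tdesign are the 35 design cases, xtest the 15 held out)
nets = cell(1,7);
for i = 1:7
    testidx  = (i-1)*5 + (1:5);          % the i-th fold of 5 cases
    trainidx = setdiff(1:35, testidx);   % the remaining 30 cases
    net = fitnet(10);                    % 10 hidden nodes (assumed)
    net.divideFcn = 'dividetrain';       % train on all supplied cases
    nets{i} = train(net, xdesign(:,trainidx), tdesign(:,trainidx));
end
% Run the 15 held-out cases through each of the 7 saved networks
ytest = cellfun(@(net) net(xtest), nets, 'UniformOutput', false);
```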

Greg Heath on 16 Jan 2013
There is an important difference between validation and testing.
In the ideal scenario:
There is only 1 subset, used both for adequate training and for unbiased estimation of performance on unseen data. The subset is assumed to be randomly drawn from a general population, and to be sufficiently large and diverse to represent the salient I/O features of that population so well that using the same data to test the performance of a net trained with it yields a relatively unbiased estimate of performance on unseen data from the same general population.
Typically, however, this requires more data than is available, and the scenario has to be modified so that a good network can be designed and a relatively unbiased estimate of network performance on nondesign data can still be obtained.
A common approach is based on a division of the data into 3 separate subsets for training, validation and testing.
Data = Design + Test
Design = Train + Val
The 3 subsets are all assumed to be sufficiently large random draws from the general population. Accurate weight estimation generally requires the training set to be at least ~63% (1 - 1/exp(1)) of the total. The validation and test sets are generally of similar size. (The MATLAB default is 0.7/0.15/0.15.)
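In MATLAB's feedforward nets these ratios are exposed through the data-division properties; a minimal sketch (the 10 hidden nodes are just an example value):

```matlab
% Random data division with the default 0.7/0.15/0.15 ratios
net = fitnet(10);                   % 10 hidden nodes (example value)
net.divideFcn = 'dividerand';       % random division (the default)
net.divideParam.trainRatio = 0.70;  % training set
net.divideParam.valRatio   = 0.15;  % validation set
net.divideParam.testRatio  = 0.15;  % test set
```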
The test set is used ONCE AND ONLY ONCE to estimate network performance on population nondesign data. If this is unsatisfactory, redesigns require re-randomization of the data using a different initial RNG state.
The training set is used to estimate weights given a set of training parameters.
The training and validation sets are used repeatedly to both estimate weights (training set) and determine an adequate set of training parameters (validation set). Very often a set of default values is used for the number of hidden nodes, learning rate, momentum constant, maximum number of epochs, etc., and the emphasis is on determining when to stop training.
This is called validation stopping (AKA "stopped training" and "early stopping" ). The validation error is used to stop training when it fails to decrease for 6 (MTLB default) consecutive epochs.
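In MATLAB this stopping criterion is controlled by the max_fail training parameter; a minimal sketch:

```matlab
% Validation (early) stopping: training halts when the validation
% error fails to decrease for max_fail consecutive epochs
net = fitnet(10);             % 10 hidden nodes (example value)
net.trainParam.max_fail = 6;  % the MATLAB default
```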
In other words, the training subset is used to obtain a set of weights that EITHER minimizes the validation error, minimizes the training error, or achieves a specified training error goal.
The training error is always biased.
If the validation error causes training to stop, or is used repeatedly to determine training parameters, the validation error is also biased (although usually not nearly as much as the training error).
The test set error is unbiased because it is completely independent of design (training and validation).
The test set error is used to estimate performance on the rest of the unseen data in the population.
When the data is not sufficiently large to achieve reasonable sizes for the 3 subsets, I suggest first using all of the data, without division, to design ~100 networks that differ by number of hidden nodes (e.g., 0:9) and 10 different random weight initializations. This typically takes less than a minute or two. Use the 10x10 performance tabulations to guide further designs.
I have posted several examples in the NEWSGROUP and ANSWERS.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Regarding your proposal: Random data division and weight initialization are defaults in MATLAB feedforward nets. Therefore all you have to do is set
1. The initial RNG state (e.g., rng(0)) so you can duplicate experiments
2. The range of hidden nodes (e.g., j = 1:10)
3. The number of weight initialization trials (e.g., i = 1:10)
4. The data division ratio
5. The MSEgoal
and use a double loop over H and weight initialization to obtain the 3 error estimates from the training record, tr, and store them in 10x10 matrices.
From the matrices you can easily see the smallest number of hidden nodes that yields good performances most of the time.
You can either average over those good performances or make more runs for that particular value of hidden nodes to get a final estimate of mean error and standard deviation.
Hope this helps.
Thank you for formally accepting my answer.
Greg
Greg Heath on 19 Feb 2013
Will send two draft outlines shortly:
1. Estimating the optimum No. of hidden nodes, Hopt, using ALL of the data
2. Estimating the generalization error on unseen data using Hopt and 7-fold cross-validation (XVAL).
What are dimensionalities of input & output? (I and O)
Have you plotted and surveyed the data?
What is the average variance of the target variables?
MSE00 = mean(var(t',1)) % biased (N divisor)
MSE00a = mean(var(t')) % unbiased (N-1 divisor)
Please respond with those 4 numbers ASAP.
Thanks
Greg

Greg Heath on 19 Feb 2013
1. Choose candidate values for Hopt, the optimal No. of hidden nodes, by using ALL of the data and looping over Ntrials candidate nets for each H in Hmin:dH:Hmax.
Neq = N*O % No. of training equations
Nw = (I+1)*H+(H+1)*O % No. of unknowns (weights)
Ndof = Neq - Nw % No. of estimation degrees of freedom
MSE = SSE/Neq % biased MSE estimate
MSEa = SSE/Ndof % unbiased ("a"djusted) MSE estimate
Hub = -1 + ceil( (Neq-O)/(I+O+1) ) % upper bound ensuring Ndof > 0 (Neq > Nw)
Choose Hmin,dH,Hmax <= Hub
numH = length(Hmin:dH:Hmax)
Ntrials = 10 % No.of weight initializations per H value
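As a worked numeric check of these formulas, using the dimensions reported later in this thread (I = 4 inputs, O = 1 output, N = 50 cases; the candidate H = 5 is just an example):

```matlab
% Worked example of the sizing formulas (I = 4, O = 1, N = 50 assumed)
I = 4; O = 1; N = 50;
Neq = N*O                           % 50 training equations
H   = 5;                            % one candidate No. of hidden nodes
Nw  = (I+1)*H + (H+1)*O             % (5)(5) + (6)(1) = 31 unknown weights
Ndof = Neq - Nw                     % 50 - 31 = 19 degrees of freedom
Hub = -1 + ceil( (Neq-O)/(I+O+1) )  % -1 + ceil(49/6) = 8, largest H with Ndof > 0
```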
2. Use ALL of the data for training to choose Hopt from Ntrials*numH candidate nets
a. rng(0) % Initialize random number generator
b. Outer Loop over h = Hmin:dH:Hmax (j=1:numH)
c. Inner loop over i = 1:Ntrials % of weight initializations
d. net = fitnet(h);
e. net.divideFcn = 'dividetrain'
f. MSEgoal = 0.01*Ndof*MSE00a/Neq
g. net.trainParam.goal = MSEgoal;
h. [net, tr] = train(net,x,t);
i. Nepochs(i,j)= tr.best_epoch
j. MSE = tr.best_perf
k. NMSE = MSE/MSE00
l. R2(i,j) = 1-NMSE % Biased
m. MSEa = Neq*MSE/Ndof
n. NMSEa = MSEa/MSE00a
o. R2a(i,j) = 1-NMSEa % Unbiased
3. Estimating Hopt
a. Tabulate Nepochs, R2, and R2a in 3 Ntrials-by-numH matrices.
b. Choose (i,j)_opt and Hopt from the maximum of R2a.
c. Redesign net_opt by reinitializing the RNG and calling it repeatedly to reproduce
the initial state of the (i,j)_opt run.
NOTE: (i,j)_opt, Hopt and net_opt can be obtained within the loop. However, perusing the 3 tabulations is enlightening.
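The outline above can be sketched as one MATLAB script. This is a hedged consolidation, not verbatim code from the thread: x (I-by-N) and t (O-by-N) are the data matrices, and Hmin, dH, Hmax are assumed already chosen per step 1.

```matlab
% Sketch of the design loop outlined above (x is I-by-N, t is O-by-N;
% Hmin, dH, Hmax assumed chosen in step 1)
[I, N] = size(x);  O = size(t,1);
Neq    = N*O;                 % No. of training equations
MSE00  = mean(var(t',1));     % biased reference MSE (N divisor)
MSE00a = mean(var(t'));       % unbiased reference MSE (N-1 divisor)
Hvec = Hmin:dH:Hmax;  numH = length(Hvec);  Ntrials = 10;
Nepochs = zeros(Ntrials,numH); R2 = zeros(Ntrials,numH); R2a = zeros(Ntrials,numH);
rng(0)                        % reproducible weight initializations
for j = 1:numH                % outer loop over hidden-node candidates
    h    = Hvec(j);
    Nw   = (I+1)*h + (h+1)*O; % No. of unknown weights
    Ndof = Neq - Nw;          % estimation degrees of freedom
    for i = 1:Ntrials         % inner loop over weight initializations
        net = fitnet(h);
        net.divideFcn = 'dividetrain';              % use ALL data for training
        net.trainParam.goal = 0.01*Ndof*MSE00a/Neq; % MSEgoal
        [net, tr] = train(net, x, t);
        Nepochs(i,j) = tr.best_epoch;
        MSE  = tr.best_perf;
        R2(i,j)  = 1 - MSE/MSE00;                   % biased R^2
        MSEa = Neq*MSE/Ndof;                        % adjusted MSE
        R2a(i,j) = 1 - MSEa/MSE00a;                 % unbiased R^2
    end
end
% Pick (i,j)_opt and Hopt from the maximum of R2a
[~, k] = max(R2a(:));  [iopt, jopt] = ind2sub(size(R2a), k);  Hopt = Hvec(jopt);
```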

laplace laplace on 19 Feb 2013
data: inputs 1x4 columns; outputs 0 or 1
laplace laplace on 9 Mar 2013
Please answer me: why does this make no sense?

laplace laplace on 17 Apr 2013
You still didn't answer me.
Greg Heath on 24 Apr 2013
Sorry I didn't see this until now.
You said
data: inputs 1x4 columns; outputs 0 or 1
whereas your input matrix has dimensions 4x50 containing 4x1 vectors.