Neural network performance evaluation

For evaluating NN performance over a given number of trials (retrains), which approach is right, and why?

for trial = 1:100
    net = newff(...);
    [net,tr,Y,E,Pf,Af] = train(...);
    ...
end

OR

net = newff(...);
for trial = 1:100
    [net,tr,Y,E,Pf,Af] = train(...);
    ...
end

Note: I am getting decent results with both approaches, but the latter is giving me the best result.

 Accepted Answer

Greg Heath on 1 Jan 2013
Thank you for formally accepting my answer!

More Answers (1)

Greg Heath on 27 Dec 2012
The first example is the correct one because it contains 100 random weight initializations. Therefore each net is a valid independent result.
The second example just keeps training the same net more and more.
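The difference between the two loops can be made explicit with init, which redraws the random initial weights. A minimal sketch (x, t, and H are placeholder input, target, and hidden-layer-size arguments, not from the original post):

```matlab
% Approach 1: each trial is an independent design.
% newff re-randomizes the weights on creation; net = init(net) does the same.
for trial = 1:100
    net = newff(x, t, H);          % fresh random initial weights every trial
    [net, tr] = train(net, x, t);
    % ... record this independent net's results ...
end

% Approach 2: one net, trained repeatedly.
net = newff(x, t, H);
for trial = 1:100
    [net, tr] = train(net, x, t);  % resumes from the weights left by the
                                   % previous call: NOT an independent trial
end
```

Each iteration of the first loop samples a new point in weight space, so the 100 results are statistically independent; each iteration of the second merely continues one optimization run.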
What, exactly, do you mean by decent results?
Is this regression or classification?
Are you using validation stopping?
How many acceptable solutions out of 100?
If regression, what are the means and standard deviations of the training, validation and testing NORMALIZED (with average target variance) mean-square-error?
I usually shoot for (but don't always get) NMSEtrn <= 0.01
For an I-H-O net
Ntrneq = prod(size(ttrn)) % Ntrn*O = No. of training equations
Nw = (I+1)*H +(H+1)*O % No. of unknown weights
NMSEtrn = sse(ttrn-ytrn)/(Ntrneq-Nw)/mean(var(ttrn',0))
NMSEi = mse(yi-ti)/mean(var(ti',1)) for i = val and test
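In code, those quantities might be computed as follows. A sketch only: ttrn/ytrn, tval/yval, and ttest/ytest are assumed to be O-by-N target and output matrices for the three subsets, and I, H, O are the layer sizes; none of these names come from the original post.

```matlab
[O, Ntrn] = size(ttrn);
Ntrneq = prod(size(ttrn));            % Ntrn*O = number of training equations
Nw     = (I+1)*H + (H+1)*O;           % number of unknown weights, I-H-O net

% Degree-of-freedom-adjusted NMSE for the training set:
NMSEtrn = sse(ttrn - ytrn) / (Ntrneq - Nw) / mean(var(ttrn', 0));

% Plain NMSE for the validation and test sets:
NMSEval  = mse(yval  - tval)  / mean(var(tval', 1));
NMSEtest = mse(ytest - ttest) / mean(var(ttest', 1));
```

Normalizing by the average target variance makes the errors comparable across data sets; NMSE = 1 corresponds to the naive constant-mean model.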
I have posted many examples in NEWSGROUP and ANSWERS. Try searching on
heath newff Ntrials
Hope this helps.
Thank you for formally accepting my answer.
Greg

8 Comments

Daud on 27 Dec 2012
Edited: Daud on 27 Dec 2012
The first example is the correct one because it contains 100 random weight initializations. Therefore each net is a valid independent result.
The second example just keeps training the same net more and more.
****I have observed that in the 2nd approach the weights are updated in each trial; so why is this approach wrong? Please explain.
What, exactly, do you mean by decent results?
*****I meant a good recognition rate.
Is this regression or classification?
***Classification.
Are you using validation stopping?
***Yes.
How many acceptable solutions out of 100?
***I am trying to find the average recognition rate over 100 trials on the val, train, and test sets, as well as over all samples.
Daud on 27 Dec 2012
Edited: Daud on 27 Dec 2012
I want you to notice the following facts and justification:
1) The concept of a NN is that there are some arbitrarily initialized weights and biases before training, and during training these values are updated in such a way that the error is reduced.
2) Suppose that after trial number one the weights and biases are updated; in the 2nd trial these updated weights and biases act as the initial weights and biases for the 2nd trial's training.
3) I mean that in approach number one I am initializing the net in each trial, while in approach number two I am using the previously updated weights and biases as the new initial values of weights and biases for the subsequent trial's training.
I have total respect for your opinion; I may be wrong; I just want to clarify my concept.
If I understand you correctly, yes.
In approach 1 you are training 100 nets, and if parameters are chosen reasonably with RW data, most of the nets will be useful. For c mutually exclusive classes, use targets whose columns come from the unit c-dimensional matrix eye(c). Store Nepochs, 4 NMSEs, and 4 Pcterrs in a results matrix of size [100 9] (or two matrices of size [100 5]).
Search the matrix for failed designs and delete those rows before calculating summary stats.
In approach 2 you are designing 1 net in 100 stages. If you look at the result tabulation, Nepochs will mostly be 1 and most of the results will be equal.
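That bookkeeping might look like the sketch below. All names are illustrative; nmse4 and pcterr4 stand for the four NMSEs and four percent-error values you would compute for the train/val/test/overall subsets, and the failure threshold is an assumption, not a prescription.

```matlab
Ntrials = 100;
results = zeros(Ntrials, 9);   % columns: [Nepochs, 4 NMSEs, 4 Pcterrs]

for trial = 1:Ntrials
    net = newff(x, t, H);                 % independent random initialization
    [net, tr] = train(net, x, t);
    % nmse4 and pcterr4 are 1x4 row vectors computed from this trial's outputs
    results(trial, :) = [tr.num_epochs, nmse4, pcterr4];
end

% Delete failed designs before computing summary statistics,
% e.g. trials whose training percent error exceeds some threshold:
failed = results(:, 6) > maxPcterr;       % column 6 = training Pcterr (example)
results(failed, :) = [];

summary = [mean(results, 1); std(results, 0, 1)];
```

Tabulating Nepochs per trial is also what exposes the flaw in approach 2: after the first trial, most rows show Nepochs near 1 and nearly identical errors.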
Thanks for your precious comment; so your suggestion is that the 2nd approach is not convenient?
Greg, I still strongly believe that the 2nd approach is also right.
I have done an experiment, for 100 trials:
after trial 1, the recognition percentage is: 70%
after 2: 76%
after 3: 80% ... the same until trial 40
at 41: 90%
... the same (epochs are also the same, 5)
at 80: 99.554%
... the same (epochs are also the same, 6)
at 100: 99.985%
So after 100 trials I got a 99.985% recognition rate,
and I have seen that the weights and biases also changed in each trial.
Then why is this not convenient?
NO! The second approach is in general, useless!
The idea is to train a network that will GENERALIZE well; i.e., to have good performance on nontraining data. If you have enough unknown weights compared to the number of training equations, you can get ridiculously low error rates if you train long enough.
The problem is that the network will probably not generalize well. That is, will not perform well on nontraining data.
That is why the validation set is used to represent unseen nontraining data and ends training whenever the validation error increases max_fail epochs (default = 6) in a row.
You cannot ignore that fact and just keep training.
The true measure of a net is the test set error. The validation set error is a prediction of what the test set error will be. Therefore, when it reaches a minimum in training, you should stop.
Have you taken a good look at the trn/val/tst training performance plots?
The training mse tends to monotonically decrease while the val and/or tst mses are increasing.
This phenomenon is called overtraining an overfit (too many weights) net.
Search overfitting in the comp.ai.neural-nets FAQ and elsewhere (e.g., my posts).
Again: Net performance is measured via performance on nondesign test sets:
data = design + test
design = train + validation.
I tend to use 10 trials (not trails!) for each candidate number of hidden nodes and choose successful nets with the smallest number of hidden nodes (& weights).
I have posted many, many designs. Key search words are heath Nw and Ntrials
Hope this helps.
Greg
Thanks for your answer and for correcting my spelling ("trial"). But I still can't incorporate the facts you mentioned. By the way, the total data set is divided into test, train, and val sets, and the recognition rate mentioned above is the overall recognition rate (train, val, and test).
Why should I be concerned about over-fitting, since I am using validation stopping?
OK Greg, I have a query: after trial "1" in the 2nd approach, suppose the initial weights are w1, w2, ..., wn. Now in the 2nd trial, are the initial weights changed, or are they the same as in trial "1", w1, w2, ..., wn?
If the initial weights are the same in each trial, I totally agree with you; but if not... I am confused.
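One way to settle this question directly is to inspect the weight vector with getwb between calls. A sketch, with placeholder data arguments:

```matlab
net = newff(x, t, H);
w0 = getwb(net);               % weight/bias vector before any training

[net, tr] = train(net, x, t);
w1 = getwb(net);               % weights after trial 1

% Approach 2: the next train call starts from w1, not a fresh random draw.
[net, tr] = train(net, x, t);

% Approach 1: init(net) (or a new newff call) re-randomizes the weights,
% giving initial weights independent of w1:
net = init(net);
w2 = getwb(net);
```

So in approach 2 the initial weights of trial 2 are the final weights of trial 1; they change between trials, but the trials are not independent designs.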
Greg, I am posting my full code here; please check it out:
clc
close all
load Input_n
run target_00

% shuffle the samples
order = randperm(size(input_all,2));
input_all = input_all(:,order);
Targets = Targets(:,order);

n_trial = 100;
c_tr{1,n_trial} = []; cm_tr{1,n_trial} = []; ind_tr{1,n_trial} = []; per_tr{1,n_trial} = [];
c_ts{1,n_trial} = []; cm_ts{1,n_trial} = []; ind_ts{1,n_trial} = []; per_ts{1,n_trial} = [];
c_val{1,n_trial} = []; cm_val{1,n_trial} = []; ind_val{1,n_trial} = []; per_val{1,n_trial} = [];
c_ovrl{1,n_trial} = []; cm_ovrl{1,n_trial} = []; ind_ovrl{1,n_trial} = []; per_ovrl{1,n_trial} = [];
tr_info{1,n_trial} = [];
tr_net{1,n_trial} = [];

net = newff(input_all,Targets,7,{'tansig','tansig'},'trainscg','learngdm','msereg');
net.inputs{1}.processFcns = {'mapstd'};   % set after newff, so newff does not overwrite it

% training parameters
net.trainParam.epochs = 1000;
net.trainParam.goal = 0;
net.trainParam.max_fail = 6;

% division parameters
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 20/100;
net.divideParam.testRatio = 10/100;

for i = 1:n_trial
    close all
    %net = init(net);   % uncomment to re-randomize weights each trial (approach 1)
    [net,tr,Y,E] = train(net,input_all,Targets);
    info = trainscg('info');
    tr_info{i} = tr;
    tr_net{i} = net;
    outputs_test = sim(net,input_all(:,tr.testInd));
    outputs_val = sim(net,input_all(:,tr.valInd));
    outputs_ovrl = sim(net,input_all);
    %[m,b,r] = postreg(outputs_test,Targets(:,tr.testInd))
    % Y spans all samples, so index it by tr.trainInd to match the targets:
    [c_tr{i},cm_tr{i},ind_tr{i},per_tr{i}] = confusion(Targets(:,tr.trainInd),Y(:,tr.trainInd));
    [c_val{i},cm_val{i},ind_val{i},per_val{i}] = confusion(Targets(:,tr.valInd),outputs_val);
    [c_ts{i},cm_ts{i},ind_ts{i},per_ts{i}] = confusion(Targets(:,tr.testInd),outputs_test);
    [c_ovrl{i},cm_ovrl{i},ind_ovrl{i},per_ovrl{i}] = confusion(Targets,outputs_ovrl);
    %plotperf(tr)
    %plotconfusion(Targets(:,tr.trainInd),Y(:,tr.trainInd),'Training',Targets(:,tr.valInd),outputs_val,'Validation',Targets(:,tr.testInd),outputs_test,'Test')
end

% result evaluation
Avg_recg_rt_ovrl = 100 - mean(cell2mat(c_ovrl));
Avg_recg_rt_tr = 100 - mean(cell2mat(c_tr));
Avg_recg_rt_ts = 100 - mean(cell2mat(c_ts));
Avg_recg_rt_val = 100 - mean(cell2mat(c_val));
[min_err, trial_num] = min(cell2mat(c_ovrl));
best_recg = 100 - min_err;

Asked on 25 Dec 2012
