From: "zaheer ahmad" <ahmad.zaheer@yah00000.com>
Newsgroups: comp.soft-sys.matlab
Subject: Re: MLP Optimization Problem ( generalization problem ) -- on OCR
Date: Sat, 6 Dec 2008 20:54:02 +0000 (UTC)
Organization: IMS
Lines: 106
Message-ID: <gheopa$3sq$1@fred.mathworks.com>
References: <ggpan0$s2d$1@fred.mathworks.com> <e9dc5bcb-cf3d-4a6f-9fdd-d4ea0b8ee6a9@d23g2000yqc.googlegroups.com>


Greg Heath <heath@alumni.brown.edu> wrote in message <e9dc5bcb-cf3d-4a6f-9fdd-d4ea0b8ee6a9@d23g2000yqc.googlegroups.com>...
> On Nov 28, 12:45 pm, "zaheer ahmad" <ahmad.zah...@yah00000.com> wrote:
> > Dear All
> >
> > I am developing an OCR (Urdu) but am having a 'goal not met' problem.
> >
> > my network is
> >
> > Input = 400 (also reduced and checked with 100 and 144)
> > Output = 54
> > Hidden layer = 20 (also checked with 30, 40, 50, 60, 70, 80, 90, 100, up to 250)
> >
> > The sample sizes I tried to train the net with are
> > 5400   (i.e. 100*54=5400) but also checked on
> > 540    (i.e. 10*54=540) and
> > 1080   (i.e. 20*54=1080) and
> > 1350   (i.e. 25*54=1350) and
> > 2700   (i.e. 50*54=2700)
> > where 54 is the number of characters and 100, 10, 20, 25 and 50 are the numbers of samples per character
> 
> So you have
> 
> size(p) = [400 Ntrn] for a character with 20*20 = 400 pixels
> size(t)  = [54 Ntrn] for 54 letters, integers and special characters?

Yes, p is 400 x Ntrn and t is 54 x Ntrn. The 54 classes are all Urdu characters; special characters are not considered.


> How similar are testing and training sets?
The testing and training data are sampled from the same population.
> Clustering and visualizing the data should help.
Kindly help with clustering, or point me to a reference.

> if Q ~= Qa, error, end
Yes, Q = Qa.
> > % DEFINING THE NETWORK

> Why not standardize inputs and use tansig hidden nodes??
I have checked both. I have now changed the first layer to tansig and left the second as it was, because I need that output to compare with ASCII.
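
For reference, a minimal sketch of what standardizing the inputs could look like, assuming one sample per column as in the code further down. The matrix P here is random stand-in data and all variable names are hypothetical.

```matlab
% Stand-in data: 315 features x 200 samples (hypothetical; replace with Alphabet)
P = 255 * rand(315, 200);

% Per-feature (row) mean and standard deviation over the training samples
mu    = mean(P, 2);
sigma = std(P, 0, 2);
sigma(sigma == 0) = 1;          % constant pixels: avoid division by zero

% Shift each row to zero mean and scale it to unit standard deviation
Q  = size(P, 2);
Pn = (P - repmat(mu, 1, Q)) ./ repmat(sigma, 1, Q);

% Validation and test data should be transformed with the SAME mu and sigma
```

The Neural Network Toolbox function mapstd performs the same row-wise standardization and also returns the settings needed to apply it to new data.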

> H = S1?
Yes, it is.

>
> > netn.trainParam.goal =0.01;% mean(var(Target))/100; %0.009;%mean(var(Target))/100; % Mean-squared error goal.
> 
> Revisit this.

I am using this as suggested by you.

> Use only one noise level and scale it to the
> standard deviation of Alphabet in order to get
> a specified SNR.
I don't understand this, or rather, I don't know how to do it.
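
One possible reading of this suggestion, as a rough sketch: pick a single target SNR in dB and derive the noise standard deviation from the standard deviation of the data. The 20 dB value, the stand-in Alphabet matrix and the variable names are assumptions for illustration only.

```matlab
% Stand-in for the clean training matrix (hypothetical binary pixel data)
Alphabet = double(rand(315, 500) > 0.5);

SNRdB  = 20;                          % assumed target signal-to-noise ratio in dB
sigSig = std(Alphabet(:));            % overall standard deviation of the data
sigNoi = sigSig / 10^(SNRdB/20);      % noise std that gives the target SNR

% Zero-mean Gaussian noise at that single level
Noise      = sigNoi * randn(size(Alphabet));
AlphaNoisy = Alphabet + Noise;

% Sanity check: the realized SNR should be close to SNRdB
SNRcheck = 20 * log10(sigSig / std(Noise(:)));
```

Here SNR is taken as 20*log10 of the ratio of standard deviations, which is the same as 10*log10 of the ratio of variances.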

> You can also replace forced classification (always
> make a classification) with conditional classification
> (only make a classification if the posterior estimate
> is larger than a threshold). To do this, overlay the
> color coded histograms of the output for the classes
> that get the most confused.
Kindly elaborate on this a little; assume I am a novice.
Also, how should I calculate the error?
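
A small sketch of the difference between forced and conditional classification, and of one way to compute the error rates. The output matrix Y, the true labels and the 0.5 threshold are made-up illustration values, not results from the actual network.

```matlab
% Stand-in network outputs: 5 classes x 8 samples (hypothetical numbers)
Y = [0.90 0.10 0.40 0.05 0.20 0.10 0.30 0.05
     0.05 0.80 0.35 0.10 0.10 0.10 0.20 0.05
     0.02 0.05 0.10 0.70 0.60 0.10 0.20 0.10
     0.02 0.03 0.10 0.10 0.05 0.60 0.20 0.10
     0.01 0.02 0.05 0.05 0.05 0.10 0.10 0.70];
trueClass = [1 2 2 3 3 4 5 5];

% Forced classification: always pick the class with the largest output
[ymax, predClass] = max(Y, [], 1);
forcedErr = mean(predClass ~= trueClass);      % fraction misclassified

% Conditional classification: only decide if the largest output is high enough
thresh     = 0.5;                              % assumed threshold
accepted   = ymax >= thresh;
rejectRate = mean(~accepted);                  % fraction left unclassified
condErr    = mean(predClass(accepted) ~= trueClass(accepted));
```

In practice the threshold would be chosen by looking at histograms of ymax for the most confused classes (for example with hist), rather than fixed in advance.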

This is my second thread on the same question, as I did not receive a reply for a long time on my earlier thread:
http://www.mathworks.com/matlabcentral/newsreader/view_thread/235521#614583
It is better to discuss the problem in a single place, so I will ask my questions about that (old) thread here.
In that thread you suggested that:

>Maybe your classes are not well defined
>and have to be partitioned into subclasses
>via clustering (e.g., k-means).
How do I perform this process (k-means)?
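
As a rough illustration of the idea, here is a hand-written k-means loop on toy 2-D data. For subclass discovery it would be run on the samples of one character class at a time, with k chosen by trial. The data, k and the simple initialization are assumptions of this sketch.

```matlab
% Stand-in data: two well-separated groups of 2-D points, one point per column
X = [0.1*randn(2, 50), 5 + 0.1*randn(2, 50)];   % 2 x 100
k = 2;                                          % assumed number of clusters

[d, N] = size(X);
C   = X(:, [1 N]);        % initial centres: first and last sample (simple choice)
idx = zeros(1, N);

for iter = 1:100
    % Assignment step: squared distance from every point to every centre
    D = zeros(k, N);
    for j = 1:k
        Dj = X - repmat(C(:, j), 1, N);
        D(j, :) = sum(Dj.^2, 1);
    end
    [dmin, newIdx] = min(D, [], 1);
    if isequal(newIdx, idx), break, end         % assignments unchanged: converged
    idx = newIdx;

    % Update step: each centre becomes the mean of the points assigned to it
    for j = 1:k
        if any(idx == j)
            C(:, j) = mean(X(:, idx == j), 2);
        end
    end
end
```

If the Statistics Toolbox is available, its kmeans function does the same job; note that it expects one observation per row, so the data matrix would need to be transposed.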

>Overlay the plot of each misclassified character
>(blue) on the plot of the mean of the class to
>which they were assigned (red) and the plot of
>the mean of the correct class (black).
>This should give some insight into the difficulty.
Kindly help with this too.
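
A sketch of how such an overlay plot might be produced. The data is synthetic (three made-up classes of 21x15 characters with one deliberately wrong prediction), and the names P, trueClass and predClass are hypothetical stand-ins for the real inputs, labels and network decisions.

```matlab
% Stand-in data: 3 classes of 21x15 characters, one sample per column
nPix = 21*15;  nPer = 20;  nCls = 3;
protos = rand(nPix, nCls);
P = [];  trueClass = [];
for c = 1:nCls
    P = [P, repmat(protos(:, c), 1, nPer) + 0.1*randn(nPix, nPer)];
    trueClass = [trueClass, c*ones(1, nPer)];
end
predClass    = trueClass;
predClass(5) = 2;          % pretend sample 5 (true class 1) was assigned to class 2

% Mean character of each class
classMean = zeros(nPix, nCls);
for c = 1:nCls
    classMean(:, c) = mean(P(:, trueClass == c), 2);
end

% One figure per misclassified sample
wrong = find(predClass ~= trueClass);
for n = wrong
    figure; hold on
    plot(classMean(:, predClass(n)), 'r');   % mean of the class it was assigned to
    plot(classMean(:, trueClass(n)), 'k');   % mean of the correct class
    plot(P(:, n), 'b');                      % the misclassified character itself
    legend('assigned class mean', 'correct class mean', 'misclassified sample');
    xlabel('pixel index'); ylabel('pixel value');
    hold off
end
```

The same three vectors can also be reshaped back to 21x15 and viewed side by side with imagesc, provided the reshape matches the order in which the characters were originally vectorized.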

I have changed the code and included validation and testing. Training converges when I use H = 2000, but the validation and testing curves (on the graph) remain well above the goal line, and the results are about 15-20%.
The code now reads as below:

Alphabet =Alpha4Train;
Target=TargetSet;
[S1,Qa] = size(Alphabet);  % S1 = 315 and Qa = 5400, as I have now resized the characters to 21x15
[S2,Q] = size(Target);     % S2 = 11 and Q = 5400, which means Q = Qa
% DEFINING THE NETWORK
% ====================
H1 = 2000;  % chosen by trial and error
net = newff(minmax(Alphabet),[H1 S2],{'tansig' 'logsig'},'trainscg');
net.performFcn = 'mse'; 
net.trainParam.goal = mean(var(Target'))/100; % as suggested by Greg Heath; transposed so the variance of each output is taken across samples
net.trainParam.show = 10; 
net.trainParam.epochs = 500; 
% TRAINING THE NETWORK 
% ====================
testPercent = 0.25;  
validatePercent = 0.25; 
[trainSamples,validateSamples,testSamples] = dividevec(Alphabet,Target,validatePercent,testPercent); % dividevec takes the validation fraction first, then the test fraction
[net,tr] = train(net,trainSamples.P,trainSamples.T,[],[],validateSamples,testSamples);

I know that the number of questions to be answered at one time is growing, but I hope no one will mind.
Thanks in advance for your time.
Zaheer Ahmad