Path: news.mathworks.com!not-for-mail
From: "Greg Heath" <heath@alumni.brown.edu>
Newsgroups: comp.soft-sys.matlab
Subject: Matlab trainbr "converges" to trivial solution
Date: Wed, 10 Apr 2013 02:33:18 +0000 (UTC)
Organization: The MathWorks, Inc.
Lines: 34
Message-ID: <kk2j1e$aic$1@newscl01ah.mathworks.com>
Reply-To: "Greg Heath" <heath@alumni.brown.edu>
NNTP-Posting-Host: www-00-blr.mathworks.com
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Trace: newscl01ah.mathworks.com 1365561198 10828 172.30.248.45 (10 Apr 2013 02:33:18 GMT)
X-Complaints-To: news@mathworks.com
NNTP-Posting-Date: Wed, 10 Apr 2013 02:33:18 +0000 (UTC)
X-Newsreader: MATLAB Central Newsreader 2929937
Xref: news.mathworks.com comp.soft-sys.matlab:793116

Subject: Matlab trainbr "converges" to trivial solution  
Sent: Apr 7, 2013 10:04:40 AM  

>See attached screen shot.  Re-initializing helps sometimes, but why does 
>this happen in the first place?

It's just a combination of statistics and a mountainous weight space. If you 
begin with random initial weights and a parsimonious number of hidden 
nodes, H, there is no guarantee that a single run of steepest descent, with or 
without momentum, will lead to a low local minimum, much less a global 
minimum. I routinely run 10 random weight initializations for each value of H 
that I try; even when H is optimal, some of the runs do not converge to a low 
local minimum.
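
A rough sketch of that multi-initialization loop, assuming the Neural 
Network Toolbox and a generic regression dataset (here the toolbox's 
simplefit_dataset); the names, the 1:10 search range, and Ntrials = 10 
are all illustrative, not recommendations:

[x, t]  = simplefit_dataset;    % example regression data shipped with the toolbox
Hvec    = 1:10;                 % candidate numbers of hidden nodes
Ntrials = 10;                   % random weight initializations per H
MSE     = zeros(Ntrials, numel(Hvec));
nets    = cell(Ntrials, numel(Hvec));
trs     = cell(Ntrials, numel(Hvec));
for j = 1:numel(Hvec)
  for i = 1:Ntrials
    net = fitnet(Hvec(j));      % default trainFcn is 'trainlm'; use
                                % 'traingd'/'traingdm' for plain steepest
                                % descent w/wo momentum
    net = configure(net, x, t); % size the weights for this data
    net = init(net);            % fresh random initial weights
    [nets{i,j}, trs{i,j}] = train(net, x, t);
    y        = nets{i,j}(x);
    MSE(i,j) = mean((t - y).^2);  % final performance for this trial
  end
end
min(MSE)                        % best result over the trials for each H

Inspecting the spread of MSE down each column makes the point: only 
some of the trials for a given H reach a low minimum.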

When H is larger than necessary, validation stopping and/or regularization 
can be used to prevent overtraining the overfit net and the resulting 
inability to perform well on nontraining data. Nevertheless, since the 
initial weights are random, there is still no guarantee that steepest descent 
will lead to a low local minimum.
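
For concreteness, one way to set up each defense in the toolbox; the 
H = 20 and the split ratios below are illustrative assumptions only:

% Validation stopping: keep a validation subset so train() can stop early.
net1 = fitnet(20);                      % default trainFcn = 'trainlm'
net1.divideFcn = 'dividerand';          % random train/val/test split
net1.divideParam.trainRatio = 0.70;
net1.divideParam.valRatio   = 0.15;
net1.divideParam.testRatio  = 0.15;

% Bayesian regularization: trainbr penalizes large weights instead,
% so it is usually run without a validation subset.
net2 = fitnet(20, 'trainbr');
net2.divideFcn = 'dividetrain';         % all data used for training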

There are more exotic minimization algorithms than steepest descent.
However, they are much slower and are still not guaranteed to find a low
local minimum. It is more practical to 

either 
1. Design many nets with steepest descent and choose the one that 
minimizes the validation set error, 
or 
2. Design many nets with regularized descent (e.g., trainbr) and choose
the one that minimizes the training set error (see the sketch below).
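
A sketch of both selection rules, assuming the nets{i,j} and training 
records trs{i,j} collected in a loop like the one above; the field names 
best_vperf and best_perf come from the toolbox's training record tr:

% 1. Validation-stopped designs: keep the net with the lowest
%    validation set error recorded in the training record.
[~, i1]  = min(cellfun(@(r) r.best_vperf, trs(:)));
bestnet1 = nets{i1};

% 2. trainbr designs (no validation subset, so best_vperf is NaN):
%    keep the net with the lowest training set error.
[~, i2]  = min(cellfun(@(r) r.best_perf, trs(:)));
bestnet2 = nets{i2};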

It is very doubtful that one minimization run will always be successful.

Greg