Thread Subject: Split data for neural nets

Subject: Split data for neural nets

From: Manos

Date: 21 Jul, 2008 20:23:02

Message: 1 of 2

Hi all,

I am trying to apply neural nets and, as I understand it, I have
to follow three steps: create (with any of the new* functions),
train, and sim. I have a data set of up to 8000 samples. Do I have
to split the data into two parts (one for create/train and another
one for sim)?
If so, what is the best split percentage?

Thanks in advance,
Manolis

Subject: Split data for neural nets

From: Greg Heath

Date: 22 Jul, 2008 00:42:35

Message: 2 of 2

On Jul 21, 4:23 pm, "Manos " <ezoul...@in.gr> wrote:
> Hi all,
>
> I am trying to apply neural nets and, as I understand it, I have
> to follow three steps: create (with any of the new* functions),
> train, and sim. I have a data set of up to 8000 samples. Do I have
> to split the data into two parts (one for create/train and another
> one for sim)?
> If so, what is the best split percentage?

It depends on
a. The complexity of your underlying noiseless I/O relation
b. The amount of noise in your measurements
c. The desired accuracy of your weight estimates
d. The desired precision of your error estimation
e. The type of learning algorithm used

In general, you would partition the data into three
subsets:

total = design + test (N = Ndes + Ntst)
design = training + validation (Ndes = Ntrn + Nval)

Ntrn must be large enough for accurate weight estimation.
Nval must be large enough to obtain good error estimates
so that the best of many candidate designs can be chosen.
Ntst must be large enough to obtain a precise unbiased
generalization error estimate (i.e., estimate of error on
all nondesign data).
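
The three-way partition above can be sketched as follows. This is a minimal illustration in Python rather than MATLAB. The function name split_indices and the 15%/15% validation/test fractions are placeholders chosen for the example, not recommended values, since the appropriate sizes depend on points a-e.

```python
import random


def split_indices(n, frac_val=0.15, frac_tst=0.15, seed=0):
    """Randomly partition range(n) into training, validation and test index lists.

    frac_val and frac_tst are illustrative defaults only (an assumption of
    this sketch); they are not prescribed by the discussion above.
    """
    if frac_val < 0 or frac_tst < 0 or frac_val + frac_tst >= 1:
        raise ValueError("fractions must be non-negative and sum to less than 1")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # reproducible random order
    n_tst = round(n * frac_tst)           # Ntst
    n_val = round(n * frac_val)           # Nval
    n_trn = n - n_val - n_tst             # Ntrn takes the remainder
    trn = idx[:n_trn]
    val = idx[n_trn:n_trn + n_val]
    tst = idx[n_trn + n_val:]
    return trn, val, tst


# N = 8000 as in the original question.
trn, val, tst = split_indices(8000)
print(len(trn), len(val), len(tst))
```

Shuffling before slicing keeps the three subsets disjoint while drawing each one from the whole data set, which matters if the data are stored in some systematic order.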

If you use newff to create an I-H-O MLPNN, you will have
Neq = Ntrn*O equations to estimate Nw = (I+1)*H+(H+1)*O
unknown weights and thresholds. Typically, for good weight
estimation, r = Neq/Nw >> 1 is desirable. The actual value
depends on a, b and c (above) and is best determined by
trial and error. Although typical values are in the range
~2 <= r <= ~30, some problems may require r ~ 100. I usually
start my search with H = 0, 1, 2 and continue with powers of 2
unless H is large, in which case I switch to a binary search.
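
The equation count above is easy to tabulate for candidate values of H. The sketch below is in Python, and the sizes I = 10, O = 1 and Ntrn = 5600 are hypothetical values picked only for illustration. It covers H >= 1, where the formula for Nw applies as written, and the helper max_hidden simply solves r >= r_min for H.

```python
def num_weights(n_in, n_hid, n_out):
    """Nw = (I+1)*H + (H+1)*O for an I-H-O MLP with one hidden layer (H >= 1)."""
    if n_hid < 1:
        raise ValueError("this sketch only covers H >= 1")
    return (n_in + 1) * n_hid + (n_hid + 1) * n_out


def equation_ratio(n_trn, n_in, n_hid, n_out):
    """r = Neq/Nw, with Neq = Ntrn*O training equations."""
    return (n_trn * n_out) / num_weights(n_in, n_hid, n_out)


def max_hidden(n_trn, n_in, n_out, r_min):
    """Largest H for which r >= r_min (0 if no H >= 1 qualifies).

    Follows from Nw = H*(I+1+O) + O <= Ntrn*O/r_min.
    """
    h = int((n_trn * n_out / r_min - n_out) // (n_in + 1 + n_out))
    return max(h, 0)


# Hypothetical problem size, for illustration only.
I, O, Ntrn = 10, 1, 5600
for H in (1, 2, 4, 8, 16, 32):
    print(H, num_weights(I, H, O), round(equation_ratio(Ntrn, I, H, O), 1))
print("largest H with r >= 30:", max_hidden(Ntrn, I, O, 30))
```

A table like this only bounds the search. Which H in the admissible range works best still has to be found by trial and error on the validation set.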

For reliable error estimation it is desirable to choose Nval
and Ntst so that stdv(eval) << eval and stdv(etst) << etst.
If you assume regression MSE is CHISQ distributed and
classification PCTERR has a BINOMIAL distribution, you can
find equations for stdv in a statistics handbook.
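
As a rough sketch of those handbook formulas (in Python, with hypothetical function names): for an error rate p estimated from n independent cases, the binomial standard deviation is sqrt(p*(1-p)/n). If n*MSE/sigma^2 is chi-square with n degrees of freedom, which assumes independent zero-mean Gaussian errors, then stdv(MSE)/MSE = sqrt(2/n). Treating n as the number of independent error terms is an assumption of this sketch.

```python
import math


def binomial_stdv(p, n):
    """Standard deviation of an error-rate estimate from n independent cases."""
    return math.sqrt(p * (1.0 - p) / n)


def chisq_relative_stdv(n):
    """stdv(MSE)/MSE when n*MSE/sigma^2 is chi-square with n degrees of freedom."""
    return math.sqrt(2.0 / n)


def n_for_relative_stdv_binomial(p, rel):
    """Approximate smallest n with stdv(p_hat)/p <= rel.

    From sqrt((1-p)/(p*n)) <= rel, i.e. n >= (1-p)/(p*rel^2).
    """
    return math.ceil((1.0 - p) / (p * rel * rel))


# Example: how precise is a test-set estimate with Ntst = 1200 cases?
print(binomial_stdv(0.05, 1200))       # absolute stdv of a 5% error rate
print(chisq_relative_stdv(1200))       # relative stdv of the MSE
print(n_for_relative_stdv_binomial(0.05, 0.1))
```

The binomial case shows why small error rates need large test sets: the relative standard deviation grows as p shrinks, so the required n scales roughly as 1/p.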

If you find that

Ntrn + Nval + Ntst > N,

then you should consider bootstrapping or cross-validation.
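
For the cross-validation case, a minimal sketch of k-fold index generation in Python is given below. The function name kfold_indices and the choice k = 10 are assumptions for illustration. Each fold is held out once while the remaining folds serve as design data, so every sample is used for both purposes across the k rounds.

```python
import random


def kfold_indices(n, k, seed=0):
    """Yield (design, held_out) index lists for k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # reproducible random order
    folds = [idx[i::k] for i in range(k)]      # k nearly equal-sized folds
    for i in range(k):
        held_out = folds[i]
        design = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield design, held_out


# Example with a small data set of 103 samples and k = 10.
for design, held_out in kfold_indices(103, 10):
    print(len(design), len(held_out))
```

The error estimate is then the average over the k held-out folds. If a separate validation set is also needed for choosing among candidate designs, the design indices of each round can be split again in the same way.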

See the FAQ in comp.ai.neural-nets. In addition, all of the
above info has been posted many, many times by me and others
in comp.ai.neural-nets and comp.soft-sys.matlab.

E.g., go to Google Groups and search on keywords like

greg-heath Ntrn Nval Ntst
greg-heath partition
greg-heath split

UH-OH, something is wrong with the search function in
Google Groups! Try Google Web.

Hope this helps.

Greg

