From: Greg Heath <heath@alumni.brown.edu>
Newsgroups: comp.soft-sys.matlab
Subject: Re: crossvalind -- size of training/testing set?
Date: Mon, 2 Feb 2009 10:44:03 -0800 (PST)
Message-ID: <b70e2def-294a-4bce-adc0-cd2681cd6014@k36g2000pri.googlegroups.com>
References: <gm58ci$hrf$1@fred.mathworks.com>


On Feb 1, 5:40 pm, "Sophia Yuditskaya" <scyud...@mit.edu> wrote:
> Hi,
>
> I am calling crossvalind as follows:
>
> [train, test] = crossvalind('holdOut', groups);
>
> What proportion of the original data is put into training vs testing sets? I'm assuming it's 50% each ... but instead I'd like to use 25% of the data for training and 75% for testing. How do I specify this? I've tried
>
> [train, test] = crossvalind('holdOut', groups, 0.25);
>
> but I get an OutOfMemoryError.
>
> Any help would be appreciated.

Carefully read

doc crossvalind

or

help crossvalind

because, according to

http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ref/crossvalind.html

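it looks like the proper syntax is

P = 0.75;   % fraction of the data held out for TESTING
[train, test] = crossvalind('HoldOut', N, P);

where N is the number of observations. A grouping vector (your
groups) can be passed in place of N. Note that P is the held-out
(test) fraction, so your call with 0.25 asks for roughly 75%
training and 25% testing, the reverse of what you want.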

If you still get OutOfMemoryError, it looks like you might have to
reduce N.
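
As a quick sanity check, here is a minimal sketch, assuming the
three-argument HoldOut form described on the doc page. The value
of N is made up for illustration; substitute your own.

N = 200;     % hypothetical number of observations
P = 0.75;    % fraction held out for testing
[train, test] = crossvalind('HoldOut', N, P);

% train and test are logical index vectors of length N
Ntrn = sum(train)    % should be roughly (1-P)*N
Ntst = sum(test)     % should be roughly P*N

% To respect class structure, pass the grouping vector instead of N:
% [train, test] = crossvalind('HoldOut', groups, P);

If the printed sizes are not what you expect, check what you are
actually passing as the second argument.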

Unfortunately, you have not given us N or the number of groups.
Worse, crossvalind does not allow the specification of a
validation set for determining training parameters.

In general, the data set should have a 3-way train/validate/test
split. See the comp.ai.neural.net FAQ and archives. Also see many
of my posts in both CSSM and CANN regarding how to choose Ntrn,
Nval and Ntst. Search Google Groups with

greg-heath validation

What you will find is that the ACTUAL subset SIZES are important;
NOT their FRACTION of the total data set.
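
A 3-way split with explicit sizes can be done without crossvalind.
The following is only a sketch using randperm; the sizes are
hypothetical placeholders, not recommendations, and it ignores
class balance.

N    = 1000;               % hypothetical total number of observations
Ntrn = 600;                % hypothetical training set size
Nval = 200;                % hypothetical validation set size
Ntst = N - Ntrn - Nval;    % the remainder is used for testing

idx    = randperm(N);      % random ordering of 1..N
trnInd = idx(1:Ntrn);
valInd = idx(Ntrn+1:Ntrn+Nval);
tstInd = idx(Ntrn+Nval+1:end);

The three index sets are disjoint and together cover 1..N. If the
classes are unbalanced, apply the same idea within each class.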

Hope this helps.

Greg