<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538</link>
    <title>MATLAB Central Newsreader - crossvalind -- size of training/testing set?</title>
    <description>Feed for thread: crossvalind -- size of training/testing set?</description>
    <language>en-us</language>
    <copyright>&#169; 1994-2012 by MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Sun, 01 Feb 2009 22:40:18 -0500</pubDate>
      <title>crossvalind -- size of training/testing set?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538#625325</link>
      <author>Sophia Yuditskaya</author>
      <description>Hi,&lt;br&gt;
&lt;br&gt;
I am calling crossvalind as follows:&lt;br&gt;
&lt;br&gt;
[train, test] = crossvalind('holdOut', groups);&lt;br&gt;
&lt;br&gt;
What proportion of the original data is put into training vs testing sets? I'm assuming it's 50% each ... but instead I'd like to use 25% of the data for training and 75% for testing. How do I specify this? I've tried&lt;br&gt;
&lt;br&gt;
[train, test] = crossvalind('holdOut', groups, 0.25);&lt;br&gt;
&lt;br&gt;
but I get an OutOfMemoryError.&lt;br&gt;
&lt;br&gt;
Any help would be appreciated.&lt;br&gt;
&lt;br&gt;
Thanks,&lt;br&gt;
&lt;br&gt;
Sophia</description>
    </item>
    <item>
      <pubDate>Mon, 02 Feb 2009 16:38:25 -0500</pubDate>
      <title>Re: crossvalind -- size of training/testing set?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538#625485</link>
      <author>Peter Perkins</author>
      <description>Sophia Yuditskaya wrote:&lt;br&gt;
&lt;br&gt;
&amp;gt; [train, test] = crossvalind('holdOut', groups, 0.25);&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; but I get an OutOfMemoryError.&lt;br&gt;
&lt;br&gt;
Sophia, that appears to be the correct syntax.  You haven't said what's in your variable &quot;groups&quot;.</description>
    </item>
    <item>
      <pubDate>Mon, 02 Feb 2009 18:44:03 -0500</pubDate>
      <title>Re: crossvalind -- size of training/testing set?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538#625509</link>
      <author>Greg Heath</author>
      <description>On Feb 1, 5:40 pm, &quot;Sophia Yuditskaya&quot; &amp;lt;scyud...@mit.edu&amp;gt; wrote:&lt;br&gt;
&amp;gt; Hi,&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I am calling crossvalind as follows:&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; [train, test] = crossvalind('holdOut', groups);&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; What proportion of the original data is put into training vs testing sets? I'm assuming it's 50% each ... but instead I'd like to use 25% of the data for training and 75% for testing. How do I specify this? I've tried&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; [train, test] = crossvalind('holdOut', groups, 0.25);&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; but I get an OutOfMemoryError.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Any help would be appreciated.&lt;br&gt;
&lt;br&gt;
Carefully read&lt;br&gt;
&lt;br&gt;
doc crossvalind&lt;br&gt;
&lt;br&gt;
or&lt;br&gt;
&lt;br&gt;
help crossvalind&lt;br&gt;
&lt;br&gt;
because, according to&lt;br&gt;
&lt;br&gt;
&lt;a href=&quot;http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ref/crossvalind.html&quot;&gt;http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ref/crossvalind.html&lt;/a&gt;&lt;br&gt;
&lt;br&gt;
it looks like the proper syntax is&lt;br&gt;
&lt;br&gt;
P = 0.75&lt;br&gt;
[train, test] = crossvalind('HoldOut', groups, N, P);&lt;br&gt;
&lt;br&gt;
If you still get OutOfMemoryError, it looks like you might have to&lt;br&gt;
reduce N.&lt;br&gt;
&lt;br&gt;
Unfortunately you have not given us N or the number of groups.&lt;br&gt;
More unfortunately, crossvalind does not allow the specification&lt;br&gt;
of a validation set for determining training parameters.&lt;br&gt;
&lt;br&gt;
In general, the data set should have a 3-way train/validate/test&lt;br&gt;
split. See the comp.ai.neural.net FAQ and archives. Also see many&lt;br&gt;
of my posts in both CSSM and CANN regarding how to choose Ntrn,&lt;br&gt;
Nval and Ntst. Search Google Groups with&lt;br&gt;
&lt;br&gt;
greg-heath validation&lt;br&gt;
&lt;br&gt;
What you will find is that the ACTUAL subset SIZES are important;&lt;br&gt;
NOT their FRACTION of the total data set.&lt;br&gt;
&lt;br&gt;
Hope this helps.&lt;br&gt;
&lt;br&gt;
Greg</description>
    </item>
    <item>
      <pubDate>Mon, 02 Feb 2009 19:58:02 -0500</pubDate>
      <title>Re: crossvalind -- size of training/testing set?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538#625538</link>
      <author>Johan Carlson</author>
      <description>Greg Heath &amp;lt;heath@alumni.brown.edu&amp;gt; wrote in message &amp;lt;b70e2def-294a-4bce-adc0-cd2681cd6014@k36g2000pri.googlegroups.com&amp;gt;...&lt;br&gt;
&amp;gt; On Feb 1, 5:40 pm, &quot;Sophia Yuditskaya&quot; &amp;lt;scyud...@mit.edu&amp;gt; wrote:&lt;br&gt;
&amp;gt; &amp;gt; Hi,&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; I am calling crossvalind as follows:&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; [train, test] = crossvalind('holdOut', groups);&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; What proportion of the original data is put into training vs testing sets? I'm assuming it's 50% each ... but instead I'd like to use 25% of the data for training and 75% for testing. How do I specify this? I've tried&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; [train, test] = crossvalind('holdOut', groups, 0.25);&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; but I get an OutOfMemoryError.&lt;br&gt;
&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; Any help would be appreciated.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Carefully read&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; doc crossvalind&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; or&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; help crossvalind&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; because, according to&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; &lt;a href=&quot;http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ref/crossvalind.html&quot;&gt;http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ref/crossvalind.html&lt;/a&gt;&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; it looks like the proper syntax is&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; P = 0.75&lt;br&gt;
&amp;gt; [train, test] = crossvalind('HoldOut', groups, N, P);&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; If you still get OutOfMemoryError, it looks like you might have to&lt;br&gt;
&amp;gt; reduce N.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Unfortunately you have not given us N or the number of groups.&lt;br&gt;
&amp;gt; More unfortunately, crossvalind does not allow the specification&lt;br&gt;
&amp;gt; of a validation set for determining training parameters.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; In general, the data set should have a 3-way train/validate/test&lt;br&gt;
&amp;gt; split. See the comp.ai.neural.net FAQ and archives. Also see many&lt;br&gt;
&amp;gt; of my posts in both CSSM and CANN regarding how to choose Ntrn,&lt;br&gt;
&amp;gt; Nval and Ntst. Search Google Groups with&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; greg-heath validation&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; What you will find is that the ACTUAL subset SIZES are important;&lt;br&gt;
&amp;gt; NOT their FRACTION of the total data set.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Hope this helps.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Greg&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
Greg is right! It's the size of the subset, but also to some extent how many of them you can form that matters. The reference below gives some interesting insight, although it's a bit &quot;stiff&quot;.&lt;br&gt;
&lt;br&gt;
/JC&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
AUTHOR =       {J. Shao},&lt;br&gt;
&amp;nbsp;&amp;nbsp;TITLE =        {Linear Model Order Selection by Cross-Validation},&lt;br&gt;
&amp;nbsp;&amp;nbsp;JOURNAL =      {J. Am. Stat. Assoc.},&lt;br&gt;
&amp;nbsp;&amp;nbsp;YEAR =         {1993},&lt;br&gt;
&amp;nbsp;&amp;nbsp;volume =       {88},&lt;br&gt;
&amp;nbsp;&amp;nbsp;number =       {422},&lt;br&gt;
&amp;nbsp;&amp;nbsp;pages =        {486--494},&lt;br&gt;
&amp;nbsp;&amp;nbsp;month =        {June},</description>
    </item>
    <item>
      <pubDate>Mon, 02 Feb 2009 20:43:01 -0500</pubDate>
      <title>Re: crossvalind -- size of training/testing set?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538#625547</link>
      <author>Sophia </author>
      <description>Thanks for your responses. I can give you the dataset size, but a bigger question that I have in this context is -- why does it work fine with the same dataset size using &quot;[train, test] = crossvalind('holdOut', groups);&quot;, while explicitly specifying the training set size seems to require a whole lot more memory?&lt;br&gt;
&lt;br&gt;
I will double check the documentation, but I couldn't seem to find any info regarding which data subdivision P corresponds to -- is P the proportion of data going to training, or to testing?&lt;br&gt;
&lt;br&gt;
The dataset size is 5000.&lt;br&gt;
&lt;br&gt;
Thanks,&lt;br&gt;
&lt;br&gt;
Sophia&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&quot;Johan Carlson&quot; &amp;lt;Johan.E.Carlson@gmail.com&amp;gt; wrote in message &amp;lt;gm7j8a$a2u$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; Greg Heath &amp;lt;heath@alumni.brown.edu&amp;gt; wrote in message &amp;lt;b70e2def-294a-4bce-adc0-cd2681cd6014@k36g2000pri.googlegroups.com&amp;gt;...&lt;br&gt;
&amp;gt; &amp;gt; On Feb 1, 5:40 pm, &quot;Sophia Yuditskaya&quot; &amp;lt;scyud...@mit.edu&amp;gt; wrote:&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; Hi,&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; I am calling crossvalind as follows:&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; [train, test] = crossvalind('holdOut', groups);&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; What proportion of the original data is put into training vs testing sets? I'm assuming it's 50% each ... but instead I'd like to use 25% of the data for training and 75% for testing. How do I specify this? I've tried&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; [train, test] = crossvalind('holdOut', groups, 0.25);&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; but I get an OutOfMemoryError.&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; Any help would be appreciated.&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; Carefully read&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; doc crossvalind&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; or&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; help crossvalind&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; because, according to&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; &lt;a href=&quot;http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ref/crossvalind.html&quot;&gt;http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ref/crossvalind.html&lt;/a&gt;&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; it looks like the proper syntax is&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; P = 0.75&lt;br&gt;
&amp;gt; &amp;gt; [train, test] = crossvalind('HoldOut', groups, N, P);&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; If you still get OutOfMemoryError, it looks like you might have to&lt;br&gt;
&amp;gt; &amp;gt; reduce N.&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; Unfortunately you have not given us N or the number of groups.&lt;br&gt;
&amp;gt; &amp;gt; More unfortunately, crossvalind does not allow the specification&lt;br&gt;
&amp;gt; &amp;gt; of a validation set for determining training parameters.&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; In general, the data set should have a 3-way train/validate/test&lt;br&gt;
&amp;gt; &amp;gt; split. See the comp.ai.neural.net FAQ and archives. Also see many&lt;br&gt;
&amp;gt; &amp;gt; of my posts in both CSSM and CANN regarding how to choose Ntrn,&lt;br&gt;
&amp;gt; &amp;gt; Nval and Ntst. Search Google Groups with&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; greg-heath validation&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; What you will find is that the ACTUAL subset SIZES are important;&lt;br&gt;
&amp;gt; &amp;gt; NOT their FRACTION of the total data set.&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; Hope this helps.&lt;br&gt;
&amp;gt; &amp;gt; &lt;br&gt;
&amp;gt; &amp;gt; Greg&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Greg is right! It's the size of the subset, but also to some extent how many of them you can form that matters. The reference below gives some interesting insight, although it's a bit &quot;stiff&quot;.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; /JC&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; AUTHOR =       {J. Shao},&lt;br&gt;
&amp;gt;   TITLE =        {Linear Model Order Selection by Cross-Validation},&lt;br&gt;
&amp;gt;   JOURNAL =      {J. Am. Stat. Assoc.},&lt;br&gt;
&amp;gt;   YEAR =         {1993},&lt;br&gt;
&amp;gt;   volume =       {88},&lt;br&gt;
&amp;gt;   number =       {422},&lt;br&gt;
&amp;gt;   pages =        {486--494},&lt;br&gt;
&amp;gt;   month =        {June},</description>
    </item>
    <item>
      <pubDate>Mon, 02 Feb 2009 22:02:20 -0500</pubDate>
      <title>Re: crossvalind -- size of training/testing set?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538#625560</link>
      <author>Lucio Andrade-Cetto</author>
      <description>[train, test] = crossvalind('holdOut', groups, 0.25);&lt;br&gt;
puts 75% into the training set and holds out 25% for testing. If you omit the third input, P defaults to 0.5 and 50% is held out.&lt;br&gt;
You should definitely not get an out-of-memory error; please contact support so they can help you diagnose the problem.&lt;br&gt;
You may also send me your variable &quot;groups&quot; if you want.&lt;br&gt;
Lucio Cetto, TMW.&lt;br&gt;
&lt;br&gt;
&quot;Sophia Yuditskaya&quot; &amp;lt;scyudits@mit.edu&amp;gt; wrote in message &amp;lt;gm58ci$hrf$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; Hi,&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; I am calling crossvalind as follows:&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; [train, test] = crossvalind('holdOut', groups);&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; What proportion of the original data is put into training vs testing sets? I'm assuming it's 50% each ... but instead I'd like to use 25% of the data for training and 75% for testing. How do I specify this? I've tried&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; [train, test] = crossvalind('holdOut', groups, 0.25);&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; but I get an OutOfMemoryError.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Any help would be appreciated.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Thanks,&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Sophia</description>
    </item>
    <item>
      <pubDate>Tue, 03 Feb 2009 18:11:45 -0500</pubDate>
      <title>Re: crossvalind -- size of training/testing set?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538#625798</link>
      <author>Greg Heath</author>
      <description>On Feb 2, 5:02 pm, &quot;Lucio Andrade-Cetto&quot; &amp;lt;lce...@nospam.mathworks.com&amp;gt;&lt;br&gt;
wrote:&lt;br&gt;
&amp;gt; [train, test] = crossvalind('holdOut', groups, 0.25);&lt;br&gt;
&amp;gt; puts 75% into the training and holds 25%,&lt;br&gt;
&lt;br&gt;
You are correct. However, for some reason, the OP wanted to hold out&lt;br&gt;
7%.&lt;br&gt;
&lt;br&gt;
The documentation says 'HoldOut'. Is the method name case-insensitive?&lt;br&gt;
&lt;br&gt;
Hope this helps.&lt;br&gt;
&lt;br&gt;
Greg&lt;br&gt;
&amp;gt; if you omit the third input P defaults to 0.5 and the 50% are held out.&lt;br&gt;
&amp;gt; You should definitively not get an out of memory problem, please contact support so they can help you diagnosing your problem.&lt;br&gt;
&amp;gt; You may also send me your variable &quot;groups&quot; if you want.&lt;br&gt;
&amp;gt; Lucio Cetto, TMW.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &quot;Sophia Yuditskaya&quot; &amp;lt;scyud...@mit.edu&amp;gt; wrote in message &amp;lt;gm58ci$hr...@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; &amp;gt; Hi,&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; I am calling crossvalind as follows:&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; [train, test] = crossvalind('holdOut', groups);&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; What proportion of the original data is put into training vs testing sets? I'm assuming it's 50% each ... but instead I'd like to use 25% of the data for training and 75% for testing. How do I specify this? I've tried&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; [train, test] = crossvalind('holdOut', groups, 0.25);&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; but I get an OutOfMemoryError.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; Any help would be appreciated.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; Thanks,&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; Sophia</description>
    </item>
    <item>
      <pubDate>Tue, 03 Feb 2009 18:25:00 -0500</pubDate>
      <title>Re: crossvalind -- size of training/testing set?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538#625805</link>
      <author>Greg Heath</author>
      <description>On Feb 2, 3:43 pm, &quot;Sophia &quot; &amp;lt;scyud...@mit.edu&amp;gt; wrote:&lt;br&gt;
&amp;gt; Thanks for your responses. I can give you the dataset size, but a bigger question that I have in this context is -- why does it work fine with the same dataset size using &quot;[train, test] = crossvalind('holdOut', groups);&quot;, while explicitly specifying the training set size seems to require a whole lot more memory?&lt;br&gt;
&lt;br&gt;
Do both 'holdOut' and 'HoldOut' work?&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&amp;gt; I will double check the documentation, but I couldn't seem to find any info regarding which data subdivision P corresponds to -- is P the proportion of data going to training, or to testing?&lt;br&gt;
&lt;br&gt;
When Method = 'HoldOut', P = the proportion held out.&lt;br&gt;
&lt;br&gt;
&amp;gt; The dataset size is 5000.&lt;br&gt;
&lt;br&gt;
How many classes and how many input variables? As you&lt;br&gt;
can see from my previous posts&lt;br&gt;
&lt;br&gt;
greg-heath pretraining advice&lt;br&gt;
greg-heath Neq Nw&lt;br&gt;
&lt;br&gt;
unless you are using overtraining mitigation, the minimum&lt;br&gt;
size of Ntrn is determined by the number of inputs, hidden&lt;br&gt;
nodes and classes.&lt;br&gt;
&lt;br&gt;
Hope this helps.&lt;br&gt;
&lt;br&gt;
Greg</description>
    </item>
    <item>
      <pubDate>Tue, 03 Feb 2009 18:27:05 -0500</pubDate>
      <title>Re: crossvalind -- size of training/testing set?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538#625808</link>
      <author>Greg Heath</author>
      <description>On Feb 3, 1:11 pm, Greg Heath &amp;lt;he...@alumni.brown.edu&amp;gt; wrote:&lt;br&gt;
&amp;gt; On Feb 2, 5:02 pm, &quot;Lucio Andrade-Cetto&quot; &amp;lt;lce...@nospam.mathworks.com&amp;gt;&lt;br&gt;
&amp;gt; wrote:&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; [train, test] = crossvalind('holdOut', groups, 0.25);&lt;br&gt;
&amp;gt; &amp;gt; puts 75% into the training and holds 25%,&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; You are correct. However, for some reason, the OP wanted to hold out&lt;br&gt;
&amp;gt; 7%.&lt;br&gt;
&lt;br&gt;
Sorry, ...typo.... 75%&lt;br&gt;
&lt;br&gt;
Greg</description>
    </item>
    <item>
      <pubDate>Wed, 04 Feb 2009 05:40:55 -0500</pubDate>
      <title>Re: crossvalind -- size of training/testing set?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538#625917</link>
      <author>Ting Su</author>
      <description>Sophia,&lt;br&gt;
I have a few points to make.&lt;br&gt;
1. As Greg pointed out, you should use [train, test] = &lt;br&gt;
crossvalind('holdOut', groups, 0.25) to get 75% for testing.&lt;br&gt;
&lt;br&gt;
2. However, you should not get an out-of-memory error for a dataset of size &lt;br&gt;
5000. Please check whether you have provided the 'groups' variable &lt;br&gt;
correctly. The 2nd input of crossvalind can be either a positive integer &lt;br&gt;
specifying the number of observations, or a grouping variable (in which case &lt;br&gt;
crossvalind performs stratified cross-validation or holdout). A grouping &lt;br&gt;
variable specifies the class label for each observation. It can be a numeric &lt;br&gt;
vector, a logical vector, a cell vector of strings, or a character matrix &lt;br&gt;
with each row representing a group label.&lt;br&gt;
&lt;br&gt;
3. You may want to try 'cvpartition' in the Statistics Toolbox to do the &lt;br&gt;
holdout. It's newer than crossvalind.&lt;br&gt;
&lt;br&gt;
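A rough sketch of the cvpartition route, assuming &quot;groups&quot; is the same grouping variable and the Statistics Toolbox is available (p, c, trainIdx and testIdx are only illustrative names):&lt;br&gt;
&lt;br&gt;
p = 0.75;  % assumed goal: fraction of observations held out for testing&lt;br&gt;
c = cvpartition(groups, 'holdout', p);  % stratified holdout partition&lt;br&gt;
trainIdx = training(c);  % logical index of the training observations&lt;br&gt;
testIdx = test(c);  % logical index of the test observations&lt;br&gt;
&lt;br&gt;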
Ting Su&lt;br&gt;
-The Mathworks&lt;br&gt;
&lt;br&gt;
&quot;Sophia &quot; &amp;lt;scyudits@mit.edu&amp;gt; wrote in message &lt;br&gt;
news:gm7lsl$di9$1@fred.mathworks.com...&lt;br&gt;
&amp;gt; Thanks for your responses. I can give you the dataset size, but a bigger &lt;br&gt;
&amp;gt; question that I have in this context is -- why does it work fine with the &lt;br&gt;
&amp;gt; same dataset size using &quot;[train, test] = crossvalind('holdOut', groups);&quot;, &lt;br&gt;
&amp;gt; while explicitly specifying the training set size seems to require a whole &lt;br&gt;
&amp;gt; lot more memory?&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I will double check the documentation, but I couldn't seem to find any &lt;br&gt;
&amp;gt; info regarding which data subdivision P corresponds to -- is P the &lt;br&gt;
&amp;gt; proportion of data going to training, or to testing?&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; The dataset size is 5000.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Thanks,&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Sophia&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &quot;Johan Carlson&quot; &amp;lt;Johan.E.Carlson@gmail.com&amp;gt; wrote in message &lt;br&gt;
&amp;gt; &amp;lt;gm7j8a$a2u$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt;&amp;gt; Greg Heath &amp;lt;heath@alumni.brown.edu&amp;gt; wrote in message &lt;br&gt;
&amp;gt;&amp;gt; &amp;lt;b70e2def-294a-4bce-adc0-cd2681cd6014@k36g2000pri.googlegroups.com&amp;gt;...&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; On Feb 1, 5:40 pm, &quot;Sophia Yuditskaya&quot; &amp;lt;scyud...@mit.edu&amp;gt; wrote:&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt; Hi,&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt; I am calling crossvalind as follows:&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt; [train, test] = crossvalind('holdOut', groups);&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt; What proportion of the original data is put into training vs testing &lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt; sets? I'm assuming it's 50% each ... but instead I'd like to use 25% &lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt; of the data for training and 75% for testing. How do I specify this? &lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt; I've tried&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt; [train, test] = crossvalind('holdOut', groups, 0.25);&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt; but I get an OutOfMemoryError.&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &amp;gt; Any help would be appreciated.&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; Carefully read&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; doc crossvalind&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; or&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; help crossvalind&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; because, according to&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; &lt;a href=&quot;http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ref/crossvalind.html&quot;&gt;http://www.mathworks.com/access/helpdesk/help/toolbox/bioinfo/ref/crossvalind.html&lt;/a&gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; it looks like the proper syntax is&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; P = 0.75&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; [train, test] = crossvalind('HoldOut', groups, N, P);&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; If you still get OutOfMemoryError, it looks like you might have to&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; reduce N.&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; Unfortunately you have not given us N or the number of groups.&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; More unfortunately, crossvalind does not allow the specification&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; of a validation set for determining training parameters.&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; In general, the data set should have a 3-way train/validate/test&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; split. See the comp.ai.neural.net FAQ and archives. Also see many&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; of my posts in both CSSM and CANN regarding how to choose Ntrn,&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; Nval and Ntst. Search Google Groups with&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; greg-heath validation&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; What you will find is that the ACTUAL subset SIZES are important;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; NOT their FRACTION of the total data set.&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; Hope this helps.&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; &amp;gt; Greg&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; Greg is right! It's the size of the subset, but also to some extent how &lt;br&gt;
&amp;gt;&amp;gt; many of them you can form that matters. The reference below gives some &lt;br&gt;
&amp;gt;&amp;gt; interesting insight, although it's a bit &quot;stiff&quot;.&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; /JC&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt;&lt;br&gt;
&amp;gt;&amp;gt; AUTHOR =       {J. Shao},&lt;br&gt;
&amp;gt;&amp;gt;   TITLE =        {Linear Model Order Selection by Cross-Validation},&lt;br&gt;
&amp;gt;&amp;gt;   JOURNAL =      {J. Am. Stat. Assoc.},&lt;br&gt;
&amp;gt;&amp;gt;   YEAR =         {1993},&lt;br&gt;
&amp;gt;&amp;gt;   volume =       {88},&lt;br&gt;
&amp;gt;&amp;gt;   number =       {422},&lt;br&gt;
&amp;gt;&amp;gt;   pages =        {486--494},&lt;br&gt;
&amp;gt;&amp;gt;   month =        {June}, </description>
    </item>
    <item>
      <pubDate>Fri, 06 Feb 2009 18:55:44 -0500</pubDate>
      <title>Re: crossvalind -- size of training/testing set?</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/243538#626692</link>
      <author>Greg Heath</author>
      <description>On Feb 4, 12:40 am, &quot;Ting Su&quot; &amp;lt;Ting...@mathworks.com&amp;gt; wrote:&lt;br&gt;
&amp;gt; Sophia,&lt;br&gt;
&amp;gt; I have a few points to make.&lt;br&gt;
&amp;gt; 1. As greg pointed out, you should use [train, test] =&lt;br&gt;
&amp;gt; crossvalind('holdOut', groups, 0.25) to get 75% for testing.&lt;br&gt;
&lt;br&gt;
NO!&lt;br&gt;
&lt;br&gt;
The third input is the proportion held out for testing, so 0.25 holds out 25% for testing, not 75%.&lt;br&gt;
&lt;br&gt;
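A minimal sketch of that point, assuming &quot;groups&quot; holds one class label per observation (fracTest and fracTrain are only illustrative names):&lt;br&gt;
&lt;br&gt;
P = 0.75;  % assumed goal: hold out 75% for testing, keep 25% for training&lt;br&gt;
[train, test] = crossvalind('HoldOut', groups, P);  % train and test are logical index vectors&lt;br&gt;
fracTest = mean(test)  % should come out close to 0.75&lt;br&gt;
fracTrain = mean(train)  % should come out close to 0.25&lt;br&gt;
&lt;br&gt;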
Hope this helps.&lt;br&gt;
&lt;br&gt;
Greg</description>
    </item>
  </channel>
</rss>

