<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572</link>
    <title>MATLAB Central Newsreader - small data set</title>
    <description>Feed for thread: small data set</description>
    <language>en-us</language>
    <copyright>&amp;copy;1994-2008 by The MathWorks, Inc.</copyright>
    <webmaster>webmaster@mathworks.com</webmaster>
    <generator>MATLAB Central Newsreader</generator>
    <docs>http://blogs.law.harvard.edu/tech/rss</docs>
    <ttl>60</ttl>
    <image>
      <title>The MathWorks</title>
      <url>http://www.mathworks.com/images/membrane_icon.gif</url>
    </image>
    <item>
      <pubDate>Thu, 01 May 2008 10:30:10 -0400</pubDate>
      <title>small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#429746</link>
      <author>giannis</author>
      <description>Hello.&lt;br&gt;
&lt;br&gt;
I am doing statistical research using KNN, neural nets and&lt;br&gt;
SVM. The problem is the very small data set (25 specimens).&lt;br&gt;
&lt;br&gt;
I am using cross validation to resample the data but I am&lt;br&gt;
not sure if my results can be accurate with such a small&lt;br&gt;
data set.&lt;br&gt;
&lt;br&gt;
Can you please suggest a method that makes the best possible use of&lt;br&gt;
such a small data set?&lt;br&gt;
Thank you in advance.&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Thu, 01 May 2008 11:22:44 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#429758</link>
      <author>Greg Heath</author>
      <description>On May 1, 6:30 am, "giannis " &amp;lt;fanzi...@yahoo.co.uk&amp;gt; wrote:&lt;br&gt;
&amp;gt; Hello.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I am doing statistical research using KNN, neural nets and&lt;br&gt;
&amp;gt; SVM. The problem is the very small data set (25 specimens).&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I am using cross validation to resample the data but I am&lt;br&gt;
&amp;gt; not sure if my results can be accurate with such a small&lt;br&gt;
&amp;gt; data set.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Can you please suggest a method that makes the best possible use of&lt;br&gt;
&amp;gt; such a small data set?&lt;br&gt;
&amp;gt; Thank you in advance.&lt;br&gt;
&lt;br&gt;
Bootstrapping&lt;br&gt;
&lt;br&gt;
Search the mathworks website.&lt;br&gt;
&lt;br&gt;
Hope this helps.&lt;br&gt;
&lt;br&gt;
Greg&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Thu, 01 May 2008 11:50:05 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#429761</link>
      <author>John D'Errico</author>
      <description>"giannis " &amp;lt;fanzio12@yahoo.co.uk&amp;gt; wrote in message &lt;br&gt;
&amp;lt;fvc63i$qfc$1@fred.mathworks.com&amp;gt;...&lt;br&gt;
&amp;gt; Hello.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; I am doing statistical research using KNN, neural nets and&lt;br&gt;
&amp;gt; SVM. The problem is the very small data set (25 specimens).&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; I am using cross validation to resample the data but I am&lt;br&gt;
&amp;gt; not sure if my results can be accurate with such a small&lt;br&gt;
&amp;gt; data set.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Can you please suggest a method that makes the best possible use of&lt;br&gt;
&amp;gt; such a small data set?&lt;br&gt;
&amp;gt; Thank you in advance.&lt;br&gt;
&lt;br&gt;
The fact is, you only have 25 data points.&lt;br&gt;
&lt;br&gt;
No matter how hard you squeeze that rock,&lt;br&gt;
the only blood you will get from the rock&lt;br&gt;
will be that amount you leave behind from&lt;br&gt;
your own hands.&lt;br&gt;
&lt;br&gt;
Only pharmaceutical companies know the&lt;br&gt;
secret methodologies used to manufacture&lt;br&gt;
information where none actually exists.&lt;br&gt;
&lt;br&gt;
John&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Thu, 01 May 2008 17:49:42 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#429813</link>
      <author>roberson@ibd.nrc-cnrc.gc.ca (Walter Roberson)</author>
      <description>In article &amp;lt;fvcapd$oc8$1@fred.mathworks.com&amp;gt;,&lt;br&gt;
John D'Errico &amp;lt;woodchips@rochester.rr.com&amp;gt; wrote:&lt;br&gt;
&lt;br&gt;
&amp;gt;The fact is, you only have 25 data points.&lt;br&gt;
&lt;br&gt;
&amp;gt;No matter how hard you squeeze that rock,&lt;br&gt;
&amp;gt;the only blood you will get from the rock&lt;br&gt;
&amp;gt;will be that amount you leave behind from&lt;br&gt;
&amp;gt;your own hands.&lt;br&gt;
&lt;br&gt;
&amp;gt;Only pharmaceutical companies know the&lt;br&gt;
&amp;gt;secret methodologies used to manufacture&lt;br&gt;
&amp;gt;information where none actually exists.&lt;br&gt;
&lt;br&gt;
Aye. We're getting amazingly good here at manufacturing new&lt;br&gt;
-data- from old, but manufacturing new -information- is still&lt;br&gt;
eluding us.&lt;br&gt;
&lt;br&gt;
Though much more often, the problem here is in manufacturing useful&lt;br&gt;
information from *too much* data.&lt;br&gt;
-- &lt;br&gt;
&amp;nbsp;&amp;nbsp;"If there were no falsehood in the world, there would be no&lt;br&gt;
&amp;nbsp;&amp;nbsp;doubt; if there were no doubt, there would be no inquiry; if no&lt;br&gt;
&amp;nbsp;&amp;nbsp;inquiry, no wisdom, no knowledge, no genius."&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;-- Walter Savage Landor&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Sat, 03 May 2008 18:25:04 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#430095</link>
      <author>giannis</author>
      <description>Hello,&lt;br&gt;
&lt;br&gt;
Thank you for your reply.&lt;br&gt;
The fact is that this is a medical application, so my data&lt;br&gt;
are medical in nature.&lt;br&gt;
It would be very interesting if I could produce "new" data&lt;br&gt;
from the old data and test the results.&lt;br&gt;
It would be a huge help if you could help me with this in&lt;br&gt;
any way.&lt;br&gt;
&lt;br&gt;
regards&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Sun, 04 May 2008 20:30:22 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#430211</link>
      <author>carlos lopez</author>
      <description>I agree with Mr. Greg Heath; your last resort is&lt;br&gt;
bootstrapping. That might increase your confidence in the&lt;br&gt;
statistical result you have, i.e. let you "fully" trust the&lt;br&gt;
standard deviation estimate or the like... but if that highly&lt;br&gt;
believable estimate itself is not good enough for you, there is no&lt;br&gt;
further solution!&lt;br&gt;
So all of the comments have value; if (for example) the&lt;br&gt;
standard deviation estimate is very precise but too&lt;br&gt;
large for your needs, you will need extra data. There is no way to&lt;br&gt;
avoid that... or at least none that I am aware of!&lt;br&gt;
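&lt;br&gt;
A minimal sketch of the bootstrap idea, written in Python rather than&lt;br&gt;
MATLAB and run on made-up numbers: resample the 25 values with&lt;br&gt;
replacement many times and see how much the standard deviation&lt;br&gt;
estimate varies. The data, the name bootstrap_std and the 2000&lt;br&gt;
resamples are assumptions for illustration only. Resampling shows how&lt;br&gt;
uncertain the estimate is; it does not add information.&lt;br&gt;
&lt;pre&gt;
```python
import random
import statistics

def bootstrap_std(sample, n_resamples=2000, seed=0):
    """Return (estimate, lower, upper): the sample standard deviation and a
    rough 95% percentile interval obtained by resampling with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        resample = [rng.choice(sample) for _ in range(n)]
        replicates.append(statistics.stdev(resample))
    replicates.sort()
    lower = replicates[int(0.025 * n_resamples)]
    upper = replicates[int(0.975 * n_resamples) - 1]
    return statistics.stdev(sample), lower, upper

# Hypothetical measurements standing in for 25 real observations.
rng = random.Random(1)
data = [rng.gauss(10.0, 2.0) for _ in range(25)]
estimate, lower, upper = bootstrap_std(data)
print(f"std = {estimate:.2f}, 95% interval roughly [{lower:.2f}, {upper:.2f}]")
```
&lt;/pre&gt;
If the interval is wide, or the estimate is simply too large, more&lt;br&gt;
resamples will not fix it.&lt;br&gt;
&lt;br&gt;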
Regards&lt;br&gt;
Carlos&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Mon, 05 May 2008 10:58:03 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#430289</link>
      <author>Greg Heath</author>
      <description>On May 1, 7:22 am, Greg Heath &amp;lt;he...@alumni.brown.edu&amp;gt; wrote:&lt;br&gt;
&amp;gt; Bootstrapping&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; Search the mathworks website.&lt;br&gt;
&lt;br&gt;
If you have prior information on the form of the probability&lt;br&gt;
distribution function, you can use the 25 observations to&lt;br&gt;
estimate the parameters and then generate more "data".&lt;br&gt;
The danger is that, even in one dimension, 25 observations&lt;br&gt;
will not give you precise parameter estimates.&lt;br&gt;
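&lt;br&gt;
A sketch of that idea under a strong assumption: that the data are&lt;br&gt;
one-dimensional and normally distributed (chosen for illustration, not&lt;br&gt;
something known about the data in this thread). Python's standard&lt;br&gt;
library stands in for MATLAB, and the names fit_normal and generate are&lt;br&gt;
hypothetical. It fits the mean and standard deviation to 25 values,&lt;br&gt;
draws synthetic "data" from the fitted distribution, and prints the&lt;br&gt;
approximate standard error of the mean, s/sqrt(n), to show how&lt;br&gt;
imprecise the fit is.&lt;br&gt;
&lt;pre&gt;
```python
import math
import random
import statistics

def fit_normal(sample):
    """Estimate the mean and standard deviation of a 1-D sample."""
    return statistics.mean(sample), statistics.stdev(sample)

def generate(mu, sigma, count, seed=0):
    """Draw `count` synthetic values from the fitted normal distribution."""
    rng = random.Random(seed)
    return [rng.gauss(mu, sigma) for _ in range(count)]

# Hypothetical observations; the true mean (5.0) and spread (1.5) are made up.
rng = random.Random(42)
observed = [rng.gauss(5.0, 1.5) for _ in range(25)]

mu_hat, sigma_hat = fit_normal(observed)
synthetic = generate(mu_hat, sigma_hat, 1000)

# Standard error of the mean: with n = 25 it is sigma_hat / 5.
std_err = sigma_hat / math.sqrt(len(observed))
print(f"fitted mean {mu_hat:.2f} +/- {std_err:.2f}, fitted std {sigma_hat:.2f}")
print(f"synthetic mean {statistics.mean(synthetic):.2f}")
```
&lt;/pre&gt;
The synthetic values inherit whatever error the fitted parameters&lt;br&gt;
carry, so they cannot be more trustworthy than the original 25.&lt;br&gt;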
&lt;br&gt;
If you don't have such prior information you can test&lt;br&gt;
hypotheses as to which distribution the data might be&lt;br&gt;
from. However, with only 25 observations the testing will&lt;br&gt;
be far from definitive. You may test several distributions,&lt;br&gt;
find that you can reject all except one. However, that does&lt;br&gt;
not guarantee that it will be the correct distribution.&lt;br&gt;
&lt;br&gt;
...suddenly I have the feeling that the data is not&lt;br&gt;
1-dimensional!&lt;br&gt;
&lt;br&gt;
What are the dimensions of your input and output?&lt;br&gt;
Exactly what type of problem do you have and what&lt;br&gt;
exactly do you want the neural net to do?&lt;br&gt;
&lt;br&gt;
Hope this helps.&lt;br&gt;
&lt;br&gt;
Greg&lt;br&gt;
&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Mon, 05 May 2008 21:59:04 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#430442</link>
      <author>giannis</author>
      <description>Hello Greg,&lt;br&gt;
&lt;br&gt;
thank you for all your help.&lt;br&gt;
&lt;br&gt;
I have data from 25 people. 20 of them have lung cancer and&lt;br&gt;
5 don't. I have 6 different characteristics for each person&lt;br&gt;
(so the array is 25x6).&lt;br&gt;
&lt;br&gt;
The task is to produce two classifiers:&lt;br&gt;
1st: to classify between a constant value (2 outputs)&lt;br&gt;
2nd: to classify the stage of cancer, 0, 1, 2, 3 or 4 (5&lt;br&gt;
outputs)&lt;br&gt;
&lt;br&gt;
I tried SVM, linear regression, backpropagation and&lt;br&gt;
RBF neural nets, and KNN.&lt;br&gt;
&lt;br&gt;
I tried to reshuffle my data using leave-one-out&lt;br&gt;
cross-validation (LOOCV), each time keeping one sample for&lt;br&gt;
testing and 24 for training.&lt;br&gt;
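&lt;br&gt;
For reference, a minimal sketch of leave-one-out cross-validation in&lt;br&gt;
Python (not the MATLAB code actually used here). The 25x6 array is&lt;br&gt;
randomly generated, and a 1-nearest-neighbour rule stands in for the&lt;br&gt;
KNN classifier; both are assumptions for illustration, as are the&lt;br&gt;
function names.&lt;br&gt;
&lt;pre&gt;
```python
import random

def nearest_neighbour_label(train_x, train_y, query):
    """Return the label of the training row closest to `query` (1-NN)."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, query))
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))
    return train_y[best]

def loocv_accuracy(rows, labels):
    """Leave-one-out: hold out each row once, train on the remaining rows."""
    hits = 0
    for i in range(len(rows)):
        train_x = rows[:i] + rows[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_neighbour_label(train_x, train_y, rows[i]) == labels[i]:
            hits += 1
    return hits / len(rows)

# Made-up 25x6 data: 20 rows labelled 1 (cancer) and 5 labelled 0.
rng = random.Random(0)
labels = [1] * 20 + [0] * 5
rows = [[rng.gauss(float(y), 1.0) for _ in range(6)] for y in labels]

print(f"LOOCV accuracy: {loocv_accuracy(rows, labels):.2f}")
```
&lt;/pre&gt;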
&lt;br&gt;
hope I gave you the picture..?&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Tue, 06 May 2008 05:22:12 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#430476</link>
      <author>roberson@ibd.nrc-cnrc.gc.ca (Walter Roberson)</author>
      <description>In article &amp;lt;fvnvv8$jgg$1@fred.mathworks.com&amp;gt;,&lt;br&gt;
giannis  &amp;lt;fanzio12@yahoo.co.uk&amp;gt; wrote:&lt;br&gt;
&lt;br&gt;
&amp;gt;I have data from 25 people. 20 of them have lung cancer and&lt;br&gt;
&amp;gt;5 don't. I have 6 different characteristics for each person&lt;br&gt;
&amp;gt;(so the array is 25x6).&lt;br&gt;
&lt;br&gt;
&amp;gt;The task is to produce two classifiers:&lt;br&gt;
&amp;gt;1st: to classify between a constant value (2 outputs)&lt;br&gt;
&amp;gt;2nd: to classify the stage of cancer, 0, 1, 2, 3 or 4 (5&lt;br&gt;
&amp;gt;outputs)&lt;br&gt;
&lt;br&gt;
Is this a scientific investigation, or a class exercise of some&lt;br&gt;
sort? If it is a scientific investigation, then it is the sort&lt;br&gt;
of thing that my group does routinely and we may be able to help you.&lt;br&gt;
&lt;br&gt;
-- &lt;br&gt;
&amp;nbsp;&amp;nbsp;"The whole history of civilization is strewn with creeds and&lt;br&gt;
&amp;nbsp;&amp;nbsp;institutions which were invaluable at first, and deadly&lt;br&gt;
&amp;nbsp;&amp;nbsp;afterwards."                                -- Walter Bagehot&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Tue, 06 May 2008 07:43:15 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#430497</link>
      <author>Greg Heath</author>
      <description>Corrected for the heinous sin of top-posting.&lt;br&gt;
&lt;br&gt;
On May 5, 5:59 pm, "giannis " &amp;lt;fanzi...@yahoo.co.uk&amp;gt; wrote:&lt;br&gt;
&amp;gt; Hello Greg,&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; thank you for all your help.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I have data from 25 people. 20 of them have lung cancer and&lt;br&gt;
&amp;gt; 5 don't. I have 6 different characteristics for each person&lt;br&gt;
&amp;gt; (so the array is 25x6).&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; The task is to produce two classifiers:&lt;br&gt;
&amp;gt; 1st: to classify between a constant value (2 outputs)&lt;br&gt;
&amp;gt; 2nd: to classify the stage of cancer, 0, 1, 2, 3 or 4 (5&lt;br&gt;
&amp;gt; outputs)&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I tried SVM, linear regression, backpropagation and&lt;br&gt;
&amp;gt; RBF neural nets, and KNN.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; I tried to reshuffle my data using leave-one-out&lt;br&gt;
&amp;gt; cross-validation (LOOCV), each time keeping one sample for&lt;br&gt;
&amp;gt; testing and 24 for training.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; hope I gave you the picture..?&lt;br&gt;
&lt;br&gt;
What kind of error rates are you getting for each method?&lt;br&gt;
What are the largest error rates that you would accept?&lt;br&gt;
&lt;br&gt;
When you plot the desired {0,1} classification vs each&lt;br&gt;
of the inputs does there appear to be predictive capability?&lt;br&gt;
What are the corresponding correlation coefficients?&lt;br&gt;
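&lt;br&gt;
As a rough illustration of that check, the sketch below computes the&lt;br&gt;
Pearson correlation between a {0,1} target and each input column (with&lt;br&gt;
a binary target this is the point-biserial correlation). It is written&lt;br&gt;
in Python rather than MATLAB, and the 25x6 array is randomly generated;&lt;br&gt;
the data and the helper name pearson are assumptions for illustration.&lt;br&gt;
&lt;pre&gt;
```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up stand-in for the 25x6 input array and the {0,1} targets.
rng = random.Random(3)
targets = [1] * 20 + [0] * 5
inputs = [[rng.gauss(0.5 * t * col, 1.0) for col in range(6)] for t in targets]

for col in range(6):
    column = [row[col] for row in inputs]
    print(f"input {col + 1}: r = {pearson(column, targets):+.2f}")
```
&lt;/pre&gt;
With only 25 rows, even a moderate coefficient can arise by chance, so&lt;br&gt;
the values are a first look rather than evidence.&lt;br&gt;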
&lt;br&gt;
Hope this helps.&lt;br&gt;
&lt;br&gt;
Greg&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Wed, 07 May 2008 08:04:03 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#430764</link>
      <author>giannis</author>
      <description>Hello,&lt;br&gt;
&lt;br&gt;
Thank you for your interest.&lt;br&gt;
This is part of my master's project, and I would be grateful if&lt;br&gt;
you could help me in any way.&lt;br&gt;
thank you&lt;br&gt;
giannis&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Wed, 07 May 2008 08:25:06 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#430769</link>
      <author>giannis</author>
      <description>Greg Heath &amp;lt;heath@alumni.brown.edu&amp;gt; wrote in message&lt;br&gt;
&amp;lt;85d534d6-2338-43dc-aa50-31add8d56fe3@j22g2000hsf.googlegroups.com&amp;gt;...&lt;br&gt;
&amp;gt; Corrected for the heinous sin of top-posting.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; What kind of error rates are you getting for each method?&lt;br&gt;
&amp;gt; What are the largest error rates that you would accept?&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; When you plot the desired {0,1} classification vs each&lt;br&gt;
&amp;gt; of the inputs does there appear to be predictive capability?&lt;br&gt;
&amp;gt; What are the corresponding correlation coefficients?&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Hope this helps.&lt;br&gt;
&amp;gt; &lt;br&gt;
&amp;gt; Greg&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
&lt;br&gt;
hello Greg,&lt;br&gt;
&lt;br&gt;
the best results I can get till now are:&lt;br&gt;
&lt;br&gt;
using 1st: 3 of the 5 characteristics&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;2nd: 2-fold cross validation (using all the    &lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;combinations and at the end getting the average  &lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;error rate)&lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;3rd: KNN classification giving 75% correct &lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cp.Correct.rate and RBF neural network giving 68% &lt;br&gt;
&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;cp.Correct.rate&lt;br&gt;
&lt;br&gt;
This error rate could be acceptable, but because of the small data&lt;br&gt;
set I have available, I am not confident that these results can&lt;br&gt;
be considered reliable or that the method of reshuffling the&lt;br&gt;
data is acceptable.&lt;br&gt;
&lt;br&gt;
Can you please explain which plot you are asking me to make?&lt;br&gt;
&lt;br&gt;
thank you&lt;br&gt;
&lt;br&gt;
giannis&lt;br&gt;
</description>
    </item>
    <item>
      <pubDate>Wed, 07 May 2008 16:08:26 -0400</pubDate>
      <title>Re: small data set</title>
      <link>http://www.mathworks.com/matlabcentral/newsreader/view_thread/168572#430844</link>
      <author>Greg Heath</author>
      <description>-----SNIP&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; &amp;gt; What are the dimensions of your input and output?&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; &amp;gt; Exactly what type of problem do you have and what&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; &amp;gt; exactly do you want the neural net to do?&lt;br&gt;
&amp;gt;&lt;br&gt;
-----SNIP&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; I have data from 25 people. 20 of them have lung cancer and&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; 5 don't. I have 6 different characteristic for each person.&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; (so the array is 25X6)&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; the tasks are:to produce two classifiers&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; 1st: to classify between a constant value - 2 outputs)&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; 2nd: to classify the stage of cancer 0,1,2,3 or 4 so - 5&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; outputs)&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; I tried to use SVM, Linear regression, Backpropagation and&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; RBF Neural Nets and KNN.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; I tried to reshuffle my data using Leave One Out Cross&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; Validation (LOOCV) so keeping each time one for testing and&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; 24 for training.&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; &amp;gt; hope I gave you the picture..?&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; What kind of error rates are you getting for each method?&lt;br&gt;
&amp;gt; &amp;gt; What are the largest error rates that you would accept?&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; &amp;gt; When you plot the desired {0,1} classification vs each&lt;br&gt;
&amp;gt; &amp;gt; of the inputs does there appear to be predictive capability?&lt;br&gt;
&amp;gt; &amp;gt; What are the corresponding correlation coefficients?&lt;br&gt;
&amp;gt;&lt;br&gt;
&amp;gt; the best results I can get till now are:&lt;br&gt;
&lt;br&gt;
I assume this is the first classifier:&lt;br&gt;
{0,1} =&amp;gt; {no cancer, cancer}&lt;br&gt;
&lt;br&gt;
&amp;gt; using 1st: 3 of the 5 characteristics&lt;br&gt;
&lt;br&gt;
I thought there were 6.&lt;br&gt;
Did using more inputs decrease the performance?&lt;br&gt;
&lt;br&gt;
&amp;gt;       2nd: 2-fold cross validation (using all the&lt;br&gt;
&amp;gt;            combinations )&lt;br&gt;
&lt;br&gt;
What does "using all the combinations" mean?&lt;br&gt;
&lt;br&gt;
2-fold XVAL ==&amp;gt; 13 training and 12 testing, then switch.&lt;br&gt;
&lt;br&gt;
I thought you originally said you were using LOO.&lt;br&gt;
&lt;br&gt;
&amp;gt; and at the end getting the average&lt;br&gt;
&amp;gt;            error rate)&lt;br&gt;
&amp;gt;       3rd: KNN classification giving 75% correct&lt;br&gt;
&amp;gt;            cp.Correct.rate&lt;br&gt;
&lt;br&gt;
How depressing.&lt;br&gt;
&lt;br&gt;
I just got 80% on your data without using the&lt;br&gt;
computer.&lt;br&gt;
&lt;br&gt;
What does the cp in cp.Correct.rate mean?&lt;br&gt;
&lt;br&gt;
What are the class conditional error rates, i.e.,&lt;br&gt;
What are the separate error rates for the 20 negatives&lt;br&gt;
and 5 positives?&lt;br&gt;
&lt;br&gt;
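A minimal Python sketch of why the separate rates matter, assuming&lt;br&gt;
hypothetical labels (20 negatives, 5 positives) and a hypothetical&lt;br&gt;
classifier that always predicts the majority class. The function&lt;br&gt;
name and the data are illustrative, not the poster's:&lt;br&gt;
&lt;br&gt;
```python
# Hypothetical labels: 20 negatives (0) and 5 positives (1).
targets = [0] * 20 + [1] * 5
# Assumed classifier output: always predicts the majority class.
predictions = [0] * 25


def class_conditional_rates(targets, predictions):
    """Return the overall and per-class fractions of correct predictions."""
    rates = {}
    for label in sorted(set(targets)):
        hits = [p == t for t, p in zip(targets, predictions) if t == label]
        rates[label] = sum(hits) / len(hits)
    overall = sum(p == t for t, p in zip(targets, predictions)) / len(targets)
    return overall, rates


overall, per_class = class_conditional_rates(targets, predictions)
print(overall, per_class)
```
&lt;br&gt;
Here the overall rate is 80% while the rate on the 5 positives is 0%.&lt;br&gt;
&lt;br&gt;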
&amp;gt; and RBF neural network giving 68%&lt;br&gt;
&amp;gt;            cp.Correct.rate&lt;br&gt;
&lt;br&gt;
What about the MLP using NEWFF?&lt;br&gt;
&lt;br&gt;
How are you compensating for the 4:1 imbalance?&lt;br&gt;
&lt;br&gt;
&amp;gt; This error may be acceptable, but because of the small data&lt;br&gt;
&amp;gt; set&lt;br&gt;
&lt;br&gt;
No it is not.&lt;br&gt;
&lt;br&gt;
I would consider less than 80% for the 5 positives as completely&lt;br&gt;
unacceptable!&lt;br&gt;
&lt;br&gt;
&amp;gt; I have available, I am not sure if these results can&lt;br&gt;
&amp;gt; be considered reliable&lt;br&gt;
&lt;br&gt;
Bootstrapping and many trials of 10-fold XVAL will yield enough&lt;br&gt;
replications so that you can estimate confidence levels.&lt;br&gt;
&lt;br&gt;
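One way to turn replications into a confidence estimate is a&lt;br&gt;
percentile bootstrap. The Python sketch below is a simplification:&lt;br&gt;
it resamples per-specimen correct/incorrect flags from a single&lt;br&gt;
evaluation and does not retrain any classifier. The function name,&lt;br&gt;
the number of resamples and the flags are all assumptions:&lt;br&gt;
&lt;br&gt;
```python
import random


def bootstrap_accuracy_interval(correct, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the fraction of correct predictions.

    `correct` holds one 0/1 flag per specimen (1 = classified correctly).
    """
    rng = random.Random(seed)
    n = len(correct)
    rates = []
    for _ in range(n_boot):
        # Resample the specimens with replacement and recompute the rate.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        rates.append(sum(sample) / n)
    rates.sort()
    lower = rates[int((alpha / 2) * n_boot)]
    upper = rates[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical outcome flags: 19 of 25 specimens correct (76%).
flags = [1] * 19 + [0] * 6
lower, upper = bootstrap_accuracy_interval(flags)
print(lower, upper)
```
&lt;br&gt;
With only 25 specimens the interval is wide, which is the reason&lt;br&gt;
for collecting many replications before quoting a single rate.&lt;br&gt;
&lt;br&gt;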
&amp;gt; and if the method of reshuffling the data is acceptable.&lt;br&gt;
&lt;br&gt;
&amp;nbsp;What reshuffling?  You said 2-fold XVAL ... how many trials?&lt;br&gt;
&lt;br&gt;
&amp;gt; - Can you please explain which plot you are asking me to make?&lt;br&gt;
&lt;br&gt;
You need to look at, at least, the 6*5/2 = 15 color-coded pairwise&lt;br&gt;
2-D projections of the 6-D data to see how the 5 differ from&lt;br&gt;
the 20. Also, looking at dominant PCA projections "may" help.&lt;br&gt;
&lt;br&gt;
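The 15 pairwise projections can be enumerated as in the short&lt;br&gt;
Python sketch below. It only lists the input pairs; the plotting&lt;br&gt;
itself is left out, and numbering the inputs 1 to 6 is an assumption:&lt;br&gt;
&lt;br&gt;
```python
from itertools import combinations

n_inputs = 6  # number of characteristics per specimen, as stated in the thread
pairs = list(combinations(range(1, n_inputs + 1), 2))

print(len(pairs))
for i, j in pairs:
    # Each pair is one 2-D scatter plot: input i on x, input j on y,
    # with points colored by class.
    print(f"input {i} vs input {j}")
```
&lt;br&gt;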
For the RBF and MLP you need to either replicate the 5 until they&lt;br&gt;
match the 20 (15 extra samples, i.e., 3 extra copies of each) or&lt;br&gt;
use some other method to balance 0/1 training.&lt;br&gt;
Search the archive of comp.ai.neural-nets and CSSM using&lt;br&gt;
&lt;br&gt;
greg-heath unbalanced&lt;br&gt;
&lt;br&gt;
Sorting by date will help separate the earlier useful posts from&lt;br&gt;
the later referrals.&lt;br&gt;
&lt;br&gt;
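Balancing by replication can be sketched as follows. This is a&lt;br&gt;
minimal Python sketch, not MATLAB toolbox code; the function name&lt;br&gt;
and the stand-in data (specimen indices in place of real 6-D&lt;br&gt;
vectors) are assumptions:&lt;br&gt;
&lt;br&gt;
```python
def balance_by_replication(samples, labels):
    """Replicate smaller classes until every class matches the largest."""
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_samples, out_labels = [], []
    for y, xs in by_class.items():
        # Cycle through the class members until `target` samples are collected.
        repeated = [xs[k % len(xs)] for k in range(target)]
        out_samples.extend(repeated)
        out_labels.extend([y] * target)
    return out_samples, out_labels


# Hypothetical stand-in data: specimen indices instead of real 6-D vectors.
samples = list(range(25))
labels = [0] * 20 + [1] * 5
bal_samples, bal_labels = balance_by_replication(samples, labels)
print(bal_labels.count(0), bal_labels.count(1))
```
&lt;br&gt;
Replicating only inside each training fold keeps copies of a test&lt;br&gt;
specimen out of the training set.&lt;br&gt;
&lt;br&gt;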
If you get 0.5-eps on the 20 negatives and 0.5+eps on the 5&lt;br&gt;
positives (targets 0 and 1, small eps &amp;gt; 0, threshold 0.5) you will&lt;br&gt;
get a 100% correct classification rate and a mean-square error&lt;br&gt;
close to 0.25. If you change the sign of eps you will get nearly&lt;br&gt;
the same MSE with a 0% correct classification rate.&lt;br&gt;
&lt;br&gt;
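That MSE alone does not determine the classification rate can be&lt;br&gt;
checked numerically. A minimal Python sketch, assuming targets of 0&lt;br&gt;
for negatives and 1 for positives, a decision threshold of 0.5, and&lt;br&gt;
outputs of 0.5-eps on negatives and 0.5+eps on positives (the&lt;br&gt;
function name and the value eps = 0.01 are illustrative):&lt;br&gt;
&lt;br&gt;
```python
def mse_and_accuracy(eps, n_neg=20, n_pos=5):
    """MSE and correct-classification rate for outputs clustered around 0.5."""
    targets = [0.0] * n_neg + [1.0] * n_pos
    outputs = [0.5 - eps] * n_neg + [0.5 + eps] * n_pos
    mse = sum((y - t) ** 2 for y, t in zip(outputs, targets)) / len(targets)
    # Rounding to the nearest integer applies a 0.5 decision threshold.
    decisions = [float(round(y)) for y in outputs]
    accuracy = sum(d == t for d, t in zip(decisions, targets)) / len(targets)
    return mse, accuracy


mse_right, acc_right = mse_and_accuracy(0.01)
mse_wrong, acc_wrong = mse_and_accuracy(-0.01)
print(mse_right, acc_right)
print(mse_wrong, acc_wrong)
```
&lt;br&gt;
With eps = 0.01 every specimen is classified correctly; with&lt;br&gt;
eps = -0.01 none is, while both MSEs stay within about 0.01 of 0.25.&lt;br&gt;
&lt;br&gt;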
Nevertheless, it is useful to record the overall and class-conditional&lt;br&gt;
MSEs.&lt;br&gt;
&lt;br&gt;
Record the class-conditional as well as the overall performance.&lt;br&gt;
The former are more important!&lt;br&gt;
&lt;br&gt;
Hope this helps.&lt;br&gt;
&lt;br&gt;
Greg&lt;br&gt;
</description>
    </item>
  </channel>
</rss>
