From: Greg Heath <heath@alumni.brown.edu>
Newsgroups: comp.soft-sys.matlab
Subject: Re: small data set
Date: Tue, 6 May 2008 00:43:15 -0700 (PDT)


Corrected for the heinous sin of top-posting.

On May 5, 5:59 pm, "giannis " <fanzi...@yahoo.co.uk> wrote:
>
> Greg Heath <he...@alumni.brown.edu> wrote in message
>
> <9b4c2a53-7f64-42a4-a546-5a8e0f9e2...@k13g2000hse.googlegroups.com>...
> > On May 1, 7:22 am, Greg Heath <he...@alumni.brown.edu> wrote:
> > > On May 1, 6:30 am, "giannis " <fanzi...@yahoo.co.uk> wrote:
>
> > > > Hello.
>
> > > > I am doing statistical research using KNN, neural nets, and
> > > > SVM. The problem is the very small data set (25 specimens).
>
> > > > I am using cross validation to resample the data but I am
> > > > not sure if my results can be accurate with such a small
> > > > data set.
>
> > > > Can you please suggest a method that makes the best possible
> > > > use of such a small data set?
> > > > Thank you in advance.
>
> > > Bootstrapping
>
> > > Search the mathworks website.
>
> > If you have prior information on the form of the probability
> > distribution function, you can use the 25 observations to
> > estimate the parameters and then generate more "data".
> > The danger is that, even in one dimension, 25 observations
> > will not give you precise parameter estimates.
>
> > If you don't have such prior information you can test
> > hypotheses as to which distribution the data might be
> > from. However, with only 25 observations the testing will
> > be far from definitive. You may test several distributions and
> > find that you can reject all except one. However, that does
> > not guarantee that the remaining one is the correct distribution.
>
> > ...suddenly I have the feeling that the data is not
> > 1-dimensional!
>
> > What are the dimensions of your input and output?
> > Exactly what type of problem do you have and what
> > exactly do you want the neural net to do?
>
> Hello Greg,
>
> thank you for all your help.
>
> I have data from 25 people. 20 of them have lung cancer and
> 5 don't. I have 6 different characteristics for each person.
> (so the array is 25X6)
>
> The tasks are to produce two classifiers:
> 1st: to classify between a constant value (2 outputs)
> 2nd: to classify the stage of cancer, 0, 1, 2, 3 or 4 (5
> outputs)
>
> I tried to use SVM, linear regression, backpropagation and
> RBF neural nets, and KNN.
>
> I tried to resample my data using leave-one-out cross-
> validation (LOOCV), each time keeping one observation for
> testing and 24 for training.
>
> hope I gave you the picture..?

What kind of error rates are you getting for each method?
What are the largest error rates that you would accept?
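
For reference when comparing those numbers, here is a minimal sketch of
a LOOCV error rate, written in Python rather than MATLAB. The data is a
random placeholder with the same shape as described above (25 rows, 6
inputs, 20 positives and 5 negatives), and the 1-nearest-neighbor rule
and the helper names are illustrative assumptions, not the poster's
actual setup. The majority-class baseline shows what a classifier that
ignores the inputs would score on a 20/5 split.

```python
import random

def loocv_error_rate(X, y, predict):
    """Leave-one-out error: hold out each row once, train on the rest."""
    errors = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) != y[i]:
            errors += 1
    return errors / len(X)

def nearest_neighbor(train_X, train_y, x):
    """1-NN with squared Euclidean distance (illustrative classifier)."""
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in train_X]
    return train_y[dists.index(min(dists))]

def majority_class(train_X, train_y, x):
    """Baseline that ignores the inputs and predicts the commonest label."""
    return max(set(train_y), key=train_y.count)

# Placeholder data only: 25 rows x 6 random inputs, 20 positives, 5 negatives.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(25)]
y = [1] * 20 + [0] * 5

baseline_error = loocv_error_rate(X, y, majority_class)
knn_error = loocv_error_rate(X, y, nearest_neighbor)
print("majority-class LOOCV error:", baseline_error)
print("1-NN LOOCV error:", knn_error)
```

With 20 of 25 cases in one class, the baseline misclassifies exactly the
5 minority cases, so an overall error rate near 20% says little by
itself; per-class error rates are probably more informative here.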

When you plot the desired {0,1} classification vs. each
of the inputs, does there appear to be predictive capability?
What are the corresponding correlation coefficients?
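
One way to get those coefficients is the ordinary Pearson correlation
between the {0,1} target and each input column. The sketch below is in
Python rather than MATLAB and uses random placeholder data of the same
shape (25 x 6), so the printed values mean nothing; `pearson` is a
helper defined here, not a library call. It assumes no column is
constant, since a zero variance would cause a division by zero.

```python
import math
import random

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    dev_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    dev_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    return cov / (dev_u * dev_v)

# Placeholder data only: 25 rows x 6 random inputs, 20 positives, 5 negatives.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(6)] for _ in range(25)]
y = [1] * 20 + [0] * 5

# One coefficient per input column, each against the 0/1 target.
correlations = []
for j in range(6):
    column = [row[j] for row in X]
    correlations.append(pearson(column, y))

for j, r in enumerate(correlations):
    print("input", j + 1, "vs target: r =", round(r, 3))
```

With only 25 observations, even a moderate coefficient can arise by
chance, so these are better read as a screening aid than as evidence.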

Hope this helps.

Greg