Path: news.mathworks.com!not-for-mail
From: "giannis " <fanzio12@yahoo.co.uk>
Newsgroups: comp.soft-sys.matlab
Subject: Re: small data set
Date: Wed, 7 May 2008 08:25:06 +0000 (UTC)
Organization: University of Sussex
Lines: 117
Message-ID: <fvrp12$mta$1@fred.mathworks.com>
References: <fvc63i$qfc$1@fred.mathworks.com> <afd7b073-23ba-496a-865f-b5655a22c64e@f36g2000hsa.googlegroups.com>  <85d534d6-2338-43dc-aa50-31add8d56fe3@j22g2000hsf.googlegroups.com>
Reply-To: "giannis " <fanzio12@yahoo.co.uk>
NNTP-Posting-Host: webapp-05-blr.mathworks.com
Content-Type: text/plain; charset="ISO-8859-1"
Content-Transfer-Encoding: 8bit


Greg Heath <heath@alumni.brown.edu> wrote in message
<85d534d6-2338-43dc-aa50-31add8d56fe3@j22g2000hsf.googlegroups.com>...
> Corrected for the heinous sin of top-posting.
> 
> On May 5, 5:59 pm, "giannis " <fanzi...@yahoo.co.uk> wrote:
> >
> > Greg Heath <he...@alumni.brown.edu> wrote in message
> > <9b4c2a53-7f64-42a4-a546-5a8e0f9e2...@k13g2000hse.googlegroups.com>...
> > > On May 1, 7:22 am, Greg Heath <he...@alumni.brown.edu> wrote:
> > > > On May 1, 6:30 am, "giannis " <fanzi...@yahoo.co.uk> wrote:
> >
> > > > > Hello.
> >
> > > > > I am doing statistical research using KNN, neural nets and
> > > > > SVM. The problem is the very small data set (25 specimens).
> >
> > > > > I am using cross validation to resample the data, but I am
> > > > > not sure whether my results can be accurate with such a
> > > > > small data set.
> >
> > > > > Can you please suggest a method for making the best
> > > > > possible use of such a small data set?
> > > > > Thank you in advance.
> >
> > > > Bootstrapping
> >
> > > > Search the mathworks website.
> >
> > > If you have prior information on the form of the probability
> > > distribution function, you can use the 25 observations to
> > > estimate the parameters and then generate more "data".
> > > The danger is that, even in one dimension, 25 observations
> > > will not give you precise parameter estimates.
> >
> > > If you don't have such prior information you can test
> > > hypotheses as to which distribution the data might be
> > > from. However, with only 25 observations the testing will
> > > be far from definitive. You may test several distributions,
> > > find that you can reject all except one. However, that does
> > > not guarantee that it will be the correct distribution.
> >
> > > ...suddenly I have the feeling that the data is not
> > > 1-dimensional!
> >
> > > What are the dimensions of your input and output?
> > > Exactly what type of problem do you have and what
> > > exactly do you want the neural net to do?
> >
> > Hello Greg,
> >
> > thank you for all your help.
> >
> > I have data from 25 people. 20 of them have lung cancer and
> > 5 don't. I have 6 different characteristics for each person
> > (so the array is 25x6).
> >
> > The task is to produce two classifiers:
> > 1st: to classify between a constant value (2 outputs)
> > 2nd: to classify the stage of cancer, 0, 1, 2, 3 or 4 (5
> > outputs)
> >
> > I tried SVM, linear regression, backpropagation and
> > RBF neural nets, and KNN.
> >
> > I tried to reshuffle my data using Leave One Out Cross
> > Validation (LOOCV), each time keeping one case for testing
> > and 24 for training.
> >
> > hope I gave you the picture..?
> 
> What kind of error rates are you getting for each method?
> What are the largest error rates that you would accept?
> 
> When you plot the desired {0,1} classification vs each
> of the inputs does there appear to be predictive capability?
> What are the corresponding correlation coefficients?
> 
> Hope this helps.
> 
> Greg



Hello Greg,

The best results I have so far come from:

  1st: using 3 of the 5 characteristics
  2nd: 2-fold cross validation (using all the
       combinations and averaging the error rate at the end)
  3rd: KNN classification, giving a cp.CorrectRate of 75%,
       and an RBF neural network, giving a cp.CorrectRate
       of 68%
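
The 2-fold scheme in the list above can be sketched roughly as follows. This is only an illustration in Python, not the MATLAB code behind the reported numbers: the post says "all the combinations", whereas this sketch substitutes repeated random splits, and the data, the function names and the choice of k=3 are all made-up assumptions.

```python
import random
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training points closest to query."""
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def two_fold_accuracy(x, y, k, rng):
    """One random 2-fold split: train on each half, test on the other."""
    idx = list(range(len(x)))
    rng.shuffle(idx)
    half = len(idx) // 2
    folds = [idx[:half], idx[half:]]
    correct = 0
    for test, train in (folds, folds[::-1]):
        tx = [x[i] for i in train]
        ty = [y[i] for i in train]
        correct += sum(knn_predict(tx, ty, x[i], k) == y[i] for i in test)
    return correct / len(x)

def repeated_two_fold(x, y, k=3, repeats=200, seed=0):
    """Mean, worst and best accuracy over many random 2-fold splits."""
    rng = random.Random(seed)
    scores = [two_fold_accuracy(x, y, k, rng) for _ in range(repeats)]
    return sum(scores) / len(scores), min(scores), max(scores)

# Placeholder data only (NOT the real measurements): 20 cases of one
# class and 5 of the other, 3 made-up features each.
data_rng = random.Random(42)
demo_x = [tuple(data_rng.gauss(1.0, 1.0) for _ in range(3)) for _ in range(20)]
demo_x += [tuple(data_rng.gauss(-1.0, 1.0) for _ in range(3)) for _ in range(5)]
demo_y = [1] * 20 + [0] * 5

mean_acc, worst, best = repeated_two_fold(demo_x, demo_y)
print(f"mean {mean_acc:.2f}, worst split {worst:.2f}, best split {best:.2f}")
```

Reporting the spread between the worst and best split next to the mean gives some feeling for how much the estimate moves with only 25 cases. The splits here are not stratified, so a training half may contain very few cases of the smaller class.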

This error rate would be acceptable, but because the data
set I have is so small I am not confident that these results
are reliable, or that my method of reshuffling the data is
sound.
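
Two quick sanity checks on a figure like 75% from 25 cases, sketched in Python. The assumptions are loud ones: the baseline applies only to the cancer / no-cancer task (20 of 25 positive), the post does not say which task the 75% belongs to, and the interval treats the 25 cases as independent trials, which averaged cross validation results do not strictly satisfy. The function names are hypothetical.

```python
import math

def majority_baseline(class_counts):
    """Accuracy of always predicting the most common class."""
    return max(class_counts) / sum(class_counts)

def wilson_interval(p_hat, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

# 20 people with lung cancer, 5 without (from the post).
baseline = majority_baseline([20, 5])

# 75% correct on 25 cases, treated as a simple proportion (an assumption).
low, high = wilson_interval(0.75, 25)

print(f"majority-class baseline: {baseline:.2f}")
print(f"rough 95% interval for 75% on 25 cases: {low:.2f} to {high:.2f}")
```

If the 75% does refer to the two-class task, a classifier that always answers "cancer" would already score 80%, and the interval is wide enough to include that value, so the small sample really does limit what can be concluded.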

Can you please explain which plot you are asking me to make?
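
One possible reading of the request quoted above (the desired {0,1} class against each input, plus the corresponding correlation coefficients) is sketched below in Python. The feature values and labels are placeholders, not the real data, and `pearson` is a hypothetical helper written out by hand.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient; NaN if either input is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

# Placeholder data: 6 people, 2 made-up characteristics, 0/1 class label.
features = {
    "characteristic_1": [2.1, 2.4, 3.0, 3.3, 1.0, 1.2],
    "characteristic_2": [5.0, 4.1, 5.5, 4.8, 5.2, 4.4],
}
labels = [1, 1, 1, 1, 0, 0]

# One coefficient per input. For the plot itself, each input would go on
# the x axis and the 0/1 label on the y axis, one scatter plot per input.
correlations = {name: pearson(vals, labels) for name, vals in features.items()}
for name, r in correlations.items():
    print(f"{name}: r = {r:+.2f}")
```

An input whose coefficient is close to zero, and whose scatter plot shows the two classes completely overlapping, is unlikely to help any of the classifiers on its own.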

thank you

giannis