Thread Subject: cross-validation

Subject: cross-validation

From: Lourdes Pelayo

Date: 10 Feb, 2008 20:32:01

Message: 1 of 8

I want to know how I can use the crossvalind function included
in the Bioinformatics Toolbox.
My dataset was classified with a different
algorithm; I just want to validate my results with 10-fold cross
validation and/or leave-one-out.

I have 5 classes and my algorithm returns the degree of
membership for each instance. Therefore, the input data to
the 10-fold cross validation are N instances and 5 classes
(not features but degrees of membership since data have been
already classified).

The crossvalind.m file calls classify.m (stats toolbox). I
can modify the classify code to get the confusion matrix,
but when I enter the data, I get this error message: The
pooled covariance matrix of TRAINING must be positive definite.

What should I do?

thanks

Lourdes

Subject: cross-validation

From: Lucio Andrade-Cetto

Date: 11 Feb, 2008 15:19:03

Message: 2 of 8

Lourdes:

crossvalind does not call classify.m, (the example in the
help does), but you can use crossvalind to do a 10-fold
cross validation with any other classifier:

indices = crossvalind('Kfold',true_classes,10);
cp = classperf(true_classes);
for i = 1:10
      test = (indices == i); train = ~test;
      class_membership = yourclassifier(...
           data(test,:),data(train,:),true_classes(train,:));
      [~,selected_class] = max(class_membership,[],2);  % class index, not the max value
      classperf(cp,selected_class,test)
end
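
For readers without the Bioinformatics Toolbox, the same loop can be sketched in base MATLAB. This is only an illustration under stated assumptions: the toy data, the nearest-class-mean "classifier", and the randperm-based fold assignment (an unstratified stand-in for crossvalind) are all hypothetical and not part of the original post, and a plain confusion matrix stands in for classperf.

```matlab
% Hypothetical toy data: K = 3 classes, 2 features, 30 instances per class.
n_per_class = 30;  K = 3;
centers = [0 0; 5 5; 0 5];
data = [];  true_classes = [];
for k = 1:K
    data = [data; repmat(centers(k,:), n_per_class, 1) + randn(n_per_class, 2)];
    true_classes = [true_classes; k * ones(n_per_class, 1)];
end

% Assign every instance to one of 10 folds (unstratified stand-in for crossvalind).
N = numel(true_classes);
nfolds = 10;
perm = randperm(N);
indices = zeros(N, 1);
indices(perm) = mod((0:N-1)', nfolds) + 1;

confusion = zeros(K, K);              % rows: true class, columns: assigned class
for i = 1:nfolds
    test = (indices == i);  train = ~test;

    % "Train": class means computed from the training folds only.
    mu = zeros(K, size(data, 2));
    for k = 1:K
        mu(k,:) = mean(data(train & true_classes == k, :), 1);
    end

    % "Classify": degree of membership = negative squared distance to each mean.
    Xtest = data(test, :);
    membership = zeros(size(Xtest, 1), K);
    for k = 1:K
        d = Xtest - repmat(mu(k,:), size(Xtest, 1), 1);
        membership(:,k) = -sum(d.^2, 2);
    end

    % Second output of max is the index of the winning class.
    [~, predicted] = max(membership, [], 2);
    confusion = confusion + accumarray([true_classes(test) predicted], 1, [K K]);
end
accuracy = trace(confusion) / N;
disp(confusion);
fprintf('10-fold accuracy: %.3f\n', accuracy);
```

Every instance is tested exactly once, so the rows of the confusion matrix sum to the class sizes.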

Does this help you? Please feel free to e-mail me with
further questions.

Lucio Cetto


Subject: cross-validation

From: Prime Mover

Date: 29 Feb, 2008 16:55:00

Message: 3 of 8

Let me take this opportunity to see if I can understand a bit about
this "confusion matrix"...

Suppose I have an N x M image that I want to classify with a
supervised method.

I select 5 classes as references for classification by extracting
n x n pixels (an n x n matrix) in 5 different areas of the image and
calculating some statistical parameter for each of these matrices.

I run the classifier on every pixel in the image by comparing the
statistical property of the n x n neighborhood around each pixel with
those of the 5 classes. Using some distance criterion, I have all
pixels classified as one of the 5 classes.

Now, what is the procedure to obtain a confusion matrix? I've read
somewhere that I should select another set of areas in the image for
which I also know the class (test sites), and that the confusion
matrix would be constructed by counting how many pixels are classified
as class 1... 5 in each test site. But, if this is correct, how do I
avoid bias while selecting such test sites?


Thanks!




Subject: cross-validation

From: Greg Heath

Date: 1 Mar, 2008 08:54:28

Message: 4 of 8

On Feb 29, 11:55 am, Prime Mover <eple...@hotmail.com> wrote:
> Let me take this opportunity to see if I can understand a bit about
> this "confusion matrix"...

A count confusion matrix is a summary of class assignments
with respect to the correct classification. True class counts
are summarized in the rows while assigned class counts are
summarized in the columns.

Example


CCM1 = [17 3 2; 1 18 2; 1 2 20]

From row 2: There were 21 assignments of class 2 objects.
18 were correct. One incorrect assignment was made to
class 1 and 2 incorrect assignments were made to class 3.

From column 2: There were 23 assignments made to class 2.
18 were correct. 3 of the 5 incorrect assignments were made
of class 1 objects and 2 of the 5 incorrect assignments
were made of class 3 objects.

For clarity in reports and presentations I always include row
and column sums:

CCM2 = [ 17 3 2 22; 1 18 2 21; 1 2 20 23; 19 23 24 66]

Sometimes managers and clients just want to see classification
statistics in a Per Cent Confusion Matrix:

PCM = 100*[17/22 3/22 2/22; 1/21 18/21 2/21; 1/23 2/23 20/23]
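
The matrices above can be derived from CCM1 rather than typed by hand. The following is a minimal sketch; the variable names row_sums, col_sums and overall_accuracy are introduced here for illustration, and the overall accuracy figure is an addition that is not in the original post.

```matlab
CCM1 = [17 3 2; 1 18 2; 1 2 20];      % rows: true class, columns: assigned class

row_sums = sum(CCM1, 2);              % number of objects in each true class
col_sums = sum(CCM1, 1);              % number of assignments made to each class

% Count matrix with row sums, column sums and the grand total appended.
CCM2 = [CCM1 row_sums; col_sums sum(CCM1(:))];

% Percent confusion matrix: each row divided by its true-class count.
PCM = 100 * CCM1 ./ repmat(row_sums, 1, size(CCM1, 2));

% Overall percent correct: diagonal counts over all counts.
overall_accuracy = 100 * trace(CCM1) / sum(CCM1(:));

disp(CCM2);
disp(PCM);
```

Each row of PCM sums to 100 because it is normalized by the true-class count, not by the number of assignments made to the class.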

Hope this helps.

Greg

Subject: cross-validation

From: Prime Mover

Date: 3 Mar, 2008 17:22:47

Message: 5 of 8


Thanks Greg. That is clear like clean water.



Subject: cross-validation

From: kash

Date: 22 Jan, 2012 19:36:34

Message: 6 of 8
Hi all. I am also performing 5-fold cross validation and I am new to it; can anyone please help? I am performing accurate cancer classification and have taken 100 genes, numbered 1 to 100. Now I want to perform 5-fold cross validation: I need to calculate accuracy, error, and mistakes for each combination of genes, that is,

1,2
1,3
1,4
...
1,100
2,3
...
2,100
...
100,99

Subject: cross-validation

From: Greg Heath

Date: 23 Jan, 2012 03:16:09

Message: 8 of 8

The number of gene combinations is N0 = 9,900 (100*99 ordered pairs;
4,950 if each unordered pair is counted once).
If the target output is cancer or no cancer, it can be represented in
1-D with a unipolar binary target {1,0}.
The target matrix will have dimension size(t) = [ 1 N0 ].
If each gene is characterized by an I-dimensional vector, the input
matrix will have the dimension size(p) = [ I N0 ].
If you are using a neural network, I recommend PATTERNNET.
If you are not using a neural network, you will probably have to
transpose these matrices.

Let N0 = N0c+N0n (N0c = No. in the cancer class, etc). If N0c << N0n,
then it might be wise to duplicate or simulate some of the N0c class
vectors so that Nc = Nn = N0n and N0 is increased to
N = Nc+Nn = N0+N0n-N0c = 2*N0n.
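
As a rough illustration of the duplication idea, the sketch below repeats minority-class columns until both classes have N0n cases. The toy matrices p and t and the names idx_c, idx_n, extra, p_bal and t_bal are hypothetical; simulating new vectors instead of copying existing ones is not shown.

```matlab
% Hypothetical toy data in the [I N0] layout: columns are cases.
p = [1 2 3 4 5 6 7 8 9 10; 10 9 8 7 6 5 4 3 2 1];   % I = 2, N0 = 10
t = [1 1 0 0 0 0 0 0 0 0];                           % N0c = 2, N0n = 8

idx_c = find(t == 1);                 % cancer cases
idx_n = find(t == 0);                 % non-cancer cases
N0c = numel(idx_c);
N0n = numel(idx_n);

% Cycle through the cancer cases until N0n - N0c extra copies are picked.
% If N0c >= N0n the range is empty and nothing is added.
extra = idx_c(mod(0:(N0n - N0c - 1), N0c) + 1);

p_bal = [p p(:, extra)];              % balanced inputs,  size [I 2*N0n]
t_bal = [t t(extra)];                 % balanced targets, size [1 2*N0n]

fprintf('N = %d, cancer = %d, non-cancer = %d\n', ...
        numel(t_bal), sum(t_bal == 1), sum(t_bal == 0));
```

When this is combined with cross-validation, it is safer to duplicate only within the training folds, since a copied case that lands in both the training and test sets inflates the accuracy estimate.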

Now that you have your input and target matrices, see the
cross-validation documentation and demos.
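
One possible shape for the per-pair evaluation asked about in this thread is sketched below in base MATLAB. Everything in it is an assumption made for illustration: the toy expression data (rows are samples, columns are genes, so the matrices are transposed relative to the layout above), the small sizes G and N, the nearest-class-mean classifier, and the unstratified fold assignment. With 100 genes, nchoosek(1:100, 2) would give 4,950 unordered pairs.

```matlab
G = 6;  N = 40;                        % hypothetical sizes (the post has G = 100)
X = randn(N, G);                       % toy expression values
y = [ones(N/2, 1); zeros(N/2, 1)];     % 1 = cancer, 0 = no cancer (toy labels)
X(y == 1, 1:2) = X(y == 1, 1:2) + 2;   % make genes 1 and 2 informative

pairs = nchoosek(1:G, 2);              % every unordered gene pair, one per row

% Assign each sample to one of 5 folds, reused for every pair.
nfolds = 5;
perm = randperm(N);
fold = zeros(N, 1);
fold(perm) = mod((0:N-1)', nfolds) + 1;

mistakes = zeros(size(pairs, 1), 1);
for c = 1:size(pairs, 1)
    Xc = X(:, pairs(c,:));             % keep only the two genes of this pair
    for f = 1:nfolds
        test = (fold == f);  train = ~test;
        mu1 = mean(Xc(train & y == 1, :), 1);    % cancer mean, training folds only
        mu0 = mean(Xc(train & y == 0, :), 1);    % non-cancer mean
        Xt = Xc(test, :);
        d1 = sum((Xt - repmat(mu1, size(Xt, 1), 1)).^2, 2);
        d0 = sum((Xt - repmat(mu0, size(Xt, 1), 1)).^2, 2);
        predicted = double(d1 < d0);             % assign to the nearer class mean
        mistakes(c) = mistakes(c) + sum(predicted ~= y(test));
    end
end

error_rate = mistakes / N;             % fraction of samples misclassified per pair
accuracy = 1 - error_rate;
[best_acc, best] = max(accuracy);
fprintf('best pair: genes %d and %d, accuracy %.2f\n', ...
        pairs(best, 1), pairs(best, 2), best_acc);
```

Reusing one fold assignment for all pairs keeps the comparison between pairs fair, but picking the best pair on the same folds gives an optimistic accuracy, so a separate held-out set would be needed for an honest final estimate.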

Hope this helps.

Greg
