Thread Subject:
How to do a classification using Matlab?

Subject: How to do a classification using Matlab?

From: Aaronne

Date: 14 Mar, 2013 18:46:07

Message: 1 of 5

Hi Smart Guys,

I have got the data (it can be downloaded [here][1]) and tried to run a simple LDA-based classification using the 11 features stored in the dataset, i.e., F1, F2, ..., F11.

Here is some code I wrote in MATLAB using only 2 of the features. May I ask some questions based on this code?

    clc; clear; close all;

    %% Load the extracted features
    features = xlsread('ExtractedFeatures.xls');
    numSamples = 23;   % 23 observations: 15 'Good' followed by 8 'Bad'

    %% Define ground truth
    groundTruthGroup = cell(numSamples, 1);
    groundTruthGroup(1:15) = {'Good'};
    groundTruthGroup(16:end) = {'Bad'};

    %% Select features
    featureSelected = [features(:,3), features(:,9)];

    %% Run LDA (resubstitution: train and test on the same data)
    [ldaClass, ldaResubErr] = classify(featureSelected, featureSelected, groundTruthGroup, 'linear');
    bad = ~strcmp(ldaClass, groundTruthGroup);
    ldaResubErr2 = sum(bad)/numSamples;

    [ldaResubCM, grpOrder] = confusionmat(groundTruthGroup, ldaClass);

    %% Scatter plot, with misclassified points marked by black crosses
    gscatter(featureSelected(:,1), featureSelected(:,2), groundTruthGroup, 'rgb', 'osd');
    xlabel('Feature 3');
    ylabel('Feature 9');
    hold on;
    plot(featureSelected(bad,1), featureSelected(bad,2), 'kx');
    hold off;

    %% Leave-one-out cross-validation
    leaveOneOutPartition = cvpartition(numSamples, 'leaveout');
    ldaClassFun = @(xtrain, ytrain, xtest) classify(xtest, xtrain, ytrain, 'linear');
    ldaCVErr = crossval('mcr', featureSelected, ...
        groundTruthGroup, 'predfun', ldaClassFun, 'partition', leaveOneOutPartition);

    %% Display the results
    clc;
    disp('______________________________________ Results ______________________________________________________');
    disp(' ');
    fprintf('Resubstitution error of LDA (training error, MATLAB built-in): %g\n', ldaResubErr);
    fprintf('Resubstitution error of LDA (training error, computed manually): %g\n', ldaResubErr2);
    disp(' ');
    disp('Confusion matrix:');
    disp(ldaResubCM);
    fprintf('Cross-validation error of LDA (leave-one-out): %g\n', ldaCVErr);
    disp(' ');
    disp('______________________________________________________________________________________________________');


I. My first question is: how do I do feature selection? For example, using forward or backward feature selection, or t-test-based methods?

I have checked that MATLAB has the `sequentialfs` function, but I am not sure how to incorporate it into my code.
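
My best guess so far is something like the sketch below (not validated; the criterion function must return the *count* of misclassified test observations, which `sequentialfs` then sums over the folds and divides by the total number of test rows; with only 23 samples, a stratified 5-fold partition, or even leave-one-out, seems safer than 10-fold):

    % Criterion: number of misclassified test observations
    classifyErrCount = @(xtrain, ytrain, xtest, ytest) ...
        sum(~strcmp(ytest, classify(xtest, xtrain, ytrain, 'linear')));

    fiveFoldPartition = cvpartition(groundTruthGroup, 'k', 5);   % stratified 5-fold
    opts = statset('display', 'iter');
    [isSelected, history] = sequentialfs(classifyErrCount, features, groundTruthGroup, ...
        'cv', fiveFoldPartition, 'direction', 'forward', 'options', opts);
    selectedFeatureColumns = find(isSelected)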

II. How do I use the MATLAB `classify` function to do a classification with more than 2 features? Should we perform PCA first? For example, we currently have 11 features; should we run PCA to produce 2 or 3 principal components and then run the classification? (I am expecting to write a loop that adds features one by one, to do a forward feature selection, not just run PCA for dimensionality reduction.) A sketch of what I mean follows below.
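
To make the question concrete, this is the kind of thing I have in mind (untested; I am aware that with only 23 samples the pooled covariance of all 11 features may be nearly singular, in which case `classify` would complain):

    % All 11 features passed straight to classify
    [ldaClassAll, ldaResubErrAll] = classify(features, features, groundTruthGroup, 'linear');

    % Or: PCA first, then classify on the first two principal components
    % (pca is available from R2012a; older releases use princomp instead)
    [pcaCoeff, pcaScore] = pca(zscore(features));
    firstTwoPCs = pcaScore(:, 1:2);
    ldaClassPCA = classify(firstTwoPCs, firstTwoPCs, groundTruthGroup, 'linear');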

III. I have also tried to run a ROC analysis. I referred to [this webpage][2], which gives an implementation of a simple LDA method that produces the linear scores of the LDA; we can then use `perfcurve` to get the ROC curve.

IIIa. However, I am not sure how to use the `classify` function together with `perfcurve` to get the ROC curve.

IIIb. Also, how do I produce a ROC curve under cross-validation?

IIIc. After we have obtained `OPTROCPT`, the optimal operating (cut-off) point, how can we use this cut-off point to produce a better classification? (My guesses for IIIa-IIIc are sketched after the code below.)

    %% ROC Analysis
    featureSelected = [features(:,3), features(:,9)];
    groundTruthNumericalLabel = [zeros(15,1); ones(8,1)];   % 0 = 'Good', 1 = 'Bad'

    % Calculate linear discriminant coefficients
    % (LDA is the function from the blog post linked at [2], not a built-in)
    ldaCoefficients = LDA(featureSelected, groundTruthNumericalLabel);

    % Calculate linear scores for the training data
    ldaLinearScores = [ones(numSamples,1) featureSelected] * ldaCoefficients';

    % Convert the scores to class probabilities (softmax)
    classProbabilities = exp(ldaLinearScores) ./ repmat(sum(exp(ldaLinearScores),2), [1 2]);

    % ROC curve for the class-0 ('Good') probabilities
    figure;
    [FPR, TPR, Thr, AUC, OPTROCPT] = perfcurve(groundTruthNumericalLabel, classProbabilities(:,1), 0);
    plot(FPR, TPR, 'or-');
    xlabel('False positive rate (FPR, 1-Specificity)'); ylabel('True positive rate (TPR, Sensitivity)');
    title('ROC for classification by LDA');
    grid on;
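
My current guess for IIIa-IIIc is the following sketch (untested; in particular, I assume the columns of the posterior matrix returned by `classify` follow the order in which the class labels first appear, so column 1 would be 'Good' here -- this should be verified against `grpOrder`):

    %% IIIa: posterior probabilities from classify, fed to perfcurve
    [ldaClass, ~, posterior] = classify(featureSelected, featureSelected, ...
        groundTruthGroup, 'linear');
    [FPR, TPR, thresholds, AUC, OPTROCPT] = perfcurve(groundTruthGroup, ...
        posterior(:,1), 'Good');

    %% IIIb: cross-validated ROC -- pool the out-of-fold posteriors,
    %% then run perfcurve once on the pooled scores
    cvPosterior = zeros(numSamples, 1);
    for i = 1:numSamples
        trainIdx = setdiff(1:numSamples, i);
        [~, ~, p] = classify(featureSelected(i,:), featureSelected(trainIdx,:), ...
            groundTruthGroup(trainIdx), 'linear');
        cvPosterior(i) = p(1);   % assumed P('Good') for the left-out sample
    end
    [cvFPR, cvTPR, ~, cvAUC] = perfcurve(groundTruthGroup, cvPosterior, 'Good');

    %% IIIc: turn OPTROCPT into a decision threshold
    optIdx = find(FPR == OPTROCPT(1) & TPR == OPTROCPT(2), 1);
    optThreshold = thresholds(optIdx);
    tunedClass = repmat({'Bad'}, numSamples, 1);
    tunedClass(posterior(:,1) >= optThreshold) = {'Good'};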

IV. Currently I calculate the training and cross-validation errors with the `classify` and `crossval` functions. May I ask how to get those values summarized by using `classperf`?
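
From its documentation, I believe `classperf` (which ships with the Bioinformatics Toolbox, not the Statistics Toolbox) would be used roughly like this (untested):

    cp = classperf(groundTruthGroup);   % create a classifier performance object
    classperf(cp, ldaClass);            % update it with the predicted labels
    cp.CorrectRate                      % fraction classified correctly
    cp.ErrorRate                        % should match ldaResubErr2 above
    cp.CountingMatrix                   % confusion-matrix-style counts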

V. If anyone knows a good tutorial on using the MATLAB Statistics Toolbox for machine learning tasks, with a complete worked example, please tell me.

Some of the MATLAB Help examples are really confusing to me because they are presented in pieces, and I am a real novice at machine learning. Sorry if some of my questions are not put properly. Thanks very much for your help.



A.


  [1]: http://ge.tt/6eijw4b/v/0
  [2]: http://matlabdatamining.blogspot.co.uk/2010/12/linear-discriminant-analysis-lda.html

Subject: How to do a classification using Matlab?

From: Alan_Weiss

Date: 15 Mar, 2013 13:23:59

Message: 2 of 5

On 3/14/2013 2:46 PM, Aaronne wrote:
> [original question quoted in full; snipped]

It sounds as if you have Statistics Toolbox. If so, then why bother
rewriting discriminant analysis code? There is a good deal of
information about discriminant analysis here:
http://www.mathworks.com/help/stats/discriminant-analysis-1.html
There may be more information than you care to read about classification
in these two sections:
http://www.mathworks.com/help/stats/supervised-learning.html
http://www.mathworks.com/help/stats/ensemble-learning.html

Good luck,

Alan Weiss
MATLAB mathematical toolbox documentation

Subject: How to do a classification using Matlab?

From: Greg Heath

Date: 16 Mar, 2013 13:15:36

Message: 3 of 5

On Mar 14, 2:46pm, "Aaronne " <ggyy...@hotmail.com> wrote:
> [earlier text snipped]
>
> II. How do I use the MATLAB `classify` function to do a classification with more than 2 features? Should we perform PCA first? For example, we currently have 11 features; should we run PCA to produce 2 or 3 principal components and then run the classification?

I don't know why you think PCA should even be considered for classification dimensionality reduction.

It chooses the directions in which the variables have the most spread, not the directions with the greatest relative distances between the clustered subclasses.

You are probably better off clustering the mixture, or each class separately, and then using either LDA with a truncated/regularized pinv(Sw)*Sb or PLSREGRESS.
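
A rough sketch of the PLSREGRESS route (untested; the targets are dummy-coded 0/1 as in the original poster's ROC code, and the number of components is purely illustrative):

    % features: the 23x11 matrix from the original post, 15 'Good' then 8 'Bad'
    y = [zeros(15,1); ones(8,1)];                 % dummy-coded targets
    nComp = 3;                                    % illustrative choice only
    [~, ~, ~, ~, beta] = plsregress(features, y, nComp);
    yHat = [ones(size(features,1),1) features] * beta;   % fitted responses
    predictedClass = double(yHat > 0.5);          % threshold at 0.5
    plsTrainErr = mean(predictedClass ~= y)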

Hope this helps.

Greg

Subject: How to do a classification using Matlab?

From: Shiguo

Date: 3 May, 2013 14:03:08

Message: 4 of 5

`classify` can handle any number of features, so there is no need to reduce them, let alone use PCA, as Greg commented.

"Aaronne" wrote in message <kht5tf$m6v$1@newscl01ah.mathworks.com>...
> [original question quoted in full; snipped]

Subject: How to do a classification using Matlab?

From: Greg Heath

Date: 1 May, 2014 06:57:55

Message: 5 of 5

"Shiguo" wrote in message <km0g2s$eue$1@newscl01ah.mathworks.com>...
> `classify` can handle any number of features, so there is no need to reduce them, let alone use PCA, as Greg commented.

It is well known that using ineffective features can drastically reduce performance on non-training data: adding ineffective features increases the number of ineffective weights, and the resulting OVERFITTING can lead to degraded performance if the model is OVERTRAINED.

Although the amount of degradation depends on the particular problem, it is always wise to mitigate the problem by reducing variables, reducing weights, increasing the training data, and/or using regularization.

See the comp.ai.neural-nets FAQ.

Hope this helps.

Greg

P.S. For classification use PLS, not PCA.
