X-Received: by 10.224.17.140 with SMTP id s12mr6293767qaa.3.1363439736778;
        Sat, 16 Mar 2013 06:15:36 -0700 (PDT)
MIME-Version: 1.0
X-Received: by 10.49.97.166 with SMTP id eb6mr891263qeb.0.1363439736688; Sat,
 16 Mar 2013 06:15:36 -0700 (PDT)
Path: news.mathworks.com!newsfeed-00.mathworks.com!newsfeed2.dallas1.level3.net!news.level3.com!t2no3935026qal.0!news-out.google.com!k8ni188qas.0!nntp.google.com!dd2no1884347qab.0!postnews.google.com!w3g2000vba.googlegroups.com!not-for-mail
Newsgroups: comp.soft-sys.matlab
Date: Sat, 16 Mar 2013 06:15:36 -0700 (PDT)
Complaints-To: groups-abuse@google.com
Injection-Info: w3g2000vba.googlegroups.com; posting-host=70.215.77.166; posting-account=eN66xwoAAACcrVy_A6ukr6atsHzaxk64
NNTP-Posting-Host: 70.215.77.166
References: <kht5tf$m6v$1@newscl01ah.mathworks.com>
User-Agent: G2/1.0
X-HTTP-UserAgent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64;
 Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR
 3.0.30729; Media Center PC 6.0; BRI/2; NP06; .NET4.0C; AskTbORJ/5.15.15.36191),gzip(gfe)
Message-ID: <82f5bcc1-2c9e-4b3f-9f8f-7c82d43215ee@w3g2000vba.googlegroups.com>
Subject: Re: How to do a classification using Matlab?
From: Greg Heath <g.heath@verizon.net>
Injection-Date: Sat, 16 Mar 2013 13:15:36 +0000
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Lines: 105
Xref: news.mathworks.com comp.soft-sys.matlab:791306

On Mar 14, 2:46pm, "Aaronne " <ggyy...@hotmail.com> wrote:
> Hi Smart Guys,
>
> I have got the data and tried to run a simple LDA-based classification on the 11 features stored in the dataset, i.e., F1, F2, ..., F11.
>
> I wrote some MATLAB code using only 2 features. May I ask some questions based on it, please?
>
>   clc; clf; clear all; close all;
>
>   %% Load the extracted features
>   features              = xlsread('ExtractedFeatures.xls');
>   numFeatures             = 23;
>
>   %% Define ground truth
>   groundTruthGroup          = cell(numFeatures,1);
>   groundTruthGroup(1:15)       = cellstr('Good');
>   groundTruthGroup(16:end)      = cellstr('bad');
>
>   %% Select features
>   featureSelcted           = [features(:,3), features(:,9)];
>
>   %% Run LDA
>   [ldaClass, ldaResubErr]       = classify(featureSelcted(:,1:2), featureSelcted(:,1:2), groundTruthGroup, 'linear');
>   bad                 = ~strcmp(ldaClass,groundTruthGroup);
>   ldaResubErr2            = sum(bad)/numFeatures;
>
>   [ldaResubCM,grpOrder]        = confusionmat(groundTruthGroup,ldaClass);
>
>   %% Scatter plot
>   gscatter(featureSelcted(:,1), featureSelcted(:,2), groundTruthGroup, 'rgb', 'osd');
>   xlabel('Feature 3');
>   ylabel('Feature 9');
>   hold on;
>   plot(featureSelcted(bad,1), featureSelcted(bad,2), 'kx');
>   hold off;
>
>   %% Leave one out cross validation
>   leaveOneOutPartition        = cvpartition(numFeatures, 'leaveout');
>   ldaClassFun             = @(xtrain, ytrain, xtest)(classify(xtest, xtrain, ytrain, 'linear'));
>   ldaCVErr              = crossval('mcr', featureSelcted(:,1:2), ...
>     groundTruthGroup, 'predfun', ldaClassFun, 'partition', leaveOneOutPartition);
>
>   %% Display the results
>   clc;
>   disp('______________________________________ Results ______________________________________________________');
>   disp(' ');
>   disp(sprintf('Resubstitution Error of LDA (Training Error calculated by Matlab built-in): %.4f', ldaResubErr));
>   disp(sprintf('Resubstitution Error of LDA (Training Error calculated manually): %.4f', ldaResubErr2));
>   disp(' ');
>   disp('Confusion Matrix:');
>   disp(ldaResubCM)
>   disp(sprintf('Cross Validation Error of LDA (Leave One Out): %.4f', ldaCVErr));
>   disp(' ');
>   disp('______________________________________________________________________________________________________');
>
> I. My first question is how to do feature selection, for example with forward or backward selection or with t-test-based methods?
>
> I see that MATLAB has the `sequentialfs` function, but I am not sure how to incorporate it into my code.
>
> II. How do I use the MATLAB `classify` function to do a classification with more than 2 features? Should we perform PCA first? For example, we currently have 11 features; would we run PCA to produce 2 or 3 PCs and then run the classification? (I expect to write a loop that adds features one by one to do a forward feature selection, not just run PCA for dimensionality reduction.)

I don't know why you think PCA should even be considered for
classification dimensionality reduction.

PCA chooses the directions in which the variables have the most spread,
not the directions that best separate the clustered subclasses.
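
A small illustration of that point, as a sketch only: the data below are
synthetic (an assumption, not the poster's dataset), with a large spread
shared by both classes along the first variable and a small class offset
along the second. The leading principal direction follows the spread,
while the two-class Fisher direction Sw\(m1-m2) follows the offset.

```matlab
% Synthetic two-class data (hypothetical, for illustration only).
rng(0);                                    % reproducible random numbers
n  = 200;                                  % samples per class
X1 = [10*randn(n,1), 0.5*randn(n,1) + 2];  % class 1
X2 = [10*randn(n,1), 0.5*randn(n,1) - 2];  % class 2
X  = [X1; X2];

% First principal direction: leading eigenvector of the total covariance.
[V, D] = eig(cov(X));
[~, k] = max(diag(D));
wPCA   = V(:, k);

% Fisher (LDA) direction for two classes: Sw \ (m1 - m2).
m1   = mean(X1)';
m2   = mean(X2)';
Sw   = (n-1)*cov(X1) + (n-1)*cov(X2);      % within-class scatter
wLDA = Sw \ (m1 - m2);
wLDA = wLDA / norm(wLDA);

disp(abs(wPCA'));   % dominated by the first variable (the high-spread axis)
disp(abs(wLDA'));   % dominated by the second variable (the separating axis)
```

Projecting onto wPCA here would mix the two classes almost completely,
even though it keeps most of the variance.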

You are probably better off clustering the mixture, or each class
separately, and then using either LDA with a truncated/regularized
pinv(Sw)*Sb (Sw and Sb being the within-class and between-class
scatter matrices) or PLSREGRESS.
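
A minimal sketch of the pinv(Sw)*Sb step, under stated assumptions: X and
y below are random placeholders sized like the quoted problem (23 samples,
11 features, 15 'Good' and 8 'bad'), and the tolerance tol is a
hypothetical choice. This is one way to truncate the inverse, not
necessarily the exact recipe intended above, and the clustering step is
not shown.

```matlab
% Placeholder data (hypothetical); substitute the real feature matrix and labels.
rng(1);
X = randn(23, 11);                     % 23 samples, 11 features
y = [ones(15,1); 2*ones(8,1)];         % class 1 = 'Good', class 2 = 'bad'
X(y == 2, 3) = X(y == 2, 3) + 2;       % give one feature some class separation

classes = unique(y);
p  = size(X, 2);
mu = mean(X, 1);                       % overall mean
Sw = zeros(p);                         % within-class scatter
Sb = zeros(p);                         % between-class scatter
for c = classes'
    Xc  = X(y == c, :);
    mc  = mean(Xc, 1);
    Xc0 = Xc - repmat(mc, size(Xc,1), 1);            % center this class
    Sw  = Sw + Xc0' * Xc0;
    Sb  = Sb + size(Xc,1) * (mc - mu)' * (mc - mu);
end

% Truncated pseudo-inverse: singular values of Sw below tol are treated as zero.
tol    = 1e-6 * norm(Sw);              % hypothetical tolerance
[V, D] = eig(pinv(Sw, tol) * Sb);
[~, order] = sort(real(diag(D)), 'descend');
W = real(V(:, order(1:numel(classes)-1)));   % at most C-1 discriminant directions

Z = X * W;                             % projected data for plotting or classify
```

A ridge-style alternative to the truncation is (Sw + lambda*eye(p)) \ Sb
with a small lambda chosen by cross-validation.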

Hope this helps.

Greg