Rating: 3.4 (11 ratings) | 196 downloads (last 30 days) | File size: 3.15 MB | File ID: #22970

Feature Selection using Matlab

by Dimitrios Ververidis

 

13 Feb 2009 (Updated 29 Aug 2010)

Select the subset of features that maximizes Correct Classification Rate.


File Information
Description

The DEMO includes 5 feature selection algorithms:
• Sequential Forward Selection (SFS)
• Sequential Floating Forward Selection (SFFS)
• Sequential Backward Selection (SBS)
• Sequential Floating Backward Selection (SFBS)
• ReliefF

Two CCR estimation methods:
• Cross-validation
• Resubstitution

After selecting the best feature subset, the classifier obtained can be used for classifying any pattern.
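As a rough illustration of the first algorithm in the list, the following is a minimal sketch of Sequential Forward Selection driven by a resubstitution CCR. It is not the code shipped with the DEMO: the synthetic data, the nearest-class-mean classifier and all variable names are assumptions chosen to keep the example self-contained.

```matlab
rng(0);                                   % repeatable synthetic data
N = 100;
X = [0.25 + 0.1*randn(N, 6); 0.35 + 0.25*randn(N, 6)];   % 2 classes, 6 features
t = [ones(N, 1); 2*ones(N, 1)];           % targets in {1,2}

K = size(X, 2);
selected = [];                 % features chosen so far, in order of inclusion
ccrCurve = zeros(1, K);        % best CCR after each inclusion step
for step = 1:K
    remaining = setdiff(1:K, selected);
    stepCCR = zeros(size(remaining));
    for j = 1:numel(remaining)
        cols = [selected, remaining(j)];
        % Resubstitution CCR of a nearest-class-mean classifier (stand-in classifier)
        m1 = mean(X(t == 1, cols), 1);
        m2 = mean(X(t == 2, cols), 1);
        d1 = sum((X(:, cols) - repmat(m1, 2*N, 1)).^2, 2);
        d2 = sum((X(:, cols) - repmat(m2, 2*N, 1)).^2, 2);
        stepCCR(j) = mean((1 + (d2 < d1)) == t);
    end
    % Include the candidate feature that gives the highest CCR at this step
    [ccrCurve(step), best] = max(stepCCR);
    selected = [selected, remaining(best)];
end
disp(selected)
disp(ccrCurve)
```

The feature subset to keep would be the prefix of `selected` at which `ccrCurve` peaks. The floating variants additionally try to exclude previously included features after each inclusion, which this sketch does not do.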

 Figure: Upper panel is the pattern x feature matrix
             Lower panel left are the features selected
             Lower panel right is the CCR curve during feature selection steps
             Right panel is the classification results of some patterns.

This software was developed using Matlab 7.5 and Windows XP.

Copyright: D. Ververidis and C.Kotropoulos
                 AIIA Lab, Thessaloniki, Greece,
                 jimver@aiia.csd.auth.gr
                 costas@aiia.csd.auth.gr

In order to run the demo:
- A PC with Windows XP is needed.
- Use Matlab 7.5 or later to run DEMO.m.

1) Select the ‘finalvec.mat’ dataset (a patterns x [features+1] matrix) from the 'PatTargMatrices' folder. The last column of ‘finalvec.mat’ holds the targets.
2) Press the run button on the panel. It is the second one.
3) After the optimum feature set has been selected, choose a set of patterns to classify using the open folder button (the last button). It can be the same dataset that was used for training the feature selection algorithm.

% REFERENCES:
[1] D. Ververidis and C. Kotropoulos, "Fast and accurate feature subset selection applied into speech emotion recognition," Els. Signal Process., vol. 88, issue 12, pp. 2956-2970, 2008.
[2] D. Ververidis and C. Kotropoulos, "Information loss of the Mahalanobis distance in high dimensions: Application to feature selection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 31, no. 12, pp. 2275-2281, 2009.

MATLAB release MATLAB 7.5 (R2007b)
Other requirements Matlab 7.5
Comments and Ratings (32)
18 Feb 2009 Dimitrios Ververidis

OK, it is the best. What should I say, it is mine. However, please send me a bug report when something goes wrong.

18 Feb 2009 John D'Errico

Spam. Only a demo anyway. Nothing of true value here except for an advertisement.

(Do not rate your own work. This defeats the purpose of a rating system.)

This author has an inflated opinion of his own work anyway. There is no help. Absolutely none. Even the low level functions have ZERO comments. Ok, not exactly zero. Here is one of the very few comments in the invdet function:

     detM = abs(detM); % Rem: Why ?

Yeah, right. A truly valuable comment.

Mlint points out multiple problems. At the very least, it shows deadwood in the code, often a sign of potentially buggy code.

26 Feb 2009 Dimitrios Ververidis

The 'InvDet' function was not used at all.

You should have seen that I preferred to use the Matlab functions 'det' and 'inv' instead of my own 'InvDet' function.

There is a variety of methods for computing the determinant and the inverse of a matrix; however, I consider Matlab's 'det' and 'inv' to be the best ones.

The 'InvDet' function is an implementation of the Gauss-Jordan method, which may not be the best.

Check whether a function is actually used before commenting on it!

15 Dec 2009 Autumn

Hi, why did you combine SFS and SFFS in a single program? Where do you separate the two methods?

17 Dec 2009 Dimitrios Ververidis

User can select between SFS and SFFS. See menu options. They are not combined.

They both belong to the general forward selection function. SFS is actually a part of SFFS.

With this move I avoided writing the same code twice.

21 Dec 2009 toto11

Hello Ververidis,
my questions:
Does your code take into account the cross-correlation between coefficients?
And if I use a GA (genetic algorithm) for feature selection, how can I formulate the objective function?
Any ideas are welcome.

29 Dec 2009 Dimitrios Ververidis

hi toto11,

1. Which coefficients? Do you mean those in the ReliefF function? If so, cross-correlation was not exploited.

2. Yes, you can use a GA for feature selection:
       a) add your function GenetAlgo.m in DEMO.m (in the same way as ReliefF.m), e.g.
%=============================================

   elseif strcmp(FSSettings.FSMethod,'ReliefF')
       [FeatureWeightsOrdered, FeaturesIndexOrdered, ...
            handles.OptimumFeatureSet] = ReliefF(handles.file, ...
                    FSSettings,handles);
   elseif strcmp(FSSettings.FSMethod,'GenetAlgo')
       % GenetAlgo is your own function; it must return the same three outputs as ReliefF
       [FeatureWeightsOrdered, FeaturesIndexOrdered, ...
            handles.OptimumFeatureSet] = GenetAlgo(handles.file, ...
                    FSSettings,handles);
%=================================================

Do not forget to add 'GenetAlgo' as a string option in the menu of the figure. The default is 'SFS'.

The structure FSSettings.YourSettings allows the use of any variables for the GA (mutation variables etc.).

NOTE: If you manage to add it, please send it to me and I will acknowledge you (is there any more formal name than toto11?)
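On the question of how to formulate the objective function for a GA: one common wrapper-style choice is to encode each individual as a binary mask over the features and to use the CCR of a classifier trained on the masked features as the fitness. The sketch below only evaluates the fitness of a random population. The nearest-class-mean classifier, the single hold-out split and every variable name are assumptions for illustration, and the GA operators (selection, crossover, mutation) are left out.

```matlab
rng(0);                                   % repeatable synthetic data
N = 100;  K = 5;
X = [0.25 + 0.1*randn(N, K); 0.35 + 0.25*randn(N, K)];   % two classes, K features
t = [ones(N, 1); 2*ones(N, 1)];
idx = randperm(2*N);
trainIdx = idx(1:150);  testIdx = idx(151:end);           % one fixed hold-out split
nTest = numel(testIdx);

population = rand(8, K) > 0.5;        % 8 random chromosomes, one bit per feature
fitness = zeros(8, 1);
for p = 1:8
    mask = population(p, :);
    if ~any(mask), continue; end      % an empty feature subset keeps fitness 0
    Xs = X(:, mask);                  % keep only the features switched on in the mask
    m1 = mean(Xs(trainIdx(t(trainIdx) == 1), :), 1);
    m2 = mean(Xs(trainIdx(t(trainIdx) == 2), :), 1);
    d1 = sum((Xs(testIdx, :) - repmat(m1, nTest, 1)).^2, 2);
    d2 = sum((Xs(testIdx, :) - repmat(m2, nTest, 1)).^2, 2);
    fitness(p) = mean((1 + (d2 < d1)) == t(testIdx));     % hold-out CCR
end
disp([population, fitness])           % each row: mask bits followed by its fitness
```

A GA would then maximize `fitness` over the masks. Replacing the single split with cross-validation gives a less noisy objective at a higher cost per evaluation.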

01 Jan 2010 toto11

Hi Ververidis,
thanks for the reply.
By cross-correlation I mean eliminating the redundancy of strongly correlated coefficients.
As for the GA feature selection function, not yet. If I manage to add it, I will send it to you.

03 Jan 2010 gasmi karim

Hello,
I am doing a project in image processing. I extract 14 features from each image and then I want to do feature selection with a genetic algorithm: each individual has size 14, and each position is 0 if the feature does not participate in the classification and 1 otherwise.
The fitness function is the accuracy of an SVM. But so far I cannot implement it with the Matlab GA "demo", because I do not know what to change in the function bioagfit.m. Please help me with how to change it, and tell me whether other functions need changes too. Thank you.

01 Mar 2010 lele hu

Hi
Thank you very much for your attention.
My first question:
  The finalvecKidsVR.mat you supplied is a 2792*114 matrix, but after the first step I got Patterns.mat, which is a 2792*90 matrix. Please tell me the reason. I think this is because an "end" is missing in the file "DataLoadAndPreprocess.m".

The second question:
  I remade the file finalvecKidsVR.mat with size 50*114, using the first 50 rows of your original finalvecKidsVR.mat. But after I started feature selection, the error message given was
"??? Undefined function or variable 'IndexFeature'.
Error in ==> ForwSel_main at 264
            HYelLines(IndexFeature)= plot((FeatureToInclude-0.5)...
Error in ==> DEMO>RunFeatSelection_ClickedCallback at 111
[ResultMat, ConfMatOpt, Tlapse, handles.OptimumFeatureSet,...
Error in ==> gui_mainfcn at 96
        feval(varargin{:});
Error in ==> DEMO at 19
    gui_mainfcn(gui_State, varargin{:});
 
??? Error while evaluating uipushtool ClickedCallback
"
 
The third question:
  What is the filename of the test dataset I should use to ensure the program works?

The fourth question:
  Do you have this program for Linux?

01 Mar 2010 Dimitrios Ververidis

Hi,

1) There is no missing end in "DataLoadAndPreprocess.m". As regards the reduction from 114 features to 90, it happens because some features in that particular dataset contained NaNs and were removed.

You can remove such features (if you have NaNs or Infs in your data) by adding some lines to the function DataLoadAndPreprocess.m.
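A minimal sketch of such a removal step, assuming the data sit in a single matrix whose last column holds the targets. The matrix below and the variable names are made up for illustration and are not taken from DataLoadAndPreprocess.m.

```matlab
% Hypothetical 4 x 5 data matrix: 4 feature columns plus a target column.
Data = [0.1 NaN 0.3 0.4 1;
        0.2 0.5 Inf 0.6 1;
        0.3 0.7 0.8 0.9 2;
        0.4 0.1 0.2 0.3 2];

Features = Data(:, 1:end-1);
Targets  = Data(:, end);

badCols = any(~isfinite(Features), 1);   % true for columns containing NaN or Inf
Features(:, badCols) = [];               % drop those feature columns

Data = [Features, Targets];
fprintf('Removed %d of %d features.\n', nnz(badCols), numel(badCols));
```

Dropping columns rather than rows keeps every pattern, which matches the 114 to 90 reduction described above, where the number of patterns stayed at 2792.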

2) Well, there was a minor bug:

In ForwSel_main.m, replace lines 256 to 273 with the following:

    if ~isempty(handles)
        axes(handles.YelLinesAxes);
        hold on
        if (NPatterns > KFeatures)
            axis([0 NPatterns 0 KFeatures]); axis manual
            HYelLines(FeatureToInclude)=plot([0 NPatterns+2],...
                (FeatureToInclude-0.5)*ones(1,2),'y','linewidth',3);
        else
            axis([0 KFeatures 0 NPatterns]); axis manual
            HYelLines(FeatureToInclude)= plot(...
                FeatureToInclude*ones(1,2),[0 KFeatures+2],'y','linewidth',3);
        end
        set(gca,'Visible','off');
        drawnow
        set(findobj(gcf,'Tag','ListSelFeats'), 'String', ...
            sort(SelectedFeatPool));
        axes(handles.FeatSelCurve);
    end

3) There is no need to change the name of your data. It works with any name. For example, I debugged the previous error by generating a two-class problem of 200 patterns (100 per class) with 2000 features:

>> x = [0.25+0.1*randn(100,2000); 0.35+0.25*randn(100,2000)];
>> Data = [x [ones(100,1); 2*ones(100,1)]];

This corresponds to a patterns x features matrix of size 200 x 2000
and a targets vector of size 200 x 1.

I saved the "Data" variable as "Data.mat" in the [PatTargMatrices] folder, loaded it with the GUI and pressed run.

4) I don't have the Linux version of Matlab and I don't know the differences. I don't think that there are any.

BR,
Dimitrios

12 May 2010 electronic engineering department Information school of Fudan university

Hi Ververidis,
Must we use it only through the GUI?
Could you write a simple operation manual for us? I also have the problem that lele hu described.

12 May 2010 kanaan

Hi all, I need code for SFS and SBS that returns one feature at each step, because I want to implement the IIFS algorithm; see:

"An improvement on floating search algorithms for feature subset selection" by Songyot Nakariyakul. Please help, and let me know if there is code for this paper.

23 Jun 2010 Omar A

Dear Dimitrios,
I have been trying to run your code but it gives me an error. First I ran it using the provided dataset "finalvecKidsVR.mat"; it starts fine but gives me an error at the end. Then I tried to use my own dataset, and the program did not let me load/open my data.
When I changed the file name of the dataset you provided, the program did not run either. I changed the file name of my dataset to the name of your dataset, "finalvecKidsVR.mat", and I faced the same problem.
Can you help me with that?
Thanks

18 Sep 2010 ly lu

I want to know the format of input data.
When I choose my own input file "test.mat", error occurs like below:
??? Error using ==> eval
Undefined function or variable 'test'.

Error in ==> DataLoadAndPreprocess at 26
[NPatterns, KInitialFeatures] = eval(['size(' DatasetToUse ')']);

Error in ==> DEMO>OpenDataFile_ClickedCallback at 64
    [Patterns, Targets] = DataLoadAndPreprocess(handles.file);

Error in ==> gui_mainfcn at 96
        feval(varargin{:});

Error in ==> DEMO at 19
    gui_mainfcn(gui_State, varargin{:});

Error in ==> DEMO>OpenDataMenu_Callback at 245
DEMO('OpenDataFile_ClickedCallback',gcbo,[],guidata(gcbo));

Error in ==> gui_mainfcn at 96
        feval(varargin{:});

Error in ==> DEMO at 19
    gui_mainfcn(gui_State, varargin{:});

??? Error while evaluating uimenu Callback

Can you tell me what the problem is?

20 Sep 2010 Dimitrios Ververidis

Response to Ly Lu:

The problem is that you used 'test' instead of 'test.mat', and therefore Matlab cannot find your file.

Check whether the string variable DatasetToUse is 'test.mat'.
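Another possibility worth checking, offered here as an assumption rather than something confirmed in this thread: the trace shows the loader calling eval on size(<name>), and in the earlier working example a variable called "Data" was saved as "Data.mat". If the loader expects the .mat file to contain a variable with the same name as the file, a file whose variable is named differently would fail with exactly this "Undefined function or variable" message. The sketch below checks for that; the file and variable names are hypothetical.

```matlab
% Hypothetical file and variable names, for illustration only.
matFile = fullfile(tempdir, 'test.mat');
test = [rand(20, 3), [ones(10, 1); 2*ones(10, 1)]];   % 20 patterns, 3 features, targets last
save(matFile, 'test');                                % variable name matches the file name

[~, baseName] = fileparts(matFile);                   % 'test'
info = whos('-file', matFile);                        % variables stored in the file
hasMatchingVar = any(strcmp({info.name}, baseName));
fprintf('Variable "%s" present in %s: %d\n', baseName, matFile, hasMatchingVar);
```

If `hasMatchingVar` is 0 for your own file, re-saving the matrix under a variable named after the file is a cheap experiment.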

09 Jan 2011 chen ??

When the number of FS steps is larger than 200, I get the following error. Can you tell me what is wrong?
??? Error using ==> delete
Invalid handle object.

Error in ==> ForwSel_main at 458
    delete( HYelLines(LinesToDeleteFeatInd));

Error in ==> DEMO>RunFeatSelection_ClickedCallback at 111
[ResultMat, ConfMatOpt, Tlapse, handles.OptimumFeatureSet,...

Error in ==> gui_mainfcn at 75
        feval(varargin{:});

Error in ==> DEMO at 19
    gui_mainfcn(gui_State, varargin{:});

??? Error while evaluating uipushtool ClickedCallback.

Thank you.

10 Jan 2011 chen ??

I have solved the problem. Thank you for your software; it is helpful to me.

15 Jan 2011 Tiger Deng

Dear all,

Does anyone know the structure of the input data, say finalVecKKidsVR? I know there are 2792 objects and the feature vector has 90 dimensions.

But which columns are they?
And what do the remaining columns mean?

Thanks in advance

16 Jan 2011 Tiger Deng

I just read the ForwSel_main.m file,
and found

%=============== Load The Patterns ===============================

% NPatterns: The number of Patterns
% KFeatures: The number of features
% CClasses : The number of features

% Patterns, features and Targets in a single matrix of
% NPatterns X (KFeatures + 1) dimensionality.
% The additional feature column is the Targets.
% Patterns: FLOAT numbers in [0,1]
% Targets: INTEGER in {1,2,...,C}, where C the number of classes.

So, in the finalVecKKidsVR.mat file, there are 2792 patterns and 113 features?

Is that correct?
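Going by the format quoted above (patterns in [0,1], integer targets in {1,...,C}, targets in the last column), raw data could be brought into that layout roughly as follows. This is a sketch under assumptions: the raw values and label strings are invented, and it is not known from this thread whether the loader already rescales features itself.

```matlab
% Hypothetical raw data: 6 patterns, 2 features, string class labels.
rawX = [2 10; 4 30; 6 20; 8 50; 10 40; 12 60];
rawLabels = {'happy'; 'sad'; 'happy'; 'angry'; 'sad'; 'angry'};
nPatterns = size(rawX, 1);

% Min-max scale every feature column to [0,1].
colMin = min(rawX, [], 1);
colRange = max(rawX, [], 1) - colMin;
colRange(colRange == 0) = 1;                 % constant columns: avoid division by zero
Patterns = (rawX - repmat(colMin, nPatterns, 1)) ./ repmat(colRange, nPatterns, 1);

% Map the labels to integers 1..C (alphabetical order of the class names).
[classNames, ~, Targets] = unique(rawLabels);

% Patterns and targets in one NPatterns x (KFeatures + 1) matrix.
Data = [Patterns, Targets];
disp(Data)
```

With a 2792 x 114 matrix laid out this way, the first 113 columns would be features and the last column the targets.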

07 Apr 2011 Jaime Delgado Saa

I have not taken a look at this, but why are you posting a demo here? I don't think this is the purpose of this site.

24 Jun 2011 Guillermo

Other implementations of FS methods in Matlab can be found at http://www.prtools.org , at http://cmp.felk.cvut.cz/cmp/software/stprtool
For a comprehensive list of alternative FS related projects as well as other resources including benchmarking data see http://fst.utia.cz/?relres

14 Jul 2011 TabZim

Can anyone tell me how to implement a wrapper with support vector machines? I've been trying to use the following code snippet for this purpose, but it always returns one feature (the first one in the case of forward selection and the last one in the case of backward selection). Can anyone explain why this is happening, or give another example as a demo of the feature selection process using an SVM? Many thanks in advance.

%% FISHERIRIS DATA

load fisheriris
X = randn(150,20);
X(:,1:4)= meas(:,:);
y = species(1:100,:);
groups = ismember(species,'setosa');
y= groups(:,:)
X= scaleData(X); % to scale data in range [0,1]

%% CROSS VALIDATING

cc = cvpartition(y,'k',10);

%% SVM TRAINING AND TESTING FOR FEATURE SELECTION

opts = statset('display','iter');
OPTIONS=optimset('MaxIter',1000);
fun = @(Xtrain,Ytrain,Xtest,Ytest)...
 (sum(~strcmp(Ytest,svmclassify(svmtrain(Xtrain,Ytrain),Xtest))))

[fs,history] = sequentialfs(fun,X,y,'cv',cc,'options',opts,'nfeatures',3)

%% END OF CODE
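A likely cause, stated as a reading of the snippet rather than a tested diagnosis: the labels y are logical, and strcmp returns a scalar false when its inputs are not text, so sum(~strcmp(...)) evaluates to 1 for every candidate feature set. With a constant criterion, sequentialfs has nothing to distinguish features by. The sketch below compares labels with ~= instead. It also uses fitcsvm and predict, because svmtrain and svmclassify are no longer available in recent releases; it assumes the Statistics and Machine Learning Toolbox, and the noise columns and seed are arbitrary choices.

```matlab
load fisheriris                      % provides meas (150x4) and species (150x1 cell)
rng(0);                              % fixed seed so the noise columns are repeatable
X = [meas, randn(150, 16)];          % 4 informative features + 16 noise features
y = strcmp(species, 'setosa');       % logical labels: setosa vs. the rest

cc = cvpartition(y, 'KFold', 10);

% Criterion: number of misclassified test samples. The labels are logical,
% so they are compared with ~= rather than strcmp.
critfun = @(Xtr, ytr, Xte, yte) sum(yte ~= predict(fitcsvm(Xtr, ytr), Xte));

opts = statset('Display', 'iter');
[fs, history] = sequentialfs(critfun, X, y, 'cv', cc, ...
    'options', opts, 'nfeatures', 3);
disp(find(fs))                       % indices of the selected features
```

The same change to the criterion applies to a backward search with 'direction','backward'.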

07 Sep 2011 Sangtae Ahn

I have an error as below
What shall I do ?

-----------------------

??? Error using ==> eval
Undefined function or variable 'data'.

Error in ==> DataLoadAndPreprocess at 26
[NPatterns, KInitialFeatures] = eval(['size(' DatasetToUse ')']);

Error in ==> DEMO>OpenDataFile_ClickedCallback at 64
    [Patterns, Targets] = DataLoadAndPreprocess(handles.file);

Error in ==> gui_mainfcn at 96
        feval(varargin{:});

Error in ==> DEMO at 19
    gui_mainfcn(gui_State, varargin{:});
 
??? Error while evaluating uipushtool ClickedCallback

12 Nov 2011 belal

Hello all,

I have the same problem as TabZim: when I use the following code, it returns just one feature. Could you help me if you solved your problem?

cc = cvpartition(y,'k',10);
opts = statset('display','iter');
[fs,history] = sequentialfs( @fc,X,y,'cv',cc,'direction','backward','options',opts)%,'nfeatures',5)

function [sm]=fc(Xtrain,Ytrain,Xtest,Ytest)
yy=svmclassify(svmtrain(Xtrain,Ytrain),Xtest);
sm=sum(~strcmp(Ytest,yy))
-----

So please help.

31 Dec 2011 Ahmad Taher  
09 Jan 2012 Patrick

Dear all,
I am new to the topic of feature selection. I have found that this program is useful for my data. However, each time I run the program, it ends up with different answers (different features). So could anybody tell me how to choose the best features for my data?

09 Jan 2012 Dimitrios Ververidis

Go to the options and make the confidence interval smaller. This increases the number of cross-validation repetitions, so the execution time gets longer, but the selected features will be almost identical from run to run. The results differ between runs because cross-validation involves a random selection of the training and test sets.
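The effect of the random split can be seen in a small experiment. The sketch below is not the DEMO's estimator; it assumes synthetic one-feature data, a nearest-class-mean classifier and a simple random hold-out split, and it only illustrates that a single split gives a noisy CCR while the mean over many repetitions is much more stable.

```matlab
rng(1);                                  % repeatable experiment
N = 100;
X = [0.25 + 0.1*randn(N, 1); 0.35 + 0.25*randn(N, 1)];   % one feature, two classes
t = [ones(N, 1); 2*ones(N, 1)];

nReps = 50;
ccr = zeros(nReps, 1);
for r = 1:nReps
    idx = randperm(2*N);                 % a new random split in every repetition
    testIdx = idx(1:20);  trainIdx = idx(21:end);
    m1 = mean(X(trainIdx(t(trainIdx) == 1)));
    m2 = mean(X(trainIdx(t(trainIdx) == 2)));
    pred = 1 + (abs(X(testIdx) - m2) < abs(X(testIdx) - m1));
    ccr(r) = mean(pred == t(testIdx));
end
fprintf('Single-split CCR ranges from %.2f to %.2f\n', min(ccr), max(ccr));
fprintf('Mean over %d repetitions: %.3f (std. error %.3f)\n', ...
    nReps, mean(ccr), std(ccr)/sqrt(nReps));
```

Because feature selection compares CCR values that may differ by less than the single-split spread, more repetitions make the comparisons, and hence the selected subset, more repeatable.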

01 Apr 2012 Yan

Dear Dimitrios,
I have been trying to run your code but it gives me an error.
??? Reference to non-existent field 'FeatSelCurve'.

Error in ==> DEMO>OpenDataFile_ClickedCallback at 70
    axes(handles.FeatSelCurve);cla reset;

Error in ==> gui_mainfcn at 96
        feval(varargin{:});

Error in ==> DEMO at 19
    gui_mainfcn(gui_State, varargin{:});

Error in ==> DEMO>OpenDataMenu_Callback at 245
DEMO('OpenDataFile_ClickedCallback',gcbo,[],guidata(gcbo));

Error in ==> gui_mainfcn at 96
        feval(varargin{:});

Error in ==> DEMO at 19
    gui_mainfcn(gui_State, varargin{:});
 
??? Error while evaluating uimenu Callback

Can you tell me how to solve it?

24 Apr 2012 zhang yue

When the number of FS steps is larger than 200, I get the following error.
??? Error using ==> delete
Invalid handle object.

Error in ==> ForwSel_main at 458
    delete( HYelLines(LinesToDeleteFeatInd));

Error in ==> DEMO>RunFeatSelection_ClickedCallback at 111
[ResultMat, ConfMatOpt, Tlapse, handles.OptimumFeatureSet,...

Error in ==> gui_mainfcn at 75
        feval(varargin{:});

Error in ==> DEMO at 19
    gui_mainfcn(gui_State, varargin{:});

??? Error while evaluating uipushtool ClickedCallback.

Thank you very much.

24 May 2012 mai

Hi, I got the problem below, and I found that many others have asked about the same problem without getting an exact answer.
Thank you

??? Reference to non-existent field 'file'.

Error in ==> DEMO>RunFeatSelection_ClickedCallback at 114
[FeatureWeightsOrdered, FeaturesIndexOrdered, ...

Error in ==> gui_mainfcn at 96
        feval(varargin{:});

Error in ==> DEMO at 19
    gui_mainfcn(gui_State, varargin{:});
 
??? Error while evaluating uipushtool ClickedCallback

25 May 2012 Dimitrios Ververidis

The problem is that GUI code saved with a newer Matlab version is not compatible with earlier versions.
The problem lies within 'gui_mainfcn.m'.

Solution:

Backward compatibility with Matlab 7.5 for code written with Matlab 7.7 or later
          HINT: if you have Matlab 7.5, do this:

        Go to line 226 of 'gui_mainfcn.m' and replace

             226 guidemfile('restoreToolbarToolPredefinedCallback',gui_hFigure)

        with

            %----Check Matlab Version (Original on 7.7 support for 7.5 also) ---
            MatlabVersion = version;
            MatlabVersion = str2double(MatlabVersion(1:3));
           %-------------------------------------------------------------------
            if MatlabVersion >=7.7
              guidemfile('restoreToolbarToolPredefinedCallback',gui_hFigure);
            elseif MatlabVersion ==7.5
             guidemfile('restoreToolbarToolPredefinedCallback',get(gui_hFigure));
            end
          %-------------------------------------------------------------------

          This way the code works on both Matlab 7.5 and Matlab 7.7 or newer :)
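One caveat with parsing the version string: str2double(MatlabVersion(1:3)) reads only the first three characters, so a release such as 7.10 is read as 7.1 and falls into neither branch. A hedged alternative is the built-in verLessThan, which to my knowledge has been available since Matlab 7.4. The sketch below shows only the branching; the strings stand in for the two guidemfile calls and are placeholders, and unlike the patch above it treats every release below 7.7 alike rather than only 7.5.

```matlab
% Branch on the release without parsing the version string by hand.
if verLessThan('matlab', '7.7')
    branch = 'pre-7.7 call';        % placeholder for the get(gui_hFigure) variant
else
    branch = '7.7-or-later call';   % placeholder for the gui_hFigure variant
end
disp(branch)
```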

Updates
23 Feb 2009

The latest version, 5.1.5, also includes:
a) An executable version.
Requirements: MSIInstaller.exe of Matlab 7.5
b) A PDF describing the method

26 Feb 2009

News: Version 5.1.6 supports up to 7 classes.

07 Mar 2009

5.1.8 Help menu added
        Settings menu added

29 Aug 2010

PAMI paper accepted
