Feature Selection Library

version 5.1.2017 (707 KB) by Giorgio Roffo

Feature Selection Library 2018 (MATLAB Toolbox)

4.96 (23 Ratings)

214 Downloads


Feature Selection Library (FSLib 2018) is a widely applicable MATLAB library for feature selection (also called attribute or variable selection). It reduces problem dimensionality to improve the accuracy of data models and the performance of automatic decision rules, and to reduce the cost of data acquisition.
* In 2017, FSLib was awarded a MATLAB Central Coin.
We would greatly appreciate your feedback on this toolbox; we value your opinion and welcome your rating.
If you use this toolbox (or a method included in it), please cite:
Reference article: https://link.springer.com/chapter/10.1007/978-3-319-61461-8_2
BibTex:
% ------------------------------------------------------------------------
% @InProceedings{RoffoICCV17,
% author={Giorgio Roffo and Simone Melzi and Umberto Castellani and Alessandro Vinciarelli},
% booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
% title={Infinite Latent Feature Selection: A Probabilistic Latent Graph-Based Ranking Approach},
% year={2017},
% month={Oct}}
% ------------------------------------------------------------------------
% ------------------------------------------------------------------------
% @InProceedings{RoffoICCV15,
% author={G. Roffo and S. Melzi and M. Cristani},
% booktitle={2015 IEEE International Conference on Computer Vision (ICCV)},
% title={Infinite Feature Selection},
% year={2015},
% pages={4202-4210},
% doi={10.1109/ICCV.2015.478},
% month={Dec}}
% ------------------------------------------------------------------------

Comments and Ratings (51)

Quoc Pham

Hello Giorgio,
How can I fix this problem (I was using ILFS)?
Assignment has more non-singleton rhs dimensions than non-singleton subscripts

Giorgio


Hello, ranking(1) is the most discriminative feature; ranking(end) is the worst one. Best.
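For example, to keep only the top-ranked features (a minimal sketch; the cut-off N = 10 is illustrative, and ranking is the output of the ILFS call quoted below):

N = 10; % illustrative cut-off
X_top = X_train(:, ranking(1:N)); % columns ordered best-first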

Hello Giorgio,

I'd like to know the direction in which features are ranked after calling

[ranking, weights, subset] = ILFS( X_train, Y_train , 0 );

Is ranking(1) the "most discriminant" feature, or is it ranking(end)?

Thanks in advance

Giorgio


Hello,
Please make sure you've correctly compiled the toolbox:
in ./FSLib_v5.0_2017/lib/drtoolbox ---> run mexall.m

you should have this output:

>> mexall
Compiling...
Building with 'gcc'.
MEX completed successfully.
Building with 'gcc'.
MEX completed successfully.
Building with 'gcc'.
MEX completed successfully.
Building with 'g++'.
MEX completed successfully.
Compilation completed.

Error using internal.stats.parseArgs (line 42)
Wrong number of arguments.

Error in pca (line 170)
[vAlgorithm, vCentered, vEconomy, vNumComponents, vRows,vWeights,...

Error in intrinsic_dim (line 197)
[mappedX, mapping] = pca(X, size(X, 2));

Error in Untitled12 (line 39)
dd=intrinsic_dim(X, techniques{ii});

Rhys Chappell

Yaping LIN

Giorgio


According to our experiments in "Infinite Latent Feature Selection: A Probabilistic Latent Graph-Based Ranking Approach" (ICCV 2017), our ILFS proved to be the most stable and robust supervised FS technique.
See the paper results here: https://goo.gl/WiDmu2
To perform multiclass FS, you can always use the 1-vs-all strategy.
Let's say we have 4 classes.
We can obtain the best subset representing each class by setting the class labels accordingly:
sub1) label 1 to class 1, label -1 to classes 2,3,4
sub2) label 1 to class 2, label -1 to classes 1,3,4
sub3) label 1 to class 3, label -1 to classes 1,2,4
sub4) label 1 to class 4, label -1 to classes 1,2,3
After that we can extract the maximum common subset (intersect sub1, sub2, sub3, sub4).
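A minimal sketch of this one-vs-all scheme (the 4 classes and the per-class cut-off N are illustrative; ILFS is called as elsewhere in this thread):

numClasses = 4;
N = 20; % illustrative per-class subset size
subs = cell(1, numClasses);
for c = 1:numClasses
    Yc = -ones(size(Y_train)); % label -1 for all other classes
    Yc(Y_train == c) = 1; % label 1 for class c
    ranking = ILFS(X_train, Yc, 0); % rank features for class c vs. rest
    subs{c} = ranking(1:N); % best N features for class c
end
% maximum common subset across the per-class subsets
common = intersect(intersect(subs{1}, subs{2}), intersect(subs{3}, subs{4}));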

Hello,
Please suggest the best supervised feature selection method from this toolbox for high-dimensional, multiclass data (7 classes).

Hello,
Whenever I use the mRMR method on my training data and labels, MATLAB crashes.
Has anybody had a similar issue?

>> [fea, score] = mRMR(M, L, 10) % where M is X_train, L is Y_train, and I want to select 10 features
------------------------------------------------------------------------
Access violation detected at Sun Aug 13 21:38:47 2017
------------------------------------------------------------------------
.
.
.

Thank you in advance!

amr alanwar

rabbasi

Thanks for the contribution. I have a question regarding LaplacianScore. When I run your code on the example included with it, the constructW function gives me a warning: "Warning: This function has been changed and the Metric is no longer be supported", and then the LaplacianScore(fea,W) function gives an error:
"Subscript indices must either be real positive integers or logicals." As I said, this happens on the random data used as the example in your code documentation. I would be grateful if you could explain the reason.

Giorgio


Infinite feature selection works in both a supervised and an unsupervised manner; you can make your choice simply by setting the parameter sup=1 (supervised) or sup=0 (unsupervised).

[ranking, w] = infFS( X_train , Y_train, alpha , sup , 0 );

X_train is a matrix, the standard "design matrix" (see Bishop, 2006): each row is a sample and each column a feature.
Y_train is a column vector of labels; if you have two classes, this vector will look like [1,1,1,-1,-1,-1,1,-1,...]'.
alpha is the mixing parameter; alpha=0.9 usually works fine.

There's no way to know a priori how many features to use. Most FS methods just rank the features according to their degree of relevance, so you can select the top N features and pass them to a classifier to see how they perform. If that's not enough, you can increase N accordingly.

The top N features are likely the ones that best contribute to the classification.
Best!
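A minimal sketch of this select-top-N-then-classify loop on synthetic data (assumes fitcsvm from the Statistics and Machine Learning Toolbox and the infFS call described above):

rng(0); % reproducibility
X = [randn(50,20)+1; randn(50,20)-1]; % synthetic: 100 samples, 20 features
Y = [ones(50,1); -ones(50,1)]; % two classes
[ranking, w] = infFS(X, Y, 0.9, 1, 0); % supervised Inf-FS ranking
for N = [5 10 15]
    mdl = fitcsvm(X(:, ranking(1:N)), Y); % SVM on the top-N features
    cv = crossval(mdl, 'KFold', 5); % 5-fold cross-validation
    fprintf('top-%2d features: CV error = %.3f\n', N, kfoldLoss(cv));
end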

Julien, I have used the ReliefF algorithm for a similar problem and it worked perfectly!
Good luck
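For reference, a minimal ReliefF call on synthetic data; this uses MATLAB's built-in relieff from the Statistics and Machine Learning Toolbox (the toolbox's relieff wrapper follows the same idea):

X = randn(80, 40); % synthetic: 80 samples, 40 features
Y = [ones(40,1); 2*ones(40,1)]; % binary labels
[idx, w] = relieff(X, Y, 10); % ReliefF ranking with 10 nearest neighbours
X_sel = X(:, idx(1:20)); % keep the 20 top-ranked features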

Julien


Hi, sorry for this very basic question: can this method be used to identify which (visual) features best contribute to the classification of a given image in a binary classification task? Thanks. Julien

Eliya Sultan

Hi,
Does it make sense that I run the code twice over the same data and get different results?
I used the mRMR and Fisher methods.
Thanks!
Eliya

Giorgio


My comment disappeared.
SVM-RFE is SVM-based, so (see the sketch below):
1) Did you normalize your input data? SVMs work better and faster when inputs are in the range [0,1];
a very simple way is: X = X ./ sum(X,2)
2) Check whether your Y vector has missing labels, e.g. [ 1 1 1 1 1 3 3 3 3 4 4 4 4 5 5 5 5 ] <- here there's no 2, which may generate an exception. In such a case you should re-assign the labels as follows:
[ 1 1 1 1 1 3 3 3 3 4 4 4 4 5 5 5 5 ] --> [ 1 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 ]
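A minimal sketch of both fixes on toy data (bsxfun keeps the normalization compatible with R2015b, which lacks implicit expansion; the relabelling uses the third output of unique):

X = abs(randn(6, 4)); % toy nonnegative data
Y = [1 1 3 3 4 5]'; % label 2 is missing
X = bsxfun(@rdivide, X, sum(X, 2)); % 1) row-normalize as suggested above
[~, ~, Y] = unique(Y); % 2) compact labels: [1 1 3 3 4 5]' -> [1 1 2 2 3 4]'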

Giorgio

**** Sorry, to be precise: in the Y example above it is label 2 that is missing, and that is what can generate the exception; re-assign the labels like this:
[ 1 1 1 1 1 3 3 3 3 4 4 4 4 5 5 5 5 ] --> [ 1 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4 ]
best,

elahe afshari

Hi dear Giorgio Roffo,

I've used SVM-RFE (FSLib) on two similar datasets (ozone & ozone2); the ozone run completes properly, but the ozone2 run never ends.

% function SR=SVMRFEOzone(TrainData)
%% DEMO FILE
fprintf('\nFEATURE SELECTION TOOLBOX v 4.0 2016 - For Matlab \n');

% Select a feature selection method from the list
listFS = {'InfFS','ECFS','mrmr','relieff','mutinffs','fsv','laplacian','mcfs','rfe','L0','fisher','UDFS','llcfs','cfs'};

[ methodID ] = readInput( listFS );
selection_method = listFS{methodID}; % Selected
% Include dependencies
addpath('./lib'); % dependencies
addpath('./methods'); % FS methods

% number of features
% numF = size(X_train,2);
vectorvote=zeros(1,72);
TrainData=load('ozone.mat');
TrainData=TrainData.TrainData;

% feature Selection on training data

X_train=TrainData(1:201,1:72);
Y_train=TrainData(1:201,73);
ranking = spider_wrapper(X_train,Y_train,60,lower('rfe'));
rankingD=sort(ranking(1,61:72));
rankingB=sort(ranking(1,1:60));
vectorvote(1,rankingD(:))=vectorvote(1,rankingD(:))+9;
TrainData(:,rankingD(:))=[];
%--------------------------------------------------------------

X_train1=TrainData(201:400,1:60);
Y_train1=TrainData(201:400,61);
ranking1 = spider_wrapper(X_train1,Y_train1,60,lower('rfe'));
rankingD1=sort(ranking1(1,51:60));
rankingB1=sort(ranking1(1,1:50));
rankingB1=rankingB(rankingB1(:));
vectorvote(1,rankingB(rankingD1(:)))=vectorvote(1,rankingB(rankingD1(:)))+8;
TrainData(:,rankingD1(:))=[];

% --------------------------------------------------------------
X_train2=TrainData(401:600,1:50);
Y_train2=TrainData(401:600,51);
ranking2 = spider_wrapper(X_train2,Y_train2,40,lower('rfe'));
rankingD2=sort(ranking2(1,41:50));
rankingB2=sort(ranking2(1,1:40));
rankingB2=rankingB1(rankingB2(:));
vectorvote(1,rankingB1(rankingD2(:)))=vectorvote(1,rankingB1(rankingD2(:)))+7;
TrainData(:,rankingD2(:))=[];
%--------------------------------------------------------------
X_train3=TrainData(600:801,1:40);
Y_train3=TrainData(600:801,41);
ranking3 = spider_wrapper(X_train3,Y_train3,30,lower('rfe'));
rankingD3=sort(ranking3(1,31:40));
rankingB3=sort(ranking3(1,1:30));
rankingB3=rankingB2(rankingB3(:));
vectorvote(1,rankingB2(rankingD3(:)))=vectorvote(1,rankingB2(rankingD3(:)))+6;
TrainData(:,rankingD3(:))=[];
%--------------------------------------------------------------
X_train4=TrainData(802:1000,1:30);
Y_train4=TrainData(802:1000,31);
ranking4 = spider_wrapper(X_train4,Y_train4,25,lower('rfe'));
rankingD4=sort(ranking4(1,26:30));
rankingB4=sort(ranking4(1,1:25));
rankingB4=rankingB3(rankingB4(:));
vectorvote(1,rankingB3(rankingD4(:)))=vectorvote(1,rankingB3(rankingD4(:)))+5;
TrainData(:,rankingD4(:))=[];


TrainData1=load('ozone2.mat');
TrainData1=TrainData1.xc11;
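% NOTE (likely bugs worth checking): the deletion lines below still target
% TrainData rather than TrainData1, and the row ranges used throughout
% (1:201, 201:400, ..., 600:801, 802:1000) overlap or skip rows; either
% issue may explain the run that never ends.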

% feature Selection on training data

X_train=TrainData1(1:201,1:72);
Y_train=TrainData1(1:201,73);
ranking = spider_wrapper(X_train,Y_train,60,lower('rfe'));
rankingD=sort(ranking(1,61:72));
rankingB=sort(ranking(1,1:60));
vectorvote(1,rankingD(:))=vectorvote(1,rankingD(:))+9;
TrainData(:,rankingD(:))=[];
%--------------------------------------------------------------

X_train1=TrainData1(201:400,1:60);
Y_train1=TrainData1(201:400,61);
ranking1 = spider_wrapper(X_train1,Y_train1,60,lower('rfe'));
rankingD1=sort(ranking1(1,51:60));
rankingB1=sort(ranking1(1,1:50));
rankingB1=rankingB(rankingB1(:));
vectorvote(1,rankingB(rankingD1(:)))=vectorvote(1,rankingB(rankingD1(:)))+8;
TrainData(:,rankingD1(:))=[];

% --------------------------------------------------------------
X_train2=TrainData1(401:600,1:50);
Y_train2=TrainData1(401:600,51);
ranking2 = spider_wrapper(X_train2,Y_train2,40,lower('rfe'));
rankingD2=sort(ranking2(1,41:50));
rankingB2=sort(ranking2(1,1:40));
rankingB2=rankingB1(rankingB2(:));
vectorvote(1,rankingB1(rankingD2(:)))=vectorvote(1,rankingB1(rankingD2(:)))+7;
TrainData(:,rankingD2(:))=[];
%--------------------------------------------------------------
X_train3=TrainData1(600:801,1:40);
Y_train3=TrainData1(600:801,41);
ranking3 = spider_wrapper(X_train3,Y_train3,30,lower('rfe'));
rankingD3=sort(ranking3(1,31:40));
rankingB3=sort(ranking3(1,1:30));
rankingB3=rankingB2(rankingB3(:));
vectorvote(1,rankingB2(rankingD3(:)))=vectorvote(1,rankingB2(rankingD3(:)))+6;
TrainData(:,rankingD3(:))=[];
%--------------------------------------------------------------
X_train4=TrainData1(802:1000,1:30);
Y_train4=TrainData1(802:1000,31);
ranking4 = spider_wrapper(X_train4,Y_train4,25,lower('rfe'));
rankingD4=sort(ranking4(1,26:30));
rankingB4=sort(ranking4(1,1:25));
rankingB4=rankingB3(rankingB4(:));
vectorvote(1,rankingB3(rankingD4(:)))=vectorvote(1,rankingB3(rankingD4(:)))+5;
TrainData(:,rankingD4(:))=[];



Mehdi BRAHIMI

Hi Giorgio!
Thank you very much for this toolbox!
I have a question about feature selection using the Fisher linear discriminant. In the paper related to the toolbox you cite the Generalized Fisher Score proposed by Gu et al., but it seems that the toolbox performs the "simple" Fisher selection defined by Duda et al. 2012. Is that correct?

Giorgio


Hi! Thank you for downloading the toolbox. Some techniques can provide you with a subset of features, but generally they just perform the ranking step; in that case you can decide a priori how many features to select, or use some form of cross-validation to decide (roughly) how many of them to keep.

Thank you very much for your valuable contributions.
I have just downloaded the FSLib you developed. I have a question: after the feature-ranking process, how do we decide how many features are needed for the best accuracy?

jae baak

Forget about the comment below; I found out that MATLAB Compiler isn't the only compiler available. This toolbox works great, by the way.

jae baak

I would like to try this out, but I don't have access to MATLAB Compiler. Is a pre-compiled version available somewhere (other than the dead link in the first comment)? Thank you!

Great tool for FS, well documented and excellent README file with details and references for various methods.

I wish it worked for multi-class or regression-based feature selection, but I will see if I can figure out a way to make that possible.

--EDIT: worth noting that the SVM tools used here "will be removed in a future release" and will need to be replaced with the updated functions at some point.

--EDIT2: Figured out how to get multi-class working, for Fisher at least. For others interested (a sketch of step 1 follows below):
1. Turn your labels (Y) into an n-by-m matrix, where m = number of classes and n = number of samples (with values -1 or 1, as in the Demo).
2. Modify 'spider_wrapper' line 10 from a.method='classification'; to a.method=2; % could be any number 1-3 depending on your method preference (see line 60 of FSLib_v4.2_2016/FSLib_v4.2_2016/FSLib_v4.2_2016/lib/@fisher/training.m for details).
3. Don't try to plot the results using SVMs; that won't work for multiclass data. Stop after feature selection (line 149 of Demo.m).
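A minimal sketch of step 1, assuming integer class labels 1..m in a column vector Y:

Y = [1 1 2 2 3 3]'; % toy integer labels
n = numel(Y);
m = numel(unique(Y)); % number of classes
Ymat = -ones(n, m); % n-by-m label matrix, all -1
for k = 1:m
    Ymat(Y == k, k) = 1; % +1 in the column of each sample's class
end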

Giorgio


Hi Davide, the '\' and '/' in paths depend on your operating system (Linux or Windows), so you may need to switch from one to the other accordingly. You're welcome :)

Hi Giorgio, thanks for the advice. There was a \ instead of a / at line 9 of the make file, under the path "/lib". Cheers :)

Giorgio


Hi Davide, are you sure you compiled the solution? Before using the toolbox you should run the make file; it seems a MEX file is missing. Also check that the FSLib folder and its subfolders are included in the path.
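A minimal setup sketch (the folder name is taken from the file listing below; the name and location of the make script are assumptions, so adjust them to your extraction):

addpath(genpath('FSLib_v5.1_2017')); % toolbox root plus all subfolders
run('FSLib_v5.1_2017/make.m'); % compile the MEX dependencies (libsvm_classifier_spider among them)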

ranking = spider_wrapper(X,Y,N_DS,lower('rfe')) triggers this error stack trace:

Undefined function 'libsvm_classifier_spider' for input arguments of type 'cell'.

Error in svm/training (line 225)
[alpha,xSV,bias0]=libsvm_classifier_spider({'X',x},{'Y',y}, ...

Error in algorithm/train (line 103)
[dat,algor]=training(algo,dat);

Error in rfe/training (line 36)
[res,a.child]=train(untrained,dat);

Error in algorithm/train (line 103)
[dat,algor]=training(algo,dat);

Error in spider_wrapper (line 12)
[tr,a]=train(a,dset);

Can you suggest how to solve it?

Giorgio


Hi David, the Inf-FS is an unsupervised method; however, this toolbox includes two variants of it.
function [RANKED, WEIGHT] = infFS( X_train, Y_train, alpha, supervision, verbose )

Set supervision=0 to use the unsupervised version; in that case you can set Y_train=[].

If you want to use the supervised version then set supervision=1 and provide the right class labels.

Thank you for your question,
Hope this helps!
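A minimal usage sketch of the two variants on synthetic data (alpha = 0.9 follows the suggestion given elsewhere in this thread):

X = rand(100, 30); % synthetic: 100 samples, 30 features
Y = [ones(50,1); -ones(50,1)]; % class labels
[rankedU, wU] = infFS(X, [], 0.9, 0, 0); % unsupervised: Y_train empty
[rankedS, wS] = infFS(X, Y, 0.9, 1, 0); % supervised: Y_train provided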

Hi, one question regarding the infFS algorithm. If it is an unsupervised feature selection method, why do you have to provide a Y_train vector with the class labels?

nVIDIApascal

YJXia


Helmie Hamid

yiting yang

Afsoon


Does this toolbox work on multiclass data? I have data with 1320 features and 3 classes, and I need a feature selection stage, but this toolbox doesn't work on my data.

Andrea


Lory Pent

I found FSLib to be a very useful tool! Many thanks to the developer!

David


Giorgio


Hi, thank you for your feedback. I added the "Feature Selection Library (MATLAB Toolbox)" paper to the zip; it discusses the most important FS methods. As for usability, the toolbox includes a Demo file designed to let users run the code easily and try every method provided with the toolbox.

Hi, your toolbox seems very interesting, but do you provide a 'readme' file or any other documentation on how to use it?
Thank you very much!

Excellent! Thank you!

Hi, thanks so much for the library. I have a question: for the method Fisher [6], does the library implement the Generalized Fisher Score proposed by Quanquan Gu et al. 2012 (which is more complicated), or just the classical Fisher score? Thank you.

frankjk

huang hai

Giorgio


Updates

5.1.2017

+ Solved problems with drtoolbox

5.0.2017

- ILFS 2017
Infinite Latent Feature Selection ICCV 2017
- Dimensionality reduction tools
- Measurements for Intrinsic Dimensionality
+ CorrDim
+ NearNbDim
+ GMST
+ PackingNumbers
+ EigValue
+ MLE

4.3.2017

Reference article: https://link.springer.com/chapter/10.1007/978-3-319-61461-8_2

4.2.2017

- Documentation

4.2.17

+ Information added in Demo.m about how to compile the solution.

4.2

+ Added Documentation

4.1.1

+ Toolbox Version 4.1 - 11 December 2016

4.1

+ Documentation : Feature Selection Library (MATLAB Toolbox)

4.0

New Unsupervised Methods added:
FEATURE SELECTION TOOLBOX v 4.0 2016 - For Matlab
[1] InfFS
[2] ECFS
[3] mrmr
[4] relieff
[5] mutinffs
[6] fsv
[7] laplacian
[8] mcfs
[9] rfe
[10] L0
[11] fisher
[12] UDFS
[13] llcfs
[14] cfs

3.0

- Added new method: Feature Selection via Eigenvector Centrality (EC-FS), 2016
- Updated the Infinite Feature Selection (InfFS): strong improvements in ranking accuracy, 2016

2.2

New features added.


2.1.1

FEATURE SELECTION TOOLBOX - For Matlab
Methods and Refs included
[1] mrmr
[2] inffs
[3] relieff
[4] mutinffs
[5] fsv
[6] laplacian
[7] mcfs
[8] rfe
[9] L0
[10] fisher

1.1

- make

MATLAB Release
MATLAB 8.6 (R2015b)


FSLib_v5.1_2017/

FSLib_v5.1_2017/eval_metrics/

FSLib_v5.1_2017/lib/@algorithm/

FSLib_v5.1_2017/lib/@data/

FSLib_v5.1_2017/lib/@distance/

FSLib_v5.1_2017/lib/@fisher/

FSLib_v5.1_2017/lib/@kernel/

FSLib_v5.1_2017/lib/@l0/

FSLib_v5.1_2017/lib/@loss/

FSLib_v5.1_2017/lib/@rfe/

FSLib_v5.1_2017/lib/@svm/

FSLib_v5.1_2017/lib/

FSLib_v5.1_2017/lib/drtoolbox/

FSLib_v5.1_2017/lib/drtoolbox/gui/

FSLib_v5.1_2017/lib/drtoolbox/techniques/

FSLib_v5.1_2017/methods/