Feature selector based on genetic algorithms and information theory.

version 1.1 (3.16 KB) by Oswaldo Ludwig

The algorithm performs the combinatorial optimization by using Genetic Algorithms.

3.4 out of 5 (9 Ratings)

18 Downloads


Techniques from information theory are commonly used to select variables in time series prediction or pattern recognition. These tasks involve, directly or indirectly, the maximization of the mutual information between input and output data. However, this procedure requires a high computational effort, due to the calculation of the joint entropy, which requires the estimation of the joint probability distributions. To avoid this effort, it is possible to apply variable selection based on the principle of minimum-redundancy/maximum-relevance, which maximizes the mutual information indirectly, at a lower computational cost. However, the combinatorial optimization problem, i.e. checking all possible combinations of variables, still represents a large computational effort. Due to this cost, previous works proposed a simple incremental search method that reaches a quasi-optimal solution. Given the limitations of the existing methods, this code was developed to perform the combinatorial optimization by using Genetic Algorithms.

The arguments are the desired number of selected features (feat_numb), a matrix X, in which each column is a feature vector example, and its respective target data y, which is a row vector. The output is a vector with the indexes of the features that compose the optimum feature set; the order of the features has NO relation with their importance.

In case of publication, please cite the original work: O. Ludwig and U. Nunes, "Novel Maximum-Margin Training Algorithms for Supervised Neural Networks," IEEE Transactions on Neural Networks, vol. 21, no. 6, pp. 972-984, Jun. 2010, where this algorithm is applied to choose the hidden neurons that compose a hybrid neural network named ASNN.
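As a quick illustration of the call signature described above, here is a minimal usage sketch; the random data, the toy target, and the chosen feat_numb are illustrative assumptions, not part of the original submission:

% Minimal usage sketch: X holds one feature vector example per COLUMN
% (size n_features x n_examples) and y is a ROW vector of targets.
n_features = 30;                                   % assumed number of candidate features
n_examples = 200;                                  % assumed number of examples
X = randn(n_features, n_examples);                 % illustrative feature matrix
y = double(sum(X(1:3, :), 1) > 0);                 % toy target built from the first three features
feat_numb = 5;                                     % desired size of the selected subset
selected = GA_feature_selector(feat_numb, X, y);   % indexes of the selected features
X_reduced = X(selected, :);                        % keep only the selected features (order is not a ranking)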

Comments and Ratings (20)

Manu

joy free

abdala nour

Hi Sir,
Could you please send an example of how I can use this function?
Thank you

Dear Oswaldo,

I have some confusion regarding the inputs.
Here, every row of X is a particular feature subset, but it is not clear to me what the other input, the 'respective target data y', means.

Can you please add some details to this?

Jack

Dear Oswaldo,

First of all, thank you so much for providing this effective code.

When I try to perform feature selection using this code, I call it as follows:

global x
global y
x=matrix of features;
y=labels; % maybe the problem is here
feat_numb=20;
[Selected]=GA_feature_selector(feat_numb,x,y);

and then the following error comes out:

Error using vertcat
CAT arguments dimensions are not consistent.

Error in statistics (line 9)
Hy=entropia2([y;zeros(1,C)],15);

Error in GA_feature_selector (line 19)
[Hx,Hy,MIxy,MIxx]=statistics(X,y);

Actually, I tried to select a subset of the features I have (x), which is a 40*28 matrix where each row represents a vector, and I have y labels, which is a 40*1 matrix where each row is the target label for each vector.

Can you please tell me why the code does not work and where the problem is? I would appreciate that.

Thank you in advance

arshi

Dear Oswaldo,
Can you please give a description of the above code? It is very difficult to understand the meaning of each line. Moreover, various notations are difficult to guess.

Oswaldo Ludwig

Dear Arshi,

Pressão means pressure in my mother tongue. You can set the selective pressure of the GA through this variable; see Equation (10) of:
https://www.researchgate.net/publication/235687343_Improving_the_Generalization_Capacity_of_Cascade_Classifiers
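For intuition only, here is a generic sketch of how a selective-pressure parameter typically shapes parent selection via linear ranking; this illustrates the concept and is not necessarily the Equation (10) referenced above or the exact use of Pressao inside this submission:

% Linear ranking selection sketch (generic GA illustration, not the submission's code).
N  = 20;                          % assumed population size
sp = 1.8;                         % selective pressure, usually chosen in [1, 2]
rank = 1:N;                       % 1 = worst individual, N = best
p = (2 - sp)/N + 2*(sp - 1)*(rank - 1)/(N*(N - 1));   % selection probabilities (sum to 1)
parent = find(cumsum(p) >= rand, 1);                  % roulette-wheel draw of one parent index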

arshi

Dear Oswaldo,

Can you please explain the meaning of 'Pressao'? There are a number of variables whose meaning is difficult to guess, so can you please provide an outline of the algorithm?

Oswaldo Ludwig

Mahyar,

I'm sorry you aren't able to read/interpret the file description: "... The arguments are the desired number of selected features (feat_numb), a matrix X, in which each column is a feature vector example...".

Mahyar

Dear Oswaldo
What is the meaning of "15" in Hy=entropia2([y;zeros(1,C)],15)?
Moreover, the dimension of y does not match zeros(1,C), because C is equal to the number of features whereas the dimension of y is equal to the number of input pairs.
So there is a dimension mismatch in vertcat!
How can we solve this problem?

Dmitry Kaplan

Can you please explain the meaning of resolucao=15? Why 15?
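For context, here is a generic sketch of histogram-based entropy estimation; a plausible assumption, not confirmed by the author, is that a resolution argument such as 15 sets the number of discretization bins used to approximate the probability distribution:

% Generic histogram-based Shannon entropy estimate (assumed meaning of 'resolucao').
z = randn(1, 500);                % illustrative 1-D signal
resolucao = 15;                   % number of histogram bins (assumed meaning of the 15)
counts = hist(z, resolucao);      % bin counts over the signal's range
p = counts / sum(counts);         % empirical probabilities
p = p(p > 0);                     % drop empty bins to avoid log(0)
H = -sum(p .* log2(p));           % entropy estimate in bits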

Nermine

Could you please send a description of the GA method used? Thanks.

Alsam

sam

Ali, use transposed X and y, i.e. X', y'. Hope it helps.
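A short sketch of that suggestion, assuming the data are stored row-wise (examples in rows, features in columns) as in the comments above:

% If x is n_examples x n_features and y is n_examples x 1, transpose both so that
% columns are examples and y is a row vector, as the function expects.
x = randn(40, 28);                               % assumed row-wise data: 40 examples, 28 features
y = double(rand(40, 1) > 0.5);                   % assumed column vector of labels
selected = GA_feature_selector(20, x', y');      % transposed call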

ali Abusnina

Hi

I am facing difficulty using the code. I am running MATLAB 7 on Mac OS. When I call the function I get the following error:

"
??? Error using ==> vertcat
CAT arguments dimensions are not consistent.
Error in ==> statistics at 9
Hy=entropia2([y;zeros(1,C)],15);
Error in ==> GA_feature_selector at 19
[Hx,Hy,MIxy,MIxx]=statistics(X,y);
"

Can anyone help, please?

Thanks

rekoba

Thanks for your code, but please write an example to run this code. Thanks.

Oswaldo Ludwig

Dear Mohamed,

The approach depends on your application. In the case of object detection, the usual approach is to use an image descriptor, e.g. HOG (see http://www.mathworks.com/matlabcentral/fileexchange/28689-hog-descriptor-for-matlab), before the feature selection.
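A rough sketch of that pipeline, assuming the Computer Vision Toolbox function extractHOGFeatures (the linked File Exchange HOG descriptor could be substituted); the image files and labels are hypothetical:

% Build a feature matrix with one HOG descriptor per column, then run the selector.
imgFiles = {'pos1.png', 'pos2.png', 'neg1.png'};   % hypothetical image files of identical size
labels   = [1 1 0];                                % hypothetical class labels (row vector)
X = [];
for k = 1:numel(imgFiles)
    I = imread(imgFiles{k});
    if size(I, 3) == 3, I = rgb2gray(I); end       % HOG expects a grayscale image
    hog = extractHOGFeatures(I);                   % 1-by-N HOG descriptor
    X = [X, hog'];                                 % one descriptor per column
end
selected = GA_feature_selector(50, X, labels);     % select 50 HOG components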

Peer Mohamed

How can I use this function for images?

CarloG

Updates

1.1

Only the description was changed.

MATLAB Release
MATLAB 7 (R14)
