Description |
Information-theoretic techniques are commonly used to select variables in time series prediction and pattern recognition. These tasks involve, directly or indirectly, maximizing the mutual information between input and output data. This is computationally expensive, however, because computing the joint entropy requires estimating the joint probability distributions. One way to reduce this cost is to select variables by the minimum-redundancy/maximum-relevance principle, which maximizes the mutual information indirectly at lower computational cost. The combinatorial optimization problem, i.e. checking all possible combinations of variables, still remains expensive. For this reason, some previous works proposed a simple incremental search that reaches a quasi-optimal solution. Given the limitations of the existing methods, this code was developed to perform the combinatorial optimization with Genetic Algorithms. The arguments are the desired number of selected features (feat_numb), a matrix X in which each column is a feature vector example, and the corresponding target data y, which is a row vector. The output is a vector with the indices of the features that compose the optimum feature set; the order of the features has NO relation to their importance. In case of publication, please cite the original work: O. Ludwig and U. Nunes, “Novel Maximum-Margin Training Algorithms for Supervised Neural Networks,” IEEE Transactions on Neural Networks, vol. 21, issue 6, pp. 972-984, Jun. 2010, where this algorithm is applied to choose the hidden neurons that compose a hybrid neural network named ASNN. |
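
The idea described above can be illustrated with a minimal Python sketch. This is not the submitted code, only an illustration under stated assumptions: the function names (`mutual_info`, `mrmr_score`, `ga_mrmr`), the histogram-based mutual information estimate, the "relevance minus redundancy" form of the criterion, and the crossover and mutation operators are all choices made for this example. Only the argument order (feat_numb, X, y) and the data layout (features in rows of X, one example per column, y as a row vector) follow the description.

```python
import numpy as np

def mutual_info(a, b, bins=8):
    """Histogram estimate of the mutual information (nats) of two 1-D arrays.
    Assumption: a simple equal-width binning, chosen only for this sketch."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of a, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)   # marginal of b, shape (1, bins)
    nz = pxy > 0                          # skip empty cells to avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

def mrmr_score(subset, relevance, redundancy):
    """Assumed criterion: mean relevance minus mean pairwise redundancy."""
    idx = np.asarray(subset)
    return relevance[idx].mean() - redundancy[np.ix_(idx, idx)].mean()

def ga_mrmr(feat_numb, X, y, pop_size=30, generations=40, mut_rate=0.2, seed=0):
    """Search for a feat_numb-sized subset with a small genetic algorithm.
    X has one feature per row and one example per column; y is 1-D."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[0]
    # Pairwise quantities are computed once; no joint distribution over a
    # whole subset is ever estimated.
    relevance = np.array([mutual_info(X[i], y) for i in range(n_feat)])
    redundancy = np.array([[mutual_info(X[i], X[j]) for j in range(n_feat)]
                           for i in range(n_feat)])
    fitness = lambda s: mrmr_score(s, relevance, redundancy)

    # Each individual is an array of feat_numb distinct feature indices.
    pop = [rng.choice(n_feat, size=feat_numb, replace=False)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]          # elitism: keep the better half
        children = []
        while len(elite) + len(children) < pop_size:
            i, j = rng.choice(len(elite), size=2, replace=False)
            # Crossover: draw the child from the union of both parents.
            pool = np.union1d(elite[i], elite[j])
            child = rng.choice(pool, size=feat_numb, replace=False)
            # Mutation: swap one gene for a feature not yet in the child.
            if rng.random() < mut_rate:
                outside = np.setdiff1d(np.arange(n_feat), child)
                if outside.size:
                    child[rng.integers(feat_numb)] = rng.choice(outside)
            children.append(child)
        pop = elite + children
    # Sorted indices: the order carries no information about importance.
    return np.sort(max(pop, key=fitness))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(6, 300))                 # 6 features, 300 examples
    y = X[0] + X[3] + 0.1 * rng.normal(size=300)  # synthetic target
    print(ga_mrmr(2, X, y))
```

Because the fitness function only looks up precomputed pairwise mutual information values, evaluating a candidate subset is cheap, which is what makes a population-based search practical here. As with any genetic algorithm, the result depends on the random seed and on the population and generation settings, and it is not guaranteed to be the global optimum.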