variableIndices variable in global optimization toolbox GA

Hi !
I have a question about a variable in the Global Optimization Toolbox genetic algorithm (ga) routine.
I am attempting to minimize a fitness function (RMSEP - Relative Mean Square Error of Prediction from Partial Least Squares) by selecting optimal subsets of my initial dataset, with a maximum of 6 variables in a subset. Basically, I want to perform GA-PLS with the Global Optimization Toolbox.
My fitness function runs normally and returns a single scalar, the rmsep value.
However, to select the variables for the subset, I would naturally expect to use the variableIndices that the GA outputs. Whichever creation function I use, though, the results are always in decimal form (e.g. 20.250...) and thus unusable for selecting subsets. I can use round(variableIndices), but I don't think that is the right solution.
My fitness function is:
function rmsep = gafit1(variableIndices,...
        x_train,y_train,x_test,y_test)
    % variableIndices=round(variableIndices)
    y1=y_train;
    x1=x_train;
    [~,~,~,~,b] = plsregress(x1(:,variableIndices),y1,2);
    y2=y_test;
    x2=x_test(:,variableIndices);
    yhat = [ones(size(x2,1),1) x2]*b;
    rmsep=gafit0(y2,yhat)*100;
    plot(y2,yhat,'.');
end
gafit0 is a routine which calculates rmsep.
Please help !!!
Best regards,
Petar

Answers (1)

Use mixed-integer optimization, and express the requirement that no more than 6 variables are selected as a linear constraint of the form
A*x <= 6
where x is a binary vector of the variables in the problem and A is a row vector of ones. If you have more variables than just the binary ones, put zeros in the corresponding columns of A for those variables.
Alan Weiss
MATLAB mathematical toolbox documentation
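A minimal sketch of the binary-mask setup described in this answer, under several assumptions: the data are synthetic stand-ins, maskFitness is a hypothetical replacement for the asker's gafit1, plain RMSE stands in for the unshown gafit0, and optimoptions('ga',...) together with a local function in a script assumes a reasonably recent MATLAB release (older releases would use gaoptimset and a separate function file). It needs the Global Optimization Toolbox (ga) and the Statistics and Machine Learning Toolbox (plsregress).

```matlab
rng(0);                                   % reproducible stand-in data
nVars  = 124;                             % number of candidate predictors (from the thread)
maxSel = 6;                               % at most 6 predictors per subset
beta   = [1; -2; 0.5];                    % made-up "true" coefficients
x_train = randn(60,nVars);  y_train = x_train(:,[3 22 75])*beta + 0.1*randn(60,1);
x_test  = randn(30,nVars);  y_test  = x_test(:,[3 22 75])*beta  + 0.1*randn(30,1);

A  = ones(1,nVars);   b = maxSel;         % A*x <= 6, i.e. sum(mask) <= 6
lb = zeros(1,nVars);  ub = ones(1,nVars); % with integrality, each entry is 0 or 1
intcon = 1:nVars;                         % all decision variables are integer

fitness = @(mask) maskFitness(mask,x_train,y_train,x_test,y_test);
opts = optimoptions('ga','PopulationSize',200,'Display','off');
[mask,bestRmsep] = ga(fitness,nVars,A,b,[],[],lb,ub,[],intcon,opts);

variableIndices = find(mask > 0.5);       % distinct integer column indices

function rmsep = maskFitness(mask,x_train,y_train,x_test,y_test)
    cols = find(mask > 0.5);              % mask -> column indices
    if isempty(cols)
        rmsep = 1e6;                      % arbitrary large value for an empty subset
        return
    end
    ncomp = min(2,numel(cols));           % plsregress needs ncomp <= number of columns
    [~,~,~,~,b] = plsregress(x_train(:,cols),y_train,ncomp);
    yhat  = [ones(size(x_test,1),1) x_test(:,cols)]*b;
    rmsep = sqrt(mean((y_test - yhat).^2));   % stand-in for gafit0
end
```

Here the binary vector is the decision variable that ga returns, and the column indices are read off with find, so a repeated index cannot occur. It may still be worth verifying sum(mask) <= maxSel on the returned point before using it.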

6 Comments

Thank you for your prompt answer Alan.
However, please clarify something for me.
Is the binary vector x returned by the GA, or is it user-defined?
Also, in my problem I would like to use variable indices, not binary codes.
Moreover, I must be doing something wrong: when I enter zeros(1,6) or ones(1,6) as A and 6 as b for the linear inequalities, I still get real values as variableIndices. Example:
variableIndices =
-0.2416 -1.0691 0.5166 0.6092 0.1950 0.9878
After doing some searching, an old Matlab Digest article by Sam Reyes popped up (Using Genetic Algorithms to Select a Subset of Predictive Variables from a High-Dimensional Microarray Dataset) in which he has achieved my goal for classification purposes, and I have managed to reproduce it for regression.
However, he uses custom creation, crossover and mutation functions. The emphasis here is on the custom creation function in which the variable indices are defined as:
variables = floor(rand(numSelectedVariables,1)*numVariables)+1;
I would like to achieve the same with the built-in ones.
Thank you once again !
Best regards,
Petar
P.S. When I use [1,2,3,4,5,6] as the integer variable indices, and either zeros(1,6) or ones(1,6) as A and 6 as b, I encounter another problem. The indices are sometimes negative, or larger than the number of original variables (124), even if I set bounds (lower=1 and upper=124). The resulting error is either:
-----------------------------
Optimization running.
Error running optimization.
Subscript indices must either be real positive integers or logicals.
or:
-----------------------------
Optimization running.
Error running optimization.
Index exceeds matrix dimensions.
Update:
I changed the bounds to: lower: [1,1,1,1,1,1], and upper [124,124,124,124,124,124] and now it is running well. But, still I can't use different creation, crossover, and mutation functions which is a problem. I would like to set up a design using different crossover, mutation, and selection functions. 3^3 = 27 combinations.
Also, there are repeated variables, which mustn't happen:
3 75 22 8 67 33 62 3
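As a small illustration of where repeats can come from: the creation line quoted above draws each index independently, so the same index can appear twice. The sketch below contrasts it with randperm, which samples without replacement. This is only an assumption about how a custom creation function could be written when no integer constraints are used. It does not stop crossover or mutation from reintroducing repeats later.

```matlab
rng(1);                                       % reproducible draw
numVariables = 124;                           % from the thread
numSelectedVariables = 6;

% Independent draws, as in the quoted line: repeats are possible.
withReplacement = floor(rand(numSelectedVariables,1)*numVariables) + 1;

% randperm(n,k) returns k distinct integers from 1..n.
withoutReplacement = randperm(numVariables,numSelectedVariables)';

% Duplicate check that can also be applied to any GA result.
hasRepeats = numel(unique(withoutReplacement)) < numSelectedVariables;
```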
Please read the material in the mixed integer optimization link that I gave in my first reply.
After you read that material, then consider reformulating your problem so that you have more variables, some of which will be binary. The binary variables indicate whether a real variable is in the set or not. You will have a restriction that the sum of these binary variables is no more than 6.
How do you create such variables? Suppose that your real variables are x(1) through x(100), and these variables can take positive values between 0 and 50. Make new binary variables y(1) through y(100). These have lower bound 0, upper bound 1, and are integer valued. Make new constraints:
y(i) <= x(i) <= 50*y(i)
The new constraints ensure that y(i) = 0 whenever x(i) = 0, and that y(i) = 1 whenever x(i) > 0. Note that they also force any nonzero x(i) to be at least 1.
Now you have twice as many variables. Your decision variables are [x,y], a 200-long row vector. When you include constraints, make sure that your constraints are valid for the [x,y] vector.
You will need to make a larger-than-default population for GA to have a chance at solving this problem correctly.
Alan Weiss
MATLAB mathematical toolbox documentation
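The linking constraints y(i) <= x(i) <= 50*y(i) can be written as linear inequalities A*z <= b on the stacked vector z = [x,y]. The sketch below uses the sizes from the example (100 variables, upper bound 50). The variable names and the spot checks are illustrative assumptions.

```matlab
n = 100;  M = 50;                 % sizes from the example in the text
I = eye(n);
A = [ -I,    I;                   % y(i) - x(i)   <= 0
       I, -M*I ];                 % x(i) - M*y(i) <= 0
b  = zeros(2*n,1);
lb = zeros(1,2*n);
ub = [ M*ones(1,n), ones(1,n) ];  % 0 <= x <= M, 0 <= y <= 1
intcon = n+1:2*n;                 % only the y half is integer valued

% Spot checks on hand-made points z = [x,y].
x = zeros(1,n);  y = zeros(1,n);
x(7) = 12.5;  y(7) = 1;
okActive = all(A*[x,y]' <= b);    % x(7) in use and flagged by y(7)

y(7) = 0;
okUnflagged = all(A*[x,y]' <= b); % x(7) > 0 while y(7) = 0 violates x <= M*y

x(7) = 0.5;  y(7) = 1;
okSmall = all(A*[x,y]' <= b);     % 0 < x(7) < 1 violates y <= x
```

The first point is feasible and the other two are not, which matches the reading of the constraints: a variable is either exactly zero with y(i) = 0, or between 1 and M with y(i) = 1.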
So the way described in my previous comment is wrong?
Now I should create a nonlinear constraint function?
It has become very confusing...
I am suggesting that your previous formulation was inadequate. You want to choose some variables for your problem, and you need to ensure that you don't end up with repeated variables. I was suggesting a method of formulating your problem that incorporates those conditions. The steps are:
  1. Let x(i) be the values of your decision variables, which I assume are bounded between 0 and M, where M is a fixed positive real number.
  2. Let y(i) be binary variables that are 1 whenever x(i) is positive, and 0 otherwise.
  3. Incorporate linear inequalities y(i) <= x(i) and x(i) <= M*y(i). These inequalities ensure that x and y are linked.
  4. Incorporate the inequality sum_i y(i) <= 6 to keep the number of active variables at or below your threshold.
Does this make sense? It might seem complicated, but it is the simplest way I know for ensuring that the problem you solve is meaningful.
Alan Weiss
MATLAB mathematical toolbox documentation
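The four steps above can be sketched end to end as follows. The sizes, the target vector, and the toy objective are made up for illustration. The call assumes the Global Optimization Toolbox and a release that supports optimoptions('ga',...). The population size of 300 is an arbitrary "larger than default" choice, not a recommendation.

```matlab
rng(0);
n = 10;  M = 50;  k = 6;                      % illustrative sizes, not from the thread
I = eye(n);
A = [ -I,          I;                         % step 3: y(i) <= x(i)
       I,       -M*I;                         % step 3: x(i) <= M*y(i)
       zeros(1,n), ones(1,n) ];               % step 4: sum(y) <= k
b  = [ zeros(2*n,1); k ];
lb = zeros(1,2*n);
ub = [ M*ones(1,n), ones(1,n) ];              % steps 1 and 2: bounds on x and y
intcon = n+1:2*n;                             % y is integer; x stays continuous

target = [5 0 12 30 8 0 20 15 40 3];          % made-up data with 8 nonzero entries
objective = @(z) sum((z(1:n) - target).^2);   % toy fitness that uses the x half only

opts = optimoptions('ga','PopulationSize',300,'Display','off');
z = ga(objective,2*n,A,b,[],[],lb,ub,[],intcon,opts);

x = z(1:n);
y = round(z(n+1:end));
numActive = sum(y);                           % compare against k to check feasibility
```

All constraints here are passed through the linear inequality arguments A and b. No nonlinear constraint function is involved.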
It seems that a constraint function needs to be coded. I shall study the documentation on how to do so, and attempt it according to your suggestion.
However, will this now allow the use of different selection/crossover/mutation functions?
Could my constraint function be something like this?
function [c,ceq] = gaconstraint(x)
    % Preload y
    global y
    % Number of selected variables
    numSelVariables=size(x,2);
    for i = 1:numSelVariables
        if x(i) > 0
            % y=1 if x is positive
            y(i)=true;
        elseif x(i) < 0;
            % y=0 if x is negative
            y(i)=false;
        else
        end
    end
    % Three inequalities: y(i) <= x(i), x(i) <= numSelVariables*y(i), and
    % sum(y(i)) <= 6
    c = [y(i)-x(i); x(i)-numSelVariables*y(i); sum(y(i))-6];
    % No linear equalities
    ceq = [];
end
y is input from ga and is a row vector with six variables. Right?
Also, how could my fitness function be modified for use with the Particle Swarm Optimization Toolbox, which requires a function handle and doesn't accept a variable of type 'cell'? The 'cell' variable in which I pass the function handle to GA is:
F={@gafit1,x_train,y_train,x_test,y_test};
Moreover, even if I manage to call it by loading the original matrices within the function file, so that it has only one input variable, I then face the same problem as with GA: I don't see an option to define integer variable indices for PSO.
Once more I urge you to read in its entirety the first section I linked you to on mixed integer optimization. Please read it before asking any more questions. In particular, it states that you cannot have custom mutation or crossover functions. Yes, it is a pain and boring to read the documentation. But this particular documentation is quite relevant to what you are trying to do.
Also, do NOT use nonlinear constraints for the x and y variables, use LINEAR inequality constraints.
Alan Weiss
MATLAB mathematical toolbox documentation
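On the earlier question about solvers that accept only a function handle: one common approach is to bind the data in an anonymous function, which gives a plain one-argument handle. The sketch below uses a hypothetical stand-in (fitWithData) in the role of gafit1 and synthetic data. The sort-based "random key" decoding at the end is an assumption added for illustration, not something proposed in the thread. It is one way to turn a continuous vector into distinct indices when a solver has no integer option. The local function in a script assumes a recent MATLAB release.

```matlab
rng(2);
X  = randn(40,124);                           % stand-in predictor matrix
yv = X(:,[3 22 75])*[1; -2; 0.5];             % stand-in response

% Hypothetical fitness with extra data arguments, in the role of gafit1.
fitWithData = @(idx,Xd,yd) norm(yd - Xd(:,idx)*(Xd(:,idx)\yd));

% Bind the data once; the result takes only the decision vector.
fitnessFcn = @(idx) fitWithData(idx,X,yv);
val = fitnessFcn([3 22 75]);

% Random-key decoding: the positions of the 6 largest keys are the subset.
keys = rand(1,124);                           % continuous decision vector
idx  = decodeKeys(keys,6);                    % 6 distinct indices in 1..124

% One-argument handle on the continuous vector, for a handle-only solver.
keyFitness = @(kv) fitnessFcn(decodeKeys(kv,6));
valFromKeys = keyFitness(keys);

function idx = decodeKeys(keys,numSel)
    [~,order] = sort(keys,'descend');         % rank the keys
    idx = order(1:numSel);                    % positions of the numSel largest keys
end
```

Because sort returns a permutation, the decoded indices are always distinct and within range, so no rounding of the solver output is needed.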


Asked: 18 Nov 2014

Commented: 20 Nov 2014
