variableIndices variable in the Global Optimization Toolbox GA
Hi!
I have a question about a variable in the genetic algorithm (ga) solver of the Global Optimization Toolbox.
I am trying to minimize a fitness function (RMSEP, the Relative Mean Square Error of Prediction of a Partial Least Squares model) by selecting an optimal subset of the variables in my initial dataset, with at most 6 variables per subset. Basically, I want to perform GA-PLS with the Global Optimization Toolbox.
My fitness function runs normally and returns a single scalar, the rmsep value.
However, to select the variables for the subset, I would naturally use variableIndices, the vector that GA passes to the fitness function. Whichever creation function I use, though, its entries are always non-integers (e.g. 20.250...) and thus unusable for selecting subsets. I could use round(variableIndices), but I don't think that is the right solution.
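A small illustration, with made-up values, of why rounding alone is fragile: distinct continuous values can round to the same column index, so a "6-variable" subset can silently contain repeats. Any value below 0.5 would also round to 0, which is not a valid MATLAB index.

```matlab
% Made-up continuous values of the kind a real-valued GA individual contains
v = [20.25 19.8 3.4];
idx = round(v);                  % [20 20 3]: column 20 appears twice
nDistinct = numel(unique(idx));  % only 2 distinct columns survive
fprintf('%d indices, %d distinct columns\n', numel(idx), nDistinct);
```
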
My fitness function is:
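function rmsep = gafit1(variableIndices,...
    x_train,y_train,x_test,y_test)
% GA calls the fitness with one argument, so bind the data with e.g.
% @(v) gafit1(v,x_train,y_train,x_test,y_test)
% variableIndices=round(variableIndices)
% Fit a 2-component PLS model on the selected training columns
y1=y_train;
x1=x_train;
[~,~,~,~,b] = plsregress(x1(:,variableIndices),y1,2);
% Predict the test set; the first row of b is the intercept
y2=y_test;
x2=x_test(:,variableIndices);
yhat = [ones(size(x2,1),1) x2]*b;
rmsep=gafit0(y2,yhat)*100;
% Plotting on every fitness evaluation slows GA down considerably
plot(y2,yhat,'.');
end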
gafit0 is a helper routine that calculates rmsep.
Please help!
Best regards,
Petar
Answers (1)
Alan Weiss
on 18 Nov 2014
Use mixed-integer optimization together with the constraint that at most 6 variables are selected, which is a linear constraint of the form
A*x <= 6
where x is a binary vector indicating which variables are in the subset and A is a row vector of ones. If your problem has other variables besides the binary ones, put zeros in the corresponding columns of A.
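A minimal sketch of this suggestion for the binary-only case, assuming p candidate columns (p = 20 below is a made-up value). Each decision variable is a 0/1 inclusion flag, the first row of A caps the subset size at 6, and find converts a mask back into column indices for plsregress. The second row of A, requiring at least 2 selected columns, is an added assumption: a 2-component PLS model cannot be built from a single predictor.

```matlab
% Binary selection mask: x(j) = 1 means column j of the data is used.
p = 20;                          % hypothetical number of candidate columns
maxVars = 6;                     % at most 6 columns per subset
minVars = 2;                     % assumption: 2 PLS components need >= 2 columns

A = [ ones(1, p);                %  sum(x) <= maxVars
     -ones(1, p)];               % -sum(x) <= -minVars, i.e. sum(x) >= minVars
b = [maxVars; -minVars];
lb = zeros(1, p);                % with intcon, bounds 0..1 make x binary
ub = ones(1, p);
intcon = 1:p;                    % every variable is integer-valued

% Decoding an example mask, as ga would pass it to the fitness function
x = zeros(1, p);
x([3 7 12]) = 1;
cols = find(x > 0.5);            % column indices to pass to plsregress
isFeasible = all(A * x' <= b);   % true: 3 columns selected
```

With the data loaded, the solver call would look like `[xbest, fval] = ga(@(x) myMaskFitness(x, x_train, y_train, x_test, y_test), p, A, b, [], [], lb, ub, [], intcon)`, where myMaskFitness is a hypothetical variant of gafit1 that starts with `variableIndices = find(x > 0.5);`.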
Alan Weiss
MATLAB mathematical toolbox documentation
6 Comments
Alan Weiss
on 19 Nov 2014
Please read the material in the mixed integer optimization link that I gave in my first reply.
After you read that material, consider reformulating your problem with more variables, some of which are binary. The binary variables indicate whether each real variable is in the set or not. You then add the restriction that the sum of these binary variables is at most 6.
How do you create such variables? Suppose that your real variables are x(1) through x(100), and that each can take values between 0 and 50. Make new binary variables y(1) through y(100), with lower bound 0, upper bound 1, and integer values. Then add the new constraints:
y(i) <= x(i) <= 50*y(i)
The new constraints ensure that y = 0 whenever x = 0, and that y = 1 whenever x > 0.
Now you have twice as many variables: your decision variable is [x,y], a row vector of length 200. When you include constraints, make sure they are written for the full [x,y] vector.
You will need to make a larger-than-default population for GA to have a chance at solving this problem correctly.
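A sketch of how the linked [x,y] formulation above could be written as the A, b, lb, ub, and intcon inputs to ga, using the numbers from the text (100 real variables in [0, 50], at most 6 active). The two test points are made up for illustration.

```matlab
% Linked formulation: z = [x, y], with x real in [0, M] and y binary.
n = 100;                         % real variables x(1)..x(100)
M = 50;                          % upper bound on each x(i)
maxVars = 6;
I = eye(n);

A = [-I,           I;            %  y(i) - x(i)   <= 0
      I,      -M * I;            %  x(i) - M*y(i) <= 0
      zeros(1, n), ones(1, n)];  %  sum(y)        <= maxVars
b = [zeros(2 * n, 1); maxVars];
lb = zeros(1, 2 * n);
ub = [M * ones(1, n), ones(1, n)];
intcon = n + 1 : 2 * n;          % only the y block is integer

% Feasible point: x(5) = 12.3 is active and y(5) = 1; everything else is 0
z = zeros(1, 2 * n);
z(5) = 12.3;
z(n + 5) = 1;
feasible = all(A * z' <= b);
active = find(z(n + 1 : end) > 0.5);   % indices of the selected x variables

% x(5) = 0.5 with y(5) = 1 violates y(i) <= x(i)
% (with y(5) = 0 it would violate x(i) <= M*y(i) instead)
zBad = zeros(1, 2 * n);
zBad(5) = 0.5;
zBad(n + 5) = 1;
badFeasible = all(A * zBad' <= b);
```

These would be passed as `ga(fun, 2*n, A, b, [], [], lb, ub, [], intcon, opts)`, with opts created by `optimoptions('ga', 'PopulationSize', 1000)` in recent releases (or `gaoptimset` in older ones); 1000 is only an illustrative value larger than the default.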
Petar
on 19 Nov 2014
Alan Weiss
on 19 Nov 2014
I am suggesting that your previous formulation was inadequate. You want to choose some variables for your problem, and you need to ensure that you don't end up with repeated variables. I was suggesting a method of formulating your problem that incorporates those conditions. The steps are:
- Let x(i) be the values of your decision variables, which I assume are bounded between 0 and M, where M is a fixed positive real number
- Let y(i) be binary variables that are 1 whenever x(i) is positive, and 0 otherwise.
- Incorporate linear inequalities y(i) <= x(i) and x(i) <= M*y(i). These inequalities ensure that x and y are linked.
- Incorporate the inequality sum_i y(i) <= 6 to keep the number of active variables at or below your threshold of 6.
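Written out as a single model, the four steps give the mixed-integer program below. The symbols f (the fitness function, for example the RMSEP of the PLS model) and n (the number of candidate variables) are introduced here only for compactness.

```latex
\begin{aligned}
\min_{x,\,y}\quad & f(x) \\
\text{s.t.}\quad  & y_i \le x_i \le M\,y_i, \qquad i = 1,\dots,n,\\
                  & \textstyle\sum_{i=1}^{n} y_i \le 6,\\
                  & 0 \le x_i \le M, \quad y_i \in \{0,1\}.
\end{aligned}
```

One consequence of the linking inequalities: if y_i = 1 then x_i \ge 1, and if y_i = 0 then x_i = 0, so values of x_i strictly between 0 and 1 are infeasible.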
Does this make sense? It might seem complicated, but it is the simplest way I know for ensuring that the problem you solve is meaningful.
Alan Weiss
on 20 Nov 2014
Once more I urge you to read in its entirety the first section I linked you to on mixed integer optimization. Please read it before asking any more questions. In particular, it states that you cannot have custom mutation or crossover functions. Yes, it is a pain and boring to read the documentation. But this particular documentation is quite relevant to what you are trying to do.
Also, do NOT use nonlinear constraints for the x and y variables, use LINEAR inequality constraints.