Constrained Global Optimization Problem with MultiStart, GA and HybridFunction using Parallel Processing
Mateusz Malinowski on 4 Apr 2016
I am interested in using a genetic algorithm approach to fitting measurement data to a function of 3 variables with 4 unknown coefficients.
I am not concerned with the computation time the genetic approach requires: for now, I am only trying to develop a methodology for fitting complex, nonlinear, non-smooth functions using the features of MATLAB's Global Optimization Toolbox. The complexity and nonlinearity of my function (presented below) will greatly increase in the near future.
The function to which I am trying to fit my data is defined as follows:

$$\text{value} = a_1 x + a_2 \ln(y) + a_3 |z| + a_4$$

with a1, a2, a3, a4 being the regression model's unknown coefficients.
The following optimization constraints need to be imposed on the coefficients of the function:
a3<0; a4>0; a1*a2<0;
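For reference, MATLAB's constrained solvers (fmincon, ga) take the two sign conditions as bounds and the product condition as a nonlinear inequality returned by a function `[c, ceq] = nonlcon(a)`, which counts as feasible when `c <= 0`. The solvers only enforce non-strict inequalities, so a3 < 0 becomes a3 <= 0; if strictness matters, a small margin can be added (the `delta` below is a hypothetical parameter, not from the question). A minimal sketch, assuming the coefficient order [a1 a2 a3 a4]:

```matlab
% Coefficient vector a = [a1 a2 a3 a4]
delta = 0;                      % hypothetical margin; e.g. 1e-6 enforces a1*a2 <= -1e-6

% Bounds: a3 <= 0 and a4 >= 0; a1 and a2 are free
lb = [-Inf, -Inf, -Inf, 0];
ub = [ Inf,  Inf,    0, Inf];

% Nonlinear constraint in the [c, ceq] form that fmincon and ga expect:
% feasible when c <= 0; here c = a1*a2 + delta and there are no equalities
nonlcon = @(a) deal(a(1)*a(2) + delta, []);

% Quick check at two trial points
[c1, ~] = nonlcon([2, -3, -1, 5]);   % a1*a2 = -6 -> feasible (c1 < 0)
[c2, ~] = nonlcon([2,  3, -1, 5]);   % a1*a2 =  6 -> infeasible (c2 > 0)
```

The same `lb`, `ub`, and `nonlcon` can be passed unchanged to either `fmincon` or `ga`.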
The data to which I am trying to fit my function are in the attached SampleData.csv file. Its column labels identify the measured value and the x, y, z parameters.
My initial guesses for the values of a1,a2,a3,a4 are defined as a row vector: [0,0,0,0].
I would like to learn how to set this problem up using MultiStart with parallel processing for the MultiStart runs and, if possible, a hybrid function.
The workflow I would like to achieve is similar to:
1. Define the function (referring to the x, y, z data in the columns of SampleData.csv):
function [ Value ] = FittingFunction(x,data)
I would prefer not to use an anonymous function, as the regression model I will fit future data to will only become more complex, and an anonymous function would be messy to adjust as the model matures.
2. Define the initial guess as a row vector x = [0,0,0,0].
3. Define the optimization problem (referring to the value data and the x, y, z data in the columns of SampleData.csv) as the minimization of a cost function defined as:
4. Pass the optimization problem to a genetic algorithm routine with a hybrid function (fminunc) included.
5. Create a MultiStart optimization object and start a pool of workers.
lowerBounds for MultiStart = [-Inf,-Inf,-Inf,-Inf]; upperBounds for MultiStart = [Inf,Inf,Inf,Inf];
Number of MultiStart iterations = 5.
6. Run the optimization in parallel.
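Steps 1 to 3 above can be sketched without any toolbox. The model is written once as a function of the coefficient vector and the data, and the cost is a scalar function of the coefficients only. Assumptions for illustration: the data are synthetic (not SampleData.csv), the x, y, z columns are held in a numeric matrix `X`, and the cost is the sum of squared residuals. In a real project the one-line model would live in the question's function file (adapted here to take `X`), and `@FittingFunction` would be used wherever `model` appears.

```matlab
% Synthetic stand-in data (NOT the measured data); y must be positive for log(y)
rng(0);
n = 50;
X = [randn(n,1), 1 + rand(n,1), randn(n,1)];     % columns: x, y, z
aTrue = [1.5, -2.0, -0.7, 3.0];                  % satisfies a3<0, a4>0, a1*a2<0

% Step 1: the model. Equivalent function file FittingFunction.m (adapted):
%   function Value = FittingFunction(a, X)
%   Value = a(1).*X(:,1) + a(2).*log(X(:,2)) + a(3).*abs(X(:,3)) + a(4);
%   end
model = @(a, X) a(1).*X(:,1) + a(2).*log(X(:,2)) + a(3).*abs(X(:,3)) + a(4);
v = model(aTrue, X);                             % noise-free synthetic "measurements"

% Step 2: initial guess as a row vector
a0 = [0, 0, 0, 0];

% Step 3: cost = sum of squared residuals. The anonymous function captures X and v,
% so a solver only ever sees a function of the coefficients.
cost = @(a) sum((v - model(a, X)).^2);

costAtGuess = cost(a0);
costAtTruth = cost(aTrue);                       % exactly zero for noise-free data
```

Capturing the data in the cost handle is what lets a function file with a `(coefficients, data)` signature be handed to solvers that expect a function of the coefficients alone.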
I realize that the problem I have defined is quite complex. I am not an expert MATLAB user and may not realize that my approach and my requirements for the fitting process (using the genetic algorithm, parallel processing, constrained coefficient ranges, and doing all of this with a MultiStart approach) may be unachievable.
This is my current status:
- I do not know how to set up the fitting routine to use the genetic algorithm; my code currently uses lsqcurvefit, because that was the only way I could figure out to build the MATLAB optimization problem structure with createOptimProblem().
- I understand that lsqcurvefit cannot handle general constraints such as a1*a2<0 (it supports only bound constraints), but the genetic algorithm can; any help with setting up this problem using the genetic approach would be greatly appreciated, even without MultiStart for now.
- As I am currently not using the genetic algorithm, I am not setting its 'HybridFcn' option.
- I am aware that the Global Optimization Toolbox has two solvers for finding global minima: GlobalSearch and MultiStart. I have decided to use MultiStart because, according to the following article, http://www.mathworks.com/help/gads/how-globalsearch-and-multistart-work.html#bsc9eec, MultiStart can be set up to run in parallel on a multicore processor, while GlobalSearch cannot.
- I am not sure whether MultiStart or GlobalSearch can be set up to run with a genetic algorithm.
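A minimal sketch of the genetic-algorithm route, assuming the Global Optimization Toolbox (for `ga`) and the Optimization Toolbox (for the `fmincon` hybrid), on synthetic data. Bounds and the a1*a2 <= 0 constraint are passed directly to `ga`. The question's step 4 mentions `fminunc` as the hybrid function; `fminunc` is an unconstrained solver, so `fmincon` is used instead. As far as I know, MultiStart only accepts local solvers (fmincon, fminunc, lsqcurvefit, lsqnonlin), so `ga` runs on its own here rather than inside MultiStart.

```matlab
% Synthetic stand-in data (NOT the measured data); columns of X are x, y, z
rng(0);
n = 50;
X = [randn(n,1), 1 + rand(n,1), randn(n,1)];
aTrue = [1.5, -2.0, -0.7, 3.0];
model = @(a) a(1).*X(:,1) + a(2).*log(X(:,2)) + a(3).*abs(X(:,3)) + a(4);
v = model(aTrue) + 0.01*randn(n,1);

cost = @(a) sum((v - model(a)).^2);      % sum of squared residuals (scalar)
lb = [-Inf, -Inf, -Inf, 0];              % a4 >= 0
ub = [ Inf,  Inf,    0, Inf];            % a3 <= 0
nonlcon = @(a) deal(a(1)*a(2), []);      % a1*a2 <= 0, no equality constraints

useParallel = false;                     % set true to evaluate the population on a parallel pool
opts = optimoptions('ga', ...
    'HybridFcn', @fmincon, ...           % constrained local refinement after ga stops
    'InitialPopulationMatrix', [0, 0, 0, 0], ... % the question's initial guess as one individual
    'UseParallel', useParallel, ...
    'Display', 'off');

% ga(fun, nvars, A, b, Aeq, beq, lb, ub, nonlcon, options); no linear constraints here
[aGA, costGA] = ga(cost, 4, [], [], [], [], lb, ub, nonlcon, opts);
```

Setting `useParallel` to true only distributes work when the Parallel Computing Toolbox is available and a pool (e.g. from `parpool`) is open.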
Here is my current code (requires the data.mat file attached to my posting):
%%Loading Measurement Data File
%%Defining function to which the data will be fit
% I would like to move away from using the anonymous function,
% and move to using the function file defined in the directory but I don't
% know how to pass a function file to the createOptimProblem function call.
fitfcn1 = @(x,data)x(1).*data.x+x(2).*log(data.y)+abs(data.z).*x(3)+x(4);
%%Defining lower and upper bounds for MultiStart procedure
lb1 = [-Inf,-Inf,-Inf,-Inf];
ub1 = [Inf,Inf,Inf,Inf];
%%Creating row vector for initial guess to optimization routine
p01 = 0*ones(1,4);
%%Defining optimization problem object
% Currently running lsqcurvefit, but I need to move to the genetic
% algorithm approach. I do not understand how to set up a genetic algorithm
% data-fitting routine in MATLAB using the GA functionality built into the
% Global Optimization Toolbox.
problem1 = createOptimProblem('lsqcurvefit','x0',p01,'objective',fitfcn1,...
%%Creating live fitting progress plot
ms1 = MultiStart('PlotFcns',@gsplotbestf);
%%Running MultiStart Optimization Routine
[xmulti1,errormulti1] = run(ms1,problem1,5)
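The comment in the code above asks how to pass a function file to createOptimProblem: any function handle works, so with a file FittingFunction.m on the path, `@FittingFunction` can be given as the 'objective'. The sketch below uses an equivalent anonymous function so it runs as one script; it also assumes synthetic data and numeric xdata `X = [x y z]` instead of the data structure. The bounds encode a3 <= 0 and a4 >= 0, the part of the constraints that lsqcurvefit can enforce. Requires the Optimization and Global Optimization toolboxes.

```matlab
% Synthetic stand-in data (NOT the measured data); columns of X are x, y, z
rng(1);
n = 50;
X = [randn(n,1), 1 + rand(n,1), randn(n,1)];
aTrue = [1.5, -2.0, -0.7, 3.0];

% Model with the fun(a, xdata) signature lsqcurvefit expects.
% With a function file, use @FittingFunction here instead.
model = @(a, X) a(1).*X(:,1) + a(2).*log(X(:,2)) + a(3).*abs(X(:,3)) + a(4);
v = model(aTrue, X) + 0.01*randn(n,1);           % small synthetic noise

% Bounds that lsqcurvefit can enforce: a3 <= 0, a4 >= 0
lb = [-Inf, -Inf, -Inf, 0];
ub = [ Inf,  Inf,    0, Inf];

problem = createOptimProblem('lsqcurvefit', 'objective', model, ...
    'x0', [0, 0, 0, 0], 'xdata', X, 'ydata', v, 'lb', lb, 'ub', ub, ...
    'options', optimoptions('lsqcurvefit', 'Display', 'off'));

ms = MultiStart('UseParallel', false, 'Display', 'off');   % true runs start points on a parallel pool
[aMS, resnormMS] = run(ms, problem, 5);                    % 5 start points, as in the question
```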
I would greatly appreciate any sort of guidance in solving this problem.
Alexander Andreychenko on 21 Apr 2016
There is a way to reformulate the problem so that it fits nicely into the MultiStart algorithm, for instance.
The goal function could be the following:
function [sse] = fitfcn2(x,data)
% Sum of squared residuals between the measured values and the model
approximation = x(1).*data.x + x(2).*log(data.y) + abs(data.z).*x(3) + x(4);
residuals = data.value - approximation;
sse = sum(residuals.^2);
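Squaring the sum of the residuals lets positive and negative errors cancel, so a poor fit can score zero; summing the squared residuals cannot. A small check with made-up residual vectors:

```matlab
% Two residual vectors: a perfect fit and a poor fit whose errors cancel
rGood = [0; 0; 0; 0];
rPoor = [5; -5; 3; -3];

squareOfSum  = @(r) sum(r)^2;     % squared sum of residuals
sumOfSquares = @(r) sum(r.^2);    % least-squares cost

poorSquareOfSum  = squareOfSum(rPoor);    % 0: the bad fit looks perfect
poorSumOfSquares = sumOfSquares(rPoor);   % 68: the bad fit is penalized
```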
And the code to run the optimization itself (MultiStart with 10 start points):
funToMinimize = @(x) fitfcn2(x,data);
lb1 = ;
ub1 = ;
% Creating row vector for initial guess to optimization routine
initialParams = 0*ones(1,4);
% Creating optimization options
mleFminConOptions = optimoptions('fmincon');
mleFminConOptions = optimoptions(mleFminConOptions, 'Display','notify-detailed');
mleFminConOptions = optimoptions(mleFminConOptions, 'UseParallel', true);
% Creating the optimization problem
problem = createOptimProblem('fmincon', 'objective', funToMinimize, 'x0', initialParams, 'lb', lb1, 'ub', ub1, 'options', mleFminConOptions);
ms = MultiStart('UseParallel',true, 'Display','iter', 'StartPointsToRun', 'bounds-ineqs');
numberOfInitialPoints = 10;
% Running MultiStart Optimization Routine
[xmulti1,errormulti1] = run(ms, problem, numberOfInitialPoints);
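The code above passes only bounds to fmincon. A sketch that also imposes a1*a2 <= 0 through the 'nonlcon' field of createOptimProblem, on synthetic data. The finite bound values are assumptions chosen to encode a3 <= 0 and a4 >= 0 and to give MultiStart a sensible box for random start points; they are not the answer's lb1/ub1.

```matlab
% Synthetic stand-in data (NOT the measured data); columns of X are x, y, z
rng(2);
n = 50;
X = [randn(n,1), 1 + rand(n,1), randn(n,1)];
aTrue = [1.5, -2.0, -0.7, 3.0];
model = @(a) a(1).*X(:,1) + a(2).*log(X(:,2)) + a(3).*abs(X(:,3)) + a(4);
v = model(aTrue) + 0.01*randn(n,1);

sse = @(a) sum((v - model(a)).^2);          % sum of squared residuals
nonlcon = @(a) deal(a(1)*a(2), []);         % c = a1*a2 <= 0, no equalities
lb = [-10, -10, -10, 0];                    % assumed finite bounds (a4 >= 0)
ub = [ 10,  10,   0, 10];                   % assumed finite bounds (a3 <= 0)

opts = optimoptions('fmincon', 'Display', 'off');
problem = createOptimProblem('fmincon', 'objective', sse, 'x0', [0, 0, 0, 0], ...
    'lb', lb, 'ub', ub, 'nonlcon', nonlcon, 'options', opts);

% 'bounds-ineqs' runs only start points that satisfy the bounds and inequalities;
% set 'UseParallel' to true with an open pool to distribute the start points
ms = MultiStart('UseParallel', false, 'Display', 'off', 'StartPointsToRun', 'bounds-ineqs');
[aMS, sseMS] = run(ms, problem, 10);
```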
More Answers (2)
Alan Weiss on 4 Apr 2016
I think that it is a good idea to use MultiStart to help find a global best fit for your model. However, I think that it is a poor idea to use the genetic algorithm to help solve this problem. What is the point of using a slow, less reliable algorithm? Use MultiStart with lsqcurvefit or lsqnonlin when you have no constraints or only bound constraints, and use MultiStart with fmincon when you have more general constraints. To see how to set up your objective function, consult Nonlinear Data-Fitting, which does not discuss constraints, but you can add those to the problem separately.
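One possible workaround that keeps to lsqcurvefit with bound constraints only (not part of the answer above): because this particular model is linear in a1..a4, the sum of squared residuals is a convex quadratic, and a1*a2 <= 0 splits into two sign cases, (a1 >= 0, a2 <= 0) and (a1 <= 0, a2 >= 0), each of which is a pure bound constraint. Solving both cases and keeping the better fit covers the whole feasible set, and one start per case suffices because a convex problem has no spurious local minima. A sketch on synthetic data, requiring the Optimization Toolbox:

```matlab
% Synthetic stand-in data (NOT the measured data); columns of X are x, y, z
rng(3);
n = 50;
X = [randn(n,1), 1 + rand(n,1), randn(n,1)];
aTrue = [1.5, -2.0, -0.7, 3.0];
model = @(a, X) a(1).*X(:,1) + a(2).*log(X(:,2)) + a(3).*abs(X(:,3)) + a(4);
v = model(aTrue, X) + 0.01*randn(n,1);

% Two bound-only cases; both also impose a3 <= 0 and a4 >= 0
lbCases = {[0, -Inf, -Inf, 0], [-Inf, 0, -Inf, 0]};
ubCases = {[Inf, 0, 0, Inf],   [0, Inf, 0, Inf]};

opts = optimoptions('lsqcurvefit', 'Display', 'off');
best = Inf;
for k = 1:2
    [a, resnorm] = lsqcurvefit(model, zeros(1,4), X, v, lbCases{k}, ubCases{k}, opts);
    if resnorm < best                   % keep the case with the smaller residual norm
        best = resnorm;
        aBest = a;
    end
end
```

The split only works while the model stays linear in the coefficients; for the more nonlinear models planned in the question, MultiStart with fmincon remains the general option.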
Alex Sha on 7 Jan 2020
The best results:
Root of Mean Square Error (RMSE): 69.3464929074478
Sum of Squared Residual: 577072.329427525
Correlation Coef. (R): 0.965351307958851
Adjusted R-Square: 0.929534561613616
Determination Coef. (DC): 0.931903147777864
Constrained Functions: a1*a2-0 = -8.41181835417199E-13
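Under the usual definitions these statistics are consistent with each other, which lets a reader back out the sample size (an inference from the numbers, not something the answer states). The constraint value of about -8.4E-13 also shows that the product constraint is active: the best fit lies essentially on the boundary a1*a2 = 0, with at least one of the two coefficients practically zero.

$$\mathrm{RMSE} = \sqrt{\mathrm{SSR}/n} \;\Rightarrow\; n = \frac{577072.33}{69.34649^2} \approx 120$$

$$R^2 = 0.965351^2 \approx 0.931903 = \mathrm{DC}$$

$$\bar{R}^2 = 1 - (1 - \mathrm{DC})\,\frac{n-1}{n-p} = 1 - 0.068097 \cdot \frac{119}{115} \approx 0.92953, \qquad p = 4$$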
Parameter Best Estimate