Choosing Initial values for non-linear fitting and running Genetic Algorithm

6 views (last 30 days)
Note: This is a question based on suggestions/answers based on another question I had asked: link, but now a slightly different one.
Here is a summary: I am running an optics to calibrate an optical device. For this I need to fit a curve between two variables, voltage and phase. There are 7 different parameters
F= @(a,V) a(1).*(a(2) - a(3) - (a(2).*a(4))./(a(2).^2.*cos(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2 + a(4).^2.*sin(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2).^(1/2))
This is the best fit I could arrive at but however would prefer something that fits even better. [Atleast from x axis starting from 0.8 Volts]
I had some struggles to find the starting point of the parameters for such problems. It was suggested that I use either a global optimization algorithm like (Genetic Algorithm etc) to get some good starting values. However, I when I run the genetic algorithm, the program keeps running and doesn't stop at all. (I ran it for over 15 hours). Any suggestions on where things could have gone wrong? Also, is there any way I can keep better improving the parameter values that the algorithm predicts on loop?
Here are the constrains on the parameter
  • a(1): (about 20000, 80000) - This is the ratio of the thickness of a crystal and the wavelength of the laser we are using.
  • a(2) and a(4): (about 1.4-1.8) - Both these are refractive indices.
  • No strict range on a(3)
  • a(6) : Around (0.3-0.9) - These are some voltages in V.
  • a(7): Around (0.3-2)- Again some voltage in V.
  • a(5)- No strict range on this too
I am running a huge code but here is a part of the script which uses the GA or Multi-Start algorithm:
%Some nonsense that You might not need
function [volt,inten] = IntVoltFit(volt,inten,Phase)
syms G B net n n_e d z V1 Vo Vc b A n_c n_del p k
inten=inten/max(inten);
inten=inten-min(inten);
theta=b-(2*atan(exp(-((V1-Vc)/Vo))))
net=(n_e*n)/(sqrt((n*cos(theta))^2+(n_e*sin(theta))^2));
T = 1-(cos(d*(n-net-n_del)));
delta=d*(n-net-n_del+n_c)
sympref('PolynomialDisplayStyle','ascend');
latex(taylor(delta,V1,'ExpansionPoint',2,'Order',1))
b=[-26.05,20,-1.456,0.179,1.594,2.795];
Fit=@(b,V) b(1)+(b(2).*atan(b(3).*V+b(4).*V.^3+b(5))+b(6).*V);
latex(taylor(Fit(b,V1),V1,'ExpansionPoint',2,'Order',2))
%PROBABLY the relevant part for this question
V=volt
F= @(a,V) a(1).*(a(2) - a(3) - (a(2).*a(4))./(a(2).^2.*cos(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2 + a(4).^2.*sin(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2).^(1/2))
ftns= @(a_fit) norm(F(a_fit,V)-Phase)
[a_fit]=gamultiobj(ftns,7, [],[],[],[],[-Inf;-Inf;-Inf;-Inf;-Inf;-Inf;-Inf],[Inf;Inf;Inf;Inf;Inf;Inf;Inf]);
fprintf(1,'\tRate Constants:\n')
for k1 = 1:numel(a_fit)
fprintf(1, '\t\ta(%2d) = %8.5f\n', k1, a_fit(k1))
end
figure
plot(V, Phase, '.b')
hold on
plot(V, F(a_fit,V), '-r')
hold off
grid
end

Accepted Answer

Star Strider
Star Strider on 26 Jul 2023
V = readmatrix('volt.csv');
Phase = readmatrix('phase.csv');
F= @(a,V) a(1).*(a(2) - a(3) - (a(2).*a(4))./(a(2).^2.*cos(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2 + a(4).^2.*sin(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2).^(1/2));
ftns = @(a) norm(Phase - F(a,V))
ftns = function_handle with value:
@(a)norm(Phase-F(a,V))
PopSz = 100;
Parms = 7;
optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
Start Time: 2023-07-26 12:18:52.5968
[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);
Optimization terminated: maximum number of generations exceeded.
t1 = clock;
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
Stop Time: 2023-07-26 12:19:04.6864
GA_Time = etime(t1,t0)
GA_Time = 12.0896
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
Elapsed Time: 1.208955100000000E+01 00:00:12.089
fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)
Fitness value at convergence = 1.1617 Generations = 5000
fprintf(1,'\tRate Constants:\n')
Rate Constants:
for k1 = 1:length(theta)
fprintf(1, '\t\tTheta(%2d) = %8.5f\n', k1, theta(k1))
end
Theta( 1) = 1.37342 Theta( 2) = 7.55204 Theta( 3) = 2.10657 Theta( 4) = 0.69253 Theta( 5) = 6.88245 Theta( 6) = 3.57480 Theta( 7) = 4.58644
figure
plot(V, Phase, '.b')
hold on
plot(V, F(theta,V), '-r')
hold off
grid
xlabel('V')
ylabel('Phase')
title('GA Parameters')
Fmdl = fitnlm(V, Phase, F, theta)
Warning: The Jacobian at the solution is ill-conditioned, and some model parameters may not be estimated well (they are not identifiable). Use caution in making predictions.
Fmdl =
Nonlinear regression model: y ~ F(a,V) Estimated Coefficients: Estimate SE tStat pValue ________ _______ _________ __________ a1 2.5732 6214.1 0.0004141 0.99967 a2 6.6979 16175 0.0004141 0.99967 a3 1.1205 2705.8 0.0004141 0.99967 a4 3.0875 7455.8 0.0004141 0.99967 a5 6.0226 0.35322 17.051 1.2351e-34 a6 0.25671 0.32998 0.77796 0.43804 a7 1.3358 0.60774 2.198 0.029762 Number of observations: 134, Error degrees of freedom: 127 Root Mean Squared Error: 0.0556 R-Squared: 1, Adjusted R-Squared 1 F-statistic vs. constant model: 5.73e+04, p-value = 2.24e-215
theta = Fmdl.Coefficients.Estimate;
figure
plot(V, Phase, '.b')
hold on
plot(V, F(theta,V), '-r')
hold off
grid
xlabel('V')
ylabel('Phase')
title('‘fitnlm’ Using GA Parameters as Initial Estimates')
Although the model fits well, it appears to not be appropriate to the data. Specifically, ‘a1’ through ‘a4’ do not appear to be required for the model (the p values are near unity), and only ‘a5’ and ‘a7’ are significantly different from zero.
.
  4 Comments
Akaash Srikanth
Akaash Srikanth on 26 Jul 2023
At this point I have tried it over 20 times and I still keep getting similar results as to what I had gotten. Fits no where near what you had obtained. Is this normal? Should I keep storing the values in the best fit in some vector and repeat it?
Also, there is a small edit in the previous comment. Any help regarding that should also be helpful.
Thanks,
Akaash
Alex Sha
Alex Sha on 27 Jul 2023
If taking parameter ranges as:
a(1): (about 20000, 80000)
a(2) and a(4): (about 1.4-1.8)
No strict range on a(3)
a(6) : Around (0.3-0.9)
a(7): Around (0.3-2)
a(5)- No strict range on this too
The best result will be as below. Note the parameter values of a1, a2, a4 and a6 are all in the lower bound of ranges, it means the lower bound values for those parameters are not appropriate or need to be adjusted.
Sum Squared Error (SSE): 0.913188313696523
Root of Mean Square Error (RMSE): 0.082552033057426
Correlation Coef. (R): 0.999569920444164
R-Square: 0.999140025856752
Parameter Best Estimate
--------- -------------
a1 20000.0161989552
a2 1.40052478989677
a3 0.000144580372702076
a4 1.4000274572155
a5 -41.1713406102593
a6 0.30000000000082
a7 0.79693554188002
While, if taking all parameter range as: [a1,a2,a3,a4,a5,a6,a7]>0
the result will be:
Sum Squared Error (SSE): 0.185432961663104
Root of Mean Square Error (RMSE): 0.0371998396785889
Correlation Coef. (R): 0.999912682569188
R-Square: 0.99982537276271
Parameter Best Estimate
--------- -------------
a1 1.52400447519627
a2 5.95100778385172
a3 1.86594097129705
a4 0.000205859180286249
a5 1.57073622281507
a6 12.6427943011546
a7 1.15807171273927

Sign in to comment.

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!