Choosing Initial values for non-linear fitting and running Genetic Algorithm

Question

Akaash Srikanth on 25 Jul 2023

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/2000358-choosing-initial-values-for-non-linear-fitting-and-running-genetic-algorithm

Commented: Alex Sha on 27 Jul 2023

Note: This is a question based on suggestions/answers based on another question I had asked: link, but now a slightly different one.

Here is a summary: I am running an optics to calibrate an optical device. For this I need to fit a curve between two variables, voltage and phase. There are 7 different parameters

F= @(a,V) a(1).*(a(2) - a(3) - (a(2).*a(4))./(a(2).^2.*cos(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2 + a(4).^2.*sin(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2).^(1/2))

This is the best fit I could arrive at but however would prefer something that fits even better. [Atleast from x axis starting from 0.8 Volts]

I had some struggles to find the starting point of the parameters for such problems. It was suggested that I use either a global optimization algorithm like (Genetic Algorithm etc) to get some good starting values. However, I when I run the genetic algorithm, the program keeps running and doesn't stop at all. (I ran it for over 15 hours). Any suggestions on where things could have gone wrong? Also, is there any way I can keep better improving the parameter values that the algorithm predicts on loop?

Here are the constrains on the parameter

a(1): (about 20000, 80000) - This is the ratio of the thickness of a crystal and the wavelength of the laser we are using.
a(2) and a(4): (about 1.4-1.8) - Both these are refractive indices.
No strict range on a(3)
a(6) : Around (0.3-0.9) - These are some voltages in V.
a(7): Around (0.3-2)- Again some voltage in V.
a(5)- No strict range on this too

I am running a huge code but here is a part of the script which uses the GA or Multi-Start algorithm:

%Some nonsense that You might not need
function [volt,inten] = IntVoltFit(volt,inten,Phase)
    syms G B net n n_e d z V1 Vo Vc b A n_c n_del p k
    inten=inten/max(inten);
    inten=inten-min(inten);
    theta=b-(2*atan(exp(-((V1-Vc)/Vo))))
    net=(n_e*n)/(sqrt((n*cos(theta))^2+(n_e*sin(theta))^2));    
    T = 1-(cos(d*(n-net-n_del)));
    delta=d*(n-net-n_del+n_c)
    sympref('PolynomialDisplayStyle','ascend');
    latex(taylor(delta,V1,'ExpansionPoint',2,'Order',1))
    b=[-26.05,20,-1.456,0.179,1.594,2.795];
    Fit=@(b,V) b(1)+(b(2).*atan(b(3).*V+b(4).*V.^3+b(5))+b(6).*V);
    latex(taylor(Fit(b,V1),V1,'ExpansionPoint',2,'Order',2))
%PROBABLY the relevant part for this question
    V=volt
    F= @(a,V) a(1).*(a(2)  - a(3) - (a(2).*a(4))./(a(2).^2.*cos(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2 + a(4).^2.*sin(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2).^(1/2))
    ftns= @(a_fit) norm(F(a_fit,V)-Phase)
    [a_fit]=gamultiobj(ftns,7, [],[],[],[],[-Inf;-Inf;-Inf;-Inf;-Inf;-Inf;-Inf],[Inf;Inf;Inf;Inf;Inf;Inf;Inf]);
    fprintf(1,'\tRate Constants:\n')
    for k1 = 1:numel(a_fit)
        fprintf(1, '\t\ta(%2d) = %8.5f\n', k1, a_fit(k1))
    end
    figure
    plot(V, Phase, '.b')
    hold on
    plot(V, F(a_fit,V), '-r')
    hold off
    grid  
end

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Star Strider on 26 Jul 2023

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/2000358-choosing-initial-values-for-non-linear-fitting-and-running-genetic-algorithm#answer_1279032

Open in MATLAB Online

V = readmatrix('volt.csv');

Phase = readmatrix('phase.csv');

F= @(a,V) a(1).*(a(2) - a(3) - (a(2).*a(4))./(a(2).^2.*cos(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2 + a(4).^2.*sin(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2).^(1/2));

ftns = @(a) norm(Phase - F(a,V))

ftns = function_handle with value:

@(a)norm(Phase-F(a,V))

PopSz = 100;

Parms = 7;

optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10); % Options Structure For 'Answers' Problems

t0 = clock;

fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)

Start Time: 2023-07-26 12:18:52.5968

[theta,fval,exitflag,output,population,scores] = ga(ftns, Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);

Optimization terminated: maximum number of generations exceeded.

t1 = clock;

fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)

Stop Time: 2023-07-26 12:19:04.6864

GA_Time = etime(t1,t0)

GA_Time = 12.0896

DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');

fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)

Elapsed Time: 1.208955100000000E+01 00:00:12.089

fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)

Fitness value at convergence = 1.1617 Generations = 5000

fprintf(1,'\tRate Constants:\n')

Rate Constants:

for k1 = 1:length(theta)

fprintf(1, '\t\tTheta(%2d) = %8.5f\n', k1, theta(k1))

end

Theta( 1) = 1.37342 Theta( 2) = 7.55204 Theta( 3) = 2.10657 Theta( 4) = 0.69253 Theta( 5) = 6.88245 Theta( 6) = 3.57480 Theta( 7) = 4.58644

figure

plot(V, Phase, '.b')

hold on

plot(V, F(theta,V), '-r')

hold off

grid

xlabel('V')

ylabel('Phase')

title('GA Parameters')

Fmdl = fitnlm(V, Phase, F, theta)

Warning: The Jacobian at the solution is ill-conditioned, and some model parameters may not be estimated well (they are not identifiable). Use caution in making predictions.

Fmdl =

Nonlinear regression model: y ~ F(a,V) Estimated Coefficients: Estimate SE tStat pValue ________ _______ _________ __________ a1 2.5732 6214.1 0.0004141 0.99967 a2 6.6979 16175 0.0004141 0.99967 a3 1.1205 2705.8 0.0004141 0.99967 a4 3.0875 7455.8 0.0004141 0.99967 a5 6.0226 0.35322 17.051 1.2351e-34 a6 0.25671 0.32998 0.77796 0.43804 a7 1.3358 0.60774 2.198 0.029762 Number of observations: 134, Error degrees of freedom: 127 Root Mean Squared Error: 0.0556 R-Squared: 1, Adjusted R-Squared 1 F-statistic vs. constant model: 5.73e+04, p-value = 2.24e-215

theta = Fmdl.Coefficients.Estimate;

figure

plot(V, Phase, '.b')

hold on

plot(V, F(theta,V), '-r')

hold off

grid

xlabel('V')

ylabel('Phase')

title('‘fitnlm’ Using GA Parameters as Initial Estimates')

Although the model fits well, it appears to not be appropriate to the data. Specifically, ‘a1’ through ‘a4’ do not appear to be required for the model (the p values are near unity), and only ‘a5’ and ‘a7’ are significantly different from zero.

.

4 Comments
Show 2 older commentsHide 2 older comments

Akaash Srikanth on 26 Jul 2023

Edited: Akaash Srikanth on 26 Jul 2023

Open in MATLAB Online

Wow! This looks good. I am for some reason unable to even reproduce your same results. Before that a couple of points:

Is it possible that the expression I had given is for some reason wrong?

a = sym('a',[1 7]);

syms V

theta=a(5)-(2*atan(exp(-((V-a(6))/a(7)))));

net=(a(4)*a(2))/(sqrt((a(2)*cos(theta))^2+(a(4)*sin(theta))^2));

delta=a(1)*(a(2)-net-a(3))

delta =

This is the expression that I am supposed to obtain for the function 'F'. But since this is not in the form that is preferred by function handle should I give it as:

Fun=matlabFunction(delta)
Fun = function_handle with value:
    @(V,a1,a2,a3,a4,a5,a6,a7)-a1.*(-a2+a3+a2.*a4.*1.0./sqrt(a2.^2.*cos(a5-atan(exp(-(V-a6)./a7)).*2.0).^2+a4.^2.*sin(a5-atan(exp(-(V-a6)./a7)).*2.0).^2))

instead of the way I had mentioned it in the question? In other words is the function that I gave in the question and the above like equivalent?

2. Now coming to the fitting part, I am still unable to get the results that you had got.

    F=@(V,a) a(1).*(a(2) - a(3) - (a(2).*a(4))./(a(2).^2.*cos(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2 + a(4).^2.*sin(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2).^(1/2));
    V=volt(20:143);
    Phase=Phase(20:143)
    PopSz = 100;
    Parms = 7;
    optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10);
    ftns= @(a) norm(F(V,a)-Phase)
    t0 = clock;
    fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)
    [a_fit,fval,exitflag,output,population,scores]=ga(ftns,Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);
    t1 = clock;
    fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)
    GA_Time = etime(t1,t0)
    DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
    fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
    fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)
    fprintf(1,'\tRate Constants:\n')
    for k1 = 1:numel(a_fit)
        fprintf(1, '\t\ta(%2d) = %8.5f\n', k1, a_fit(k1))
    end
    figure
    plot(V, Phase, '.b')
    hold on
    plot(V, F(V,a_fit), '-r')
    hold off
    grid  
    xlabel('V')
    ylabel('Phase')
    title('GA Parameters') 

gives me the following outputs:

Note: I tried both the functions given in the question and the one that I got a few lines above using the function matlabFunction and yet both give me super weird results. Can you identify where the problem is?

Thanks

Edited:

So in addition, I somehow tried to find the optimum values using curve fitting as shown below:

but then I use those values to plot using the function handle as described in the question for example, then the fits don't even match at all as shown (Red one is the fit parameters obtained froim curve fitting but then plugged into the function handle):

Can you also explain this to me?

Star Strider on 26 Jul 2023

Edited: Star Strider on 26 Jul 2023

Open in MATLAB Online

I did not use any part of the symbolic code. I did not even run it.

I just copied ‘F’ and integrated it into code I use generally with the ga function. There are likely no significant differences between R2022b and R2023a with respect to ga, so just run the code a few times (since the 'InitialPopulationMatrix' is defined by the randi function), and it should produce a reasonably decent fit. At least one of the runs should essentially reproduce the resutls I got.

EDIT — (26 Jul 2023 at 16:15)

I am not certain what the problem is. Running the code you posted with the .csv files you posted produces this result in R2023a here —

F=@(V,a) a(1).*(a(2) - a(3) - (a(2).*a(4))./(a(2).^2.*cos(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2 + a(4).^2.*sin(a(5) - 2.*atan(exp(-(V - a(6))./a(7)))).^2).^(1/2));

V = readmatrix('volt.csv');

Phase = readmatrix('phase.csv');

PopSz = 100;

Parms = 7;

optsAns = optimoptions('ga', 'PopulationSize',PopSz, 'InitialPopulationMatrix',randi(1E+4,PopSz,Parms)*1E-3, 'MaxGenerations',5E3, 'FunctionTolerance',1E-10);

ftns= @(a) norm(F(V,a)-Phase)

ftns = function_handle with value:

@(a)norm(F(V,a)-Phase)

t0 = clock;

fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t0)

Start Time: 2023-07-26 16:09:22.8362

[a_fit,fval,exitflag,output,population,scores]=ga(ftns,Parms, [],[],[],[],zeros(Parms,1),Inf(Parms,1),[],[],optsAns);

Optimization terminated: maximum number of generations exceeded.

t1 = clock;

fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n', t1)

Stop Time: 2023-07-26 16:09:33.4106

GA_Time = etime(t1,t0)

GA_Time = 10.5743

DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');

fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)

Elapsed Time: 1.057434700000000E+01 00:00:10.574

fprintf('Fitness value at convergence = %.4f\nGenerations \t\t\t\t = %d\n\n',fval,output.generations)

Fitness value at convergence = 1.0441 Generations = 5000

fprintf(1,'\tRate Constants:\n')

Rate Constants:

for k1 = 1:numel(a_fit)

fprintf(1, '\t\ta(%2d) = %8.5f\n', k1, a_fit(k1))

end

a( 1) = 1.93083 a( 2) = 5.62535 a( 3) = 1.49782 a( 4) = 0.78787 a( 5) = 0.21854 a( 6) = 1.44228 a( 7) = 3.63094

figure

plot(V, Phase, '.b')

hold on

plot(V, F(V,a_fit), '-r')

hold off

grid

xlabel('V')

ylabel('Phase')

title('GA Parameters')

This seems to be an even better fit than the one I got previously, visually as well as by the final fitness value.

.

Akaash Srikanth on 26 Jul 2023

@Star Strider

At this point I have tried it over 20 times and I still keep getting similar results as to what I had gotten. Fits no where near what you had obtained. Is this normal? Should I keep storing the values in the best fit in some vector and repeat it?

Also, there is a small edit in the previous comment. Any help regarding that should also be helpful.

Thanks,

Akaash

Alex Sha on 27 Jul 2023

Open in MATLAB Online

If taking parameter ranges as:

a(1): (about 20000, 80000)

a(2) and a(4): (about 1.4-1.8)

No strict range on a(3)

a(6) : Around (0.3-0.9)

a(7): Around (0.3-2)

a(5)- No strict range on this too

The best result will be as below. Note the parameter values of a1, a2, a4 and a6 are all in the lower bound of ranges, it means the lower bound values for those parameters are not appropriate or need to be adjusted.

Sum Squared Error (SSE): 0.913188313696523
Root of Mean Square Error (RMSE): 0.082552033057426
Correlation Coef. (R): 0.999569920444164
R-Square: 0.999140025856752
Parameter	Best Estimate       
---------	-------------       
a1       	20000.0161989552    
a2       	1.40052478989677    
a3       	0.000144580372702076
a4       	1.4000274572155     
a5       	-41.1713406102593   
a6       	0.30000000000082    
a7       	0.79693554188002  

While, if taking all parameter range as: [a1,a2,a3,a4,a5,a6,a7]>0

the result will be:

Sum Squared Error (SSE): 0.185432961663104
Root of Mean Square Error (RMSE): 0.0371998396785889
Correlation Coef. (R): 0.999912682569188
R-Square: 0.99982537276271
Parameter	Best Estimate       
---------	-------------       
a1       	1.52400447519627    
a2       	5.95100778385172    
a3       	1.86594097129705    
a4       	0.000205859180286249
a5       	1.57073622281507    
a6       	12.6427943011546    
a7       	1.15807171273927

Sign in to comment.

Choosing Initial values for non-linear fitting and running Genetic Algorithm

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

4 Comments
Show 2 older commentsHide 2 older comments

More Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

Choosing Initial values for non-linear fitting and running Genetic Algorithm

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

4 Comments Show 2 older commentsHide 2 older comments

More Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

4 Comments
Show 2 older commentsHide 2 older comments