Estimate Parameters for System of ODEs with given Data using a Genetic Algorithm (COVID-19-Model)

Question

Dylan on 28 Sep 2021

0
Link

Direct link to this question

https://www.mathworks.com/matlabcentral/answers/1462704-estimate-parameters-for-system-of-odes-with-given-data-using-a-genetic-algorithm-covid-19-model

Commented: Star Strider on 1 Oct 2021

Dear MATLAB experts! I am from Switzerland, developing a compartmental model for COVID-19 in school, and need some urgent help! I am definitely a beginner when it comes to coding but I managed to program my model. However, I would like to fit the twelve model parameters (here p(1) to p(12)) describing the transfer rates to some given official data in order to increase the model's accuracy. For this purpose, I have created a program that uses a genetic algorithm for the task. The problem is that I only have data for three of the seven states (Susceptible, Infected, Diagnosed, Ailing, Recovered, Dead, Total Population) and I don't quite get reasonable values - I suppose there are some major issues. Also, when setting the initial population (here from Italy) to 60e6, nothing happens (maybe the calculations are to difficult and take too long?). Can anyone update my code so that it works properly and gives me the suiting parameters between 0 and 1? I would be really, really grateful!! 🇨🇭 Here is my code:

function EIDAGG_MODELL_PARAMETER
clear 
close all
clc
%7 ODEs to describe the model (E,I,D,A,G1,G2,N)
    function M = EIDAGG(p,t)
        %Initial conditions with population of 60e6, 200 infected and 20 diagnosed
        %people
        M0 = [60e6;200;20;0;0;0;60e6];
        [~,m] = ode45(@DifEq,t,M0);
        function dM = DifEq(~,k)
            dMdt = zeros(7,1);
            dMdt(1) = -k(1).*(p(1).*k(2)+p(2).*k(3)+p(3).*k(4)+p(12))+p(11).*k(7);
            dMdt(2) = k(1).*(p(1).*k(2)+p(2).*k(3)+p(3).*k(4))-(p(4)+p(5)+p(6)+p(12)).*k(2);
            dMdt(3) = p(4).*k(2)-(p(7)+p(8)+p(12)).*k(3);
            dMdt(4) = p(6).*k(2)+p(8).*k(3)-(p(9)+p(10)+p(12)).*k(4);
            dMdt(5) = p(5).*k(2)+p(7).*k(3)+p(9).*k(4)-p(12).*k(5);
            dMdt(6) = p(10).*k(4);
            dMdt(7) = -p(12).*(k(1)+k(2)+k(3)+k(4)+k(5))+p(11).*k(7);
            dM = dMdt;
        end
        M = m;
    end
%From day 1 to day 20
t = [1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20];
%Official data for I, G1 and G2 for the first 20 days
k = [3 0 0
    19 0 1
    77 0 2
    129 1 2
    213 1 5
    311 1 10
    385 3 12
    588 45 17
    821 46 21
    1049 50 29
    1577 83 34
    1835 149 52
    2263 160 79
    2706 276 107
    3296 414 148
    3916 523 197
    5061 589 233
    6387 622 366
    7985 724 463
    8514 1004 631]; 
% [theta,Rsdnrm,Rsd,ExFlg,OptmInfo,Lmda,Jmat]=lsqcurvefit(@kinetics,theta0,t,c);
ftns = @(p) norm(k-EIDAGG2(p,t));
PopSz = 500;
Parameter = 12;
opts = optimoptions('ga','PopulationSize',PopSz,'InitialPopulationMatrix',randi(1,PopSz,Parameter)*1E-3,'MaxGenerations',2E3,'PlotFcn','gaplotbestf');
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n',t0)
% [B,fval,exitflag,output,population,scores] = ga(ftns, 53, [],[],[],[],zeros(53,1),Inf(53,1),[],[],opts)
%Parameters should have values between 0 and 1
[p,fval,exitflag,output] = ga(ftns,Parameter,[],[],[],[],zeros(Parameter,1),[1;1;1;1;1;1;1;1;1;1;1;1],[],[],opts);
t1 = clock;
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n',t1)
GA_Time = etime(t1,t0);
fprintf('\nElapsed Time: %23.15E\t\t%02d:%02d:%02d.%03d\n',GA_Time)
fprintf(1,'\tRate Constants:\n')
for k1 = 1:length(p)
    fprintf(1,'\t\tp(%d) = %8.5f\n',k1,p(k1))
end
tv = linspace(min(t),max(t));
Mfit = EIDAGG(p,tv);
figure(1)
plot(t,k,'p')
hold on
hlp = plot(tv,Mfit);
hold off
grid
xlabel('Zeit')
ylabel('Anzahl Personen')
legend(hlp,'Empfänglich','Infiziert','Diagnostiziert','Akut gefährdet','Genesen','Gestorben','Location','NE')
    function solpts = EIDAGG2(p,t)
        solpts = EIDAGG(p,t);
        solpts = solpts([2,5,6]);
    end
end

0 Comments
Show -2 older commentsHide -2 older comments

Sign in to comment.

Sign in to answer this question.

Answer 1

Star Strider on 29 Sep 2021

0
Link

Direct link to this answer

https://www.mathworks.com/matlabcentral/answers/1462704-estimate-parameters-for-system-of-odes-with-given-data-using-a-genetic-algorithm-covid-19-model#answer_797299

You found the code I was intending to direct you to when I just now saw this post. (My apologies for not seeing it earlier. Thanks to Image Analyst for editing it, otherwise I would have missed it.)

The problem is that you are fitting 3 variables, however the code returns 7 variables. It will be necessary to select the 3 variables you want to fit, and then return only those variables. Do that in the ‘M’ variable, or in the ‘ftns’ function. (Doing it in ‘ftns’ would actually be easier, because you can then display and plot the entire output in figure(1)).

Also, estimate the initial conditions as parameters. That generally results in a much more successful fit.

I can probably edit your code to do all this, however I need to know what columns of the output you want to fit to the data. That is not currently obvious to me.

.

15 Comments
Show 13 older commentsHide 13 older comments

Star Strider on 29 Sep 2021

Edited: Star Strider on 30 Sep 2021

Open in MATLAB Online

My pleasure!

This runs correctly (although the differential equation integration occasionally fails), however it usually fails to converge on the correct results. It will probably be necessary to experiment with the initial population matrix and the parameter bounds to get it to produce usable results. (I checked to be certain that it only compares the correct columns of the objective function with the data, so that part works. That required a slightly different version of ‘ftns’ and a different call to it in the ga call.)

I have been running it for the last few minutes, repeating it to see if I could get appropriate parameter estimates. That has not been successful, so I defer to you. If you get good parameter estimates, please post them, and any changes in the code necessary to produce them.

EIDAGG_MODELL_PARAMETER
function EIDAGG_MODELL_PARAMETER
%7 ODEs to describe the model (E,I,D,A,G1,G2,N)
    function M = EIDAGG(p,t)
        %Initial conditions with population of 60e6, 200 infected and 20 diagnosed
        %people
%         M0 = [60e6;200;20;0;0;0;60e6];
        M0 = p(13:19);
        [~,m] = ode15s(@DifEq,t,M0);
        function dM = DifEq(~,k)
            dMdt = zeros(7,1);
            dMdt(1) = -k(1).*(p(1).*k(2)+p(2).*k(3)+p(3).*k(4)+p(12))+p(11).*k(7);
            dMdt(2) = k(1).*(p(1).*k(2)+p(2).*k(3)+p(3).*k(4))-(p(4)+p(5)+p(6)+p(12)).*k(2);
            dMdt(3) = p(4).*k(2)-(p(7)+p(8)+p(12)).*k(3);
            dMdt(4) = p(6).*k(2)+p(8).*k(3)-(p(9)+p(10)+p(12)).*k(4);
            dMdt(5) = p(5).*k(2)+p(7).*k(3)+p(9).*k(4)-p(12).*k(5);
            dMdt(6) = p(10).*k(4);
            dMdt(7) = -p(12).*(k(1)+k(2)+k(3)+k(4)+k(5))+p(11).*k(7);
            dM = dMdt;
        end
        M = m;
    end
%From day 1 to day 20
t = [1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20];
%Official data for I, G1 and G2 for the first 20 days
k = [3 0 0
    19 0 1
    77 0 2
    129 1 2
    213 1 5
    311 1 10
    385 3 12
    588 45 17
    821 46 21
    1049 50 29
    1577 83 34
    1835 149 52
    2263 160 79
    2706 276 107
    3296 414 148
    3916 523 197
    5061 589 233
    6387 622 366
    7985 724 463
    8514 1004 631]; 
% [theta,Rsdnrm,Rsd,ExFlg,OptmInfo,Lmda,Jmat]=lsqcurvefit(@kinetics,theta0,t,c);
function fv = ftns(p,t)
    EIDAGGout = EIDAGG(p,t);
    fv = norm(k-EIDAGGout(:,[3 5 6]));
end
% ftns = @(p) norm(k-EIDAGG2(p,t));
PopSz = 50;
% Parameter = 12;
Parameter = 19;
opts = optimoptions('ga','PopulationSize',PopSz,'InitialPopulationMatrix',randi(1E+4,PopSz,Parameter).*[ones(1,12)*1E-3 ones(1,7)*10],'MaxGenerations',2E3)%,'PlotFcn','gaplotbestf');
t0 = clock;
fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n',t0)
% [B,fval,exitflag,output,population,scores] = ga(ftns, 53, [],[],[],[],zeros(53,1),Inf(53,1),[],[],opts)
%Parameters should have values between 0 and 1
[p,fval,exitflag,output] = ga(@(p)ftns(p,t),Parameter,[],[],[],[],zeros(Parameter,1),[ones(12,1); Inf(7,1)],[],[],opts);
t1 = clock;
fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n',t1)
GA_Time = etime(t1,t0);
DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');
fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)
fprintf(1,'\n\tRate Constants:\n')
for k1 = 1:12
    fprintf(1,'\t\tp(%2d) = %11.8f\n',k1,p(k1))
end
fprintf(['\n\tInitial Conditions:\n'])
for k1 = 1:7
    fprintf('\t\tdMdt(%d) = %11.8f\n',k1,p(k1))
end
tv = linspace(min(t),max(t));
Mfit = EIDAGG(p,tv);
figure(1)
plot(t,k,'p')
hold on
hlp = plot(tv,Mfit);
hold off
grid
xlabel('Zeit')
ylabel('Anzahl Personen')
legend(hlp,'Empfänglich','Infiziert','Diagnostiziert','Akut gefährdet','Genesen','Gestorben','Location','best')
%     function solpts = EIDAGG2(p,t)
%         solpts = EIDAGG(p,t);
%         solpts = solpts([2,5,6]);
%     end
end

EDIT — (2021 09 30 at 00:42)

I was able to get a somewhat reasonable fit with these parameters:

	Rate Constants:
		p( 1) =  0.88394759
		p( 2) =  0.69125134
		p( 3) =  0.83771593
		p( 4) =  0.99999978
		p( 5) =  0.06200945
		p( 6) =  0.00183627
		p( 7) =  0.00015000
		p( 8) =  0.00132310
		p( 9) =  0.98997187
		p(10) =  0.15661815
		p(11) =  0.99958871
		p(12) =  0.78121629
    Initial Conditions:
		dMdt(1) =  0.88394759
		dMdt(2) =  0.69125134
		dMdt(3) =  0.83771593
		dMdt(4) =  0.99999978
		dMdt(5) =  0.06200945
		dMdt(6) =  0.00183627
		dMdt(7) =  0.00015000

producing this plot:

So at least some of the values managed to converge.

.

Star Strider on 1 Oct 2021

Open in MATLAB Online

The code only fits the three differential equations tha match the colums of ‘k’. It otherwise infers what the other differential equations should be, based on fitting the parameters in those differential equations. The problem of course is known in control theory as unobservable and uncontrollable states, meaning that the input (whatever it is) does not directly affect those states (unmodelled terms), and if also unobservable, that they are not directly visible in the output. This is a problem with the estimation technique used here, and there is no way to overcome its weaknesses.

The ‘InitialPopulationMatrix’ is simply the initial matrix parameter space that ga uses in its first iteration. It then modifies those in each subsequent iteration until it evolves an acceptable set of parameters. Thjer is nothing magic about it. I scale its columns here with a multiplier vector, however that is not absolutely necessary.

I could not get the original code (in your most recent Comment) to run.

This runs:

EIDAGG_MODELL_PARAMETER

opts =

ga options: Set properties: InitialPopulationMatrix: [50×19 double] MaxGenerations: 2000 PopulationSize: 50 Default properties: ConstraintTolerance: 1.0000e-03 CreationFcn: [] CrossoverFcn: [] CrossoverFraction: 0.8000 Display: 'final' EliteCount: '0.05*PopulationSize' FitnessLimit: -Inf FitnessScalingFcn: @fitscalingrank FunctionTolerance: 1.0000e-06 HybridFcn: [] InitialPopulationRange: [] InitialScoresMatrix: [] MaxStallGenerations: 50 MaxStallTime: Inf MaxTime: Inf MutationFcn: [] NonlinearConstraintAlgorithm: 'auglag' OutputFcn: [] PlotFcn: [] PopulationType: 'doubleVector' SelectionFcn: [] UseParallel: 0 UseVectorized: 0

Start Time: 2021-10-01 21:42:21.5647 Optimization terminated: average change in the fitness value less than options.FunctionTolerance.

p = 1×19

0.6577 0.0405 0.2318 0.0092 0.0351 0.0333 0.9997 0.5792 0.9939 0.0040 0.9676 0.9976 0.2568 0.0048 0.0133 0.0035 0.0003 0.0031 0.0130

fval = 0.0138

exitflag = 1

output = struct with fields:

problemtype: 'boundconstraints' rngstate: [1×1 struct] generations: 325 funccount: 15328 message: 'Optimization terminated: average change in the fitness value less than options.FunctionTolerance.' maxconstraint: 0 hybridflag: []

Stop Time: 2021-10-01 21:43:06.8504 Elapsed Time: 4.528567099999999E+01 00:00:45.285 Rate Constants: p( 1) = 0.65773970 p( 2) = 0.04050389 p( 3) = 0.23183625 p( 4) = 0.00921020 p( 5) = 0.03512869 p( 6) = 0.03332077 p( 7) = 0.99965108 p( 8) = 0.57916133 p( 9) = 0.99386372 p(10) = 0.00397920 p(11) = 0.96762136 p(12) = 0.99762472 Initial Conditions: dMdt(1) = 0.65773970 dMdt(2) = 0.04050389 dMdt(3) = 0.23183625 dMdt(4) = 0.00921020 dMdt(5) = 0.03512869 dMdt(6) = 0.03332077 dMdt(7) = 0.99965108

Warning: Ignoring extra legend entries.

function EIDAGG_MODELL_PARAMETER

function M = EIDAGG(p,t)

k0 = p(13:19);

% k0 = [0.9999967 0.000003 0.0000003 0 0 0 1]; %how can I include the initial conditions for compartments k(1:7) as a fraction of all people?

[~,m] = ode15s(@DifEq,t,k0);

function dk = DifEq(~,k)

dkdt = zeros(7,1); %shouldn't it be dkdt instead of dMdt? and why are all vectors empty, i.e. zero at the beginning?

dkdt(1) = -k(1).*(p(1).*k(2)+p(2).*k(3)+p(3).*k(4)+p(12))+p(11).*k(7);

dkdt(2) = k(1).*(p(1).*k(2)+p(2).*k(3)+p(3).*k(4))-(p(4)+p(5)+p(6)+p(12)).*k(2);

dkdt(3) = p(4).*k(2)-(p(7)+p(8)+p(12)).*k(3);

dkdt(4) = p(6).*k(2)+p(8).*k(3)-(p(9)+p(10)+p(12)).*k(4);

dkdt(5) = p(5).*k(2)+p(7).*k(3)+p(9).*k(4)-p(12).*k(5);

dkdt(6) = p(10).*k(4);

dkdt(7) = -p(12).*(k(1)+k(2)+k(3)+k(4)+k(5))+p(11).*k(7);

dk = dkdt;

end

M = m;

end

t = [1

4

8

12

16

20

24

28

32

36

40];

k = [3 0 0

129 1 2

588 45 17

1835 149 52

3916 523 197

8514 1004 631

17750 1966 1441

28710 4025 2978

46638 7024 5476

62013 10361 8165

75528 14620 11591]/60e6;

function fv = ftns(p,t)

EIDAGGout = EIDAGG(p,t);

fv = norm(k-EIDAGGout(:,[3 5 6]));

end

PopSz = 50;

Parameter = 19;

opts = optimoptions('ga','PopulationSize',PopSz,'InitialPopulationMatrix',randi(1E+4,PopSz,Parameter).*[ones(1,12)*1E-3 ones(1,7)*10],'MaxGenerations',2E3)%,'PlotFcn','gaplotbestf'); %how can I interpret the InitialPopulationMatrix?

t0 = clock;

fprintf('\nStart Time: %4d-%02d-%02d %02d:%02d:%07.4f\n',t0)

[p,fval,exitflag,output] = ga(@(p)ftns(p,t),Parameter,[],[],[],[],zeros(Parameter,1),[ones(12,1); Inf(7,1)],[],[],opts)

t1 = clock;

fprintf('\nStop Time: %4d-%02d-%02d %02d:%02d:%07.4f\n',t1)

GA_Time = etime(t1,t0);

DT_GA_Time = datetime([0 0 0 0 0 GA_Time], 'Format','HH:mm:ss.SSS');

fprintf('\nElapsed Time: %23.15E\t\t%s\n\n', GA_Time, DT_GA_Time)

fprintf(1,'\n\tRate Constants:\n')

for k1 = 1:12

fprintf(1,'\t\tp(%2d) = %11.8f\n',k1,p(k1))

end

fprintf('\n\tInitial Conditions:\n')

for k1 = 1:7

fprintf('\t\tdMdt(%d) = %11.8f\n',k1,p(k1)) %arent these the parameters again and not the compartments k(1:7) at t=0?

end

tv = linspace(min(t),max(t));

Mfit = EIDAGG(p,tv);

figure

plot(t,k,'p')

hold on

hlp = plot(tv,Mfit);

hold off

grid

xlabel('Zeit')

ylabel('Anteil der Gesamtbevölkerung')

legend(hlp,'Empfänglich','Infiziert','Diagnostiziert','Akut gefährdet','Genesen','Gestorben','Location','best')

end

and produces the listed parameters.

It fits the parameters it is given to fit, and guesses at the others. It has no other option.

.

Dylan on 1 Oct 2021

Thanks for your answer, I can see the difficulty now! Do you agree with me that the ODEs were wrong before? And is it impossible to set initial conditions for the compartments k(1:7)? And last but not least, does it make sense to set strict boundaries for the parameters and are the ones which are modeled and not guessed those which appear in the modeled ODEs? Thanks and sorry for the disturbance!

Star Strider on 1 Oct 2021

As always, my pleasure!

No worries!

Bounding the parameters is appropriate of the assumption is that in a compartmental model (that all SIR models are), the kinetic parameters must always be on the interval (0,1).

The initial conditions can be whatever you want them to be. The ga algorithm will evolve a suitable set (in my code, since I estimate them with the other parameters) that will produce appropriate results. It is possible to set them as static non-parameters, however that generally leads to an inferior fit, since they are not absolutely certain and so should be estimated parameters.

The problem is not that the unmodeled differential equations cannot be directly estimated. It is instead that the parameters in them that are not parameters of the estimated differential equations cannot be reliably estimated. That will require any optimisation routine to ‘guess’ what they should be, and that will rarely be in any way accurate, much less precise.

.

Sign in to comment.

Estimate Parameters for System of ODEs with given Data using a Genetic Algorithm (COVID-19-Model)

0 Comments
Show -2 older commentsHide -2 older comments

Accepted Answer

15 Comments
Show 13 older commentsHide 13 older comments

More Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

Estimate Parameters for System of ODEs with given Data using a Genetic Algorithm (COVID-19-Model)

0 Comments Show -2 older commentsHide -2 older comments

Accepted Answer

15 Comments Show 13 older commentsHide 13 older comments

More Answers (0)

See Also

Categories

Tags

Products

Release

Community Treasure Hunt

0 Comments
Show -2 older commentsHide -2 older comments

15 Comments
Show 13 older commentsHide 13 older comments