MATLAB Answers

Argento
0

Implementing Gillespie's Algorithm.

Asked by Argento
on 19 Jun 2015
Latest activity Commented on by David Goodmanson on 17 Feb 2019
Hello everyone,
Being pretty new to Matlab, I've been struggling trying to implement Gillespie's Algorithm (1977). Truth be told, I am still somewhat confused by certain aspects of the algorithm itself (such as the calculation of the propensity function). I've been tying to stick pretty close to the methods outlined in his paper.
Below I posted my attempt at coding fig.6 from his (1977) paper, but I've been getting nowhere.
Could someone with some experience with the algorithm tell me what I'm doing wrong? Many thanks.
clear
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%Species and reaction channels %%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
M = 2;
N = 1;
%%%%%%%%%%%%%%
%%%STEP 0 %%%
%%%%%%%%%%%%%%
c1X = 5; %% ???? %%
c1 = 5; %% ???? %%
c2 = 0.005;
Y = 10;
X1 = Y;
t = 0; %%% Time %%%
n = 0; %%% Counter %%%
t_max = 100; %%% Max. number of iterations %%%
T = zeros(1, t_max + 1); %%% Time values array (for plotting purposes) %%%
XX = zeros(1, t_max + 1); %%% X values array %%%
Yc = zeros(1, t_max + 1); %%% Y values array (for plotting purposes) %%%
%%%%%%%%%%%%%%
%%%STEP 1 %%%
%%%%%%%%%%%%%%
while n < t_max
h1 = X1*Y;
h2 = Y*(Y-1)*0.5;
a = zeros(1, 2);
a1 = h1*c1;
a2 = h2*c2;
a(1, 1) = a1;
a(1, 2) = a2;
a0 = sum(a);
%%%%%%%%%%%%%%
%%%STEP 2 %%%
%%%%%%%%%%%%%%
r1 = rand();
r2 = rand();
tau = -log(r1)*(1/a0);
t = t + tau;
if r2*a0 <= sum(a);
Y = Y + h1;
Yc(1, n+1) = Y;
else
Y = Y - h2;
Yc(1, n+1) = Y;
end
%%%%%%%%%%%%%%
%%%STEP 3 %%%
%%%%%%%%%%%%%%
t = t + tau;
T(1, n+1) = t;
n = n + 1;
end
plot(T, Yc)

  3 Comments

Hi Argento
If I were you, I'd work up to reaction (29) in Gillespie's paper from reaction (22). This simpler reaction is irreversible isomerisation. Because it contains only one reaction species, you can play with the algorithm without having to worry so much about indexing your vectors. Here's my code for reaction (22). I'll work towards reaction (29) and post that code when it's ready.
Hope that helps.
Harley
%code
clear
%%Irreversible isomerisation reaction.
%This gives figure 3 of Gillespie's 1977 paper.
M = 1; % Number of reaction pathways
N = 1; % Number of molecular species considered
%%STEP 0
c = 0.5; % In general, a vector of length M of stochastic reaction constants.
X = 1000; % Initial number of molecules of species X. In general, a vector of length N of initial population numbers for each species.
Xplot = X*ones(M,1); % For plotting.
t = 0; % Time variable.
tplot = zeros(1); % For plotting.
n = 0; % Reaction counter.
n_max = 1000;
%%STEP 1
while n < n_max
h = X; % Number of molecular reactant combinations available in current state. In general, h is a vector of length M consisting of combinatorial functions of molecular population numbers X (which is of length N).
a = h*c; % a is the propensity of the reaction pathway in current state. In general, a vector of length M
a0 = sum(a); % a0 is total propensity that anything happens. This number emerges more out of mathematical necessity than physical intuition.
%%STEP 2
r = rand(2,1);
tau = -log(r(1))/a0;
% Since this reaction is irreversible isomerisation (only one event can
% occur) the value of mu is always 1, and so the next step is
% unnecessary in this case.
mu = sum(r(2)*a0 <= cumsum(a)); % Currently always 1 since 0<=r(2)<=1. Note 0<mu<=M.
%%STEP 3
t = t + tau;
% Adjust population levels based on reaction formula(s). Again, because
% only one reaction is occuring, this switch statement is unnecessary.
switch mu
case 1
X(mu) = X(mu) - 1;
end
n = n + 1;
% At this point, all the physics has been simulated, it only remains to
% record the new values of t and X to vectors to we can plot it later.
Xplot(mu,n+1) = X(mu);
tplot(n+1) = t;
end
stairs(tplot, Xplot)
equation_22.svg
Hi all,
Just received an email asking me if I ever completed the code I promised almost 4 years ago now. Yes, I did, but I forgot to get back to this forum with my results! Anyway, here you go...
Equation 29 example
Here's some code which simulates equation 29 from Gillespie's 1977 paper.
If you run the code below, you should end up with something looking like figure 6 (though not exactly the same of course since it's stochastic):
figure
hold on
[tplot, Yplot3000] = eqn29 ( 3000 );
stairs(tplot, Yplot3000);
[tplot, Yplot10] = eqn29 ( 10 );
stairs(tplot, Yplot10);
line ( [0, 5], [1000, 1000] );
legend( {'$Y_0=3000$','$Y_0=10$', 'deterministic steady state'}, 'interpreter', 'latex' );
hold off
xlabel('time (minutes)', 'interpreter', 'latex');
ylabel('number of $Y$ molecules', 'interpreter', 'latex');
function [tplot, Yplot] = eqn29 ( Y0 )
% function to simulate reaction dynamics of equation 29 from Gillespie's
% 1977 paper. The sigle argument to this function specifies the initial
% number of molecules of species Y.
%% Equation 29 reactions
%This gives figure 6 of Gillespie's 1977 paper.
M = 2; % Number of reaction pathways
N = 1; % Number of molecular species considered
%% STEP 0
% Initial number of molecules of species Y(1) and X(2). In general, a
% vector of length N of initial population numbers for each species.
%------------------------
% equation | rate
%------------------|-----
% Xbar + Y -> 2*Y | c1
% Y + Y -> Z | c2
%------------------------
Y = Y0;
Xbar = 10;
% In general, a vector of length M of stochastic reaction constants.
c = [5/Xbar,0.005]; % units: (expected number of) reactions per minute
Yplot = Y; % For plotting.
t = 0; % Time variable.
tplot = zeros(1); % For plotting.
t_stop = 5;
n = 0; % Reaction counter.
n_max = 1000;
%% STEP 1
while t < t_stop
% Number of molecular reactant combinations available in current state.
% In general, h is a vector of length M consisting of combinatorial
% functions of molecular population numbers X (which is of length N).
h = [Xbar*Y,Y*(Y-1)/2];
% a is the propensity of the reaction pathway in current state. In
% general, a vector of length M
a = h.*c;
% a0 is total propensity that anything happens. This number emerges
% more out of mathematical necessity than physical intuition.
a0 = sum(a);
%% STEP 2
r = rand(2,1);
tau = -log(r(1))/a0;
% Note 0<=mu<=M. This decides which reaction occurs. If mu=0, no
% reaction occurs. The switch statement does nothing in this case.
mu = sum(r(2)*a0 <= cumsum(a));
%% STEP 3
t = t + tau;
% Adjust population levels based on reaction formula(s).
switch mu
case 2
Y = Y + 1;
case 1
Y = Y - 2;
end
n = n + 1;
% At this point, all the physics has been simulated, it only remains to
% record the new values of t and X to vectors to we can plot it later.
Yplot(n+1,:) = Y;
tplot(n+1) = t;
end
end
An equivalent of this system can be modelled deterministically like this:
This determinstic ODE represents the system under the assumption that the numbers of reactant molecules are high enough that the reaction rates and number of molecules can be assumed to be smooth without loss of detail. Obviousely this assumption is always wrong in some sense (you can't actually have 30000000.2 molecules of something). Remember however that all models are wrong, but some are useful!
The solution to the deterministic ODE system should be the average of many Gillespie algorithm runs, and gives us a nice way to check our working. If the Gillespie algorithm we've coded up yields results which are on average in agreement with the deterministic ODEs, then we can be fairly sure that we haven't made any errors.
[t3000, YDetplot3000] = ode45 ( @eqn29Deterministic, [0, 5], 3000 );
[t10, YDetplot10] = ode45 ( @eqn29Deterministic, [0, 5], 10 );
figure;
hold on;
plot ( t3000, YDetplot3000 );
plot ( t10, YDetplot10 );
legend( {'$Y_0=3000$','$Y_0=10$'}, 'interpreter', 'latex' );
xlabel('time (minutes)', 'interpreter', 'latex');
ylabel('mean number of $Y$ molecules', 'interpreter', 'latex');
xlim ( [0, 5] );
function dy = eqn29Deterministic (t, y)
Xbar = 10;
c_1 = 5/Xbar;
c_2 = 0.005;
dy = c_1 * Xbar * y - c_2 *y^2;
end
I advise anyone starting to use the Gillespie algorithm to check their working using this method. Take, say 1000 runs using a Gillespie algorithm, and take an average. Then simulate the deterministic equivalent system, and check to see if your curves look similar. Use caution if you're dealing with limit cycle behaviour (such as the Lotka reactions below). In such cases, the stochastic nature of your system will likely cause the amplitude and phase to vary quite a bit in the stochastic case because small fluctiations in molecule numbers at any point in the cycle will result in slight devations in the rate at which your system works its way around the limit cycle, and these small fluctions will become compounded over time. Therefore oscillations from your stochastic and deterministic systems will probably not line up with each other after some time has passed. The stochastic system will gradually go out of phase with the deterministic system, but the amplitudes and frequencies of the limit cycle will, on average, be the same as for the deterministic system.
Lotka reactions
This next bit of code is along the same lines, but simulates the Lotka system defined in equation 38 from Gillespie's 1977 paper.
The code below simulates this system, and records the numbers of the species and .
Note the changes to the reaction rates vector "c", and the vector "h" which enumerates the number of combinations of molecular interactions which can bring about each reaction. There are three different reactions modelled here, so the switch condition now has three cases.
%clear
%% Lotka reactions
%This gives figure 6 of Gillespie's 1977 paper.
M = 3; % Number of reaction pathways
N = 2; % Number of molecular species considered
%% STEP 0
% Initial number of molecules of species X(1) and X(2). In general, a
% vector of length N of initial population numbers for each species.
%--------------------------
% equation | rate
%--------------------|-----
% Xbar + Y1 -> 2*Y1 | c1
% Y1 + Y2 -> 2*Y2 | c2
% Y2 -> Z | c3
%--------------------------
Y1 = 1000;
Y2 = 1000;
X = 10;
% In general, a vector of length M of stochastic reaction constants.
c = [10/X,0.01,10]; % units: (expected number of) reactions per minute
Y1plot = Y1; % For plotting.
Y2plot = Y2;
t = 0; % Time variable.
t_max = 30;
tplot = zeros(1); % For plotting.
n = 0; % Reaction counter.
n_max = 1000000;
%% STEP 1
while t < t_max
% Number of molecular reactant combinations available in current state.
% In general, h is a vector of length M consisting of combinatorial
% functions of molecular population numbers X (which is of length N).
h = [X*Y1,Y1*Y2,Y2];
% a is the propensity of the reaction pathway in current state. In
% general, a vector of length M
a = h.*c;
% a0 is total propensity that anything happens. This number emerges
% more out of mathematical necessity than physical intuition.
a0 = sum(a);
%% STEP 2
r = rand(2,1);
tau = -log(r(1))/a0;
% Note 0<=mu<=M. This decides which reaction occurs. If mu=0, no
% reaction occurs. The switch statement does nothing in this case.
mu = sum(r(2)*a0 <= cumsum(a));
%% STEP 3
t = t + tau;
% Adjust population levels based on reaction formula(s).
switch mu
case 3
Y1 = Y1 + 1;
case 2
Y1 = Y1 - 1;
Y2 = Y2 + 1;
case 1
Y2 = Y2 - 1;
end
n = n + 1;
% At this point, all the physics has been simulated, it only remains to
% record the new values of t and X to vectors to we can plot it later.
Y1plot(n+1,:) = Y1;
Y2plot(n+1,:) = Y2;
tplot(n+1) = t;
end
figure
hold on
stairs(tplot, Y1plot)
stairs(tplot, Y2plot)
legend({'$Y_1$','$Y_2$'}, 'interpreter','latex');
xlabel('time (minutes)', 'interpreter', 'latex');
ylabel('numbers of $Y_1$ and $Y_2$ molecules', 'interpreter', 'latex');
hold off
figure
plot(Y1plot,Y2plot);
xlabel('number of $Y_1$ molecules', 'interpreter', 'latex');
ylabel('number of $Y_2$ molecules', 'interpreter', 'latex');
Time sampling (to make things quicker)
One problem which you may encounter when trying to analyse results from the Gillespie algorithm is that of sampling many Gillespie runs at a particular moment (or moments) in time. Let's say you're interested in looking at the distribution of species Y at time minutes for the equation 29 example above. You can run lots of simulations which will tell you the number of Y species at that time, but your results will tell you the number of species for all time between time up to time minutes, and you'll have to sift through the time vector from each simulation run to extract the times you're interested in.
For example, one Gillespie run might give you:
meaning that
whereas another might give you:
meaning that
It's not efficient to ask MATLAB to store the results for all time and then go through your time vectors to obtain species numbers at the times you're actually interested in since you will end up spending alot of time storing numbers you end up just throwing away.
It is far more efficient to obtain results only at time-points you're actually interested in. Let's say you want to know the number of species Yat times . We want the algorithm to output the following information:
or for another Gillespie run, perhaps the following would be obtained:
I.e. we want to have the times already sampled for us, and not have to bother sifting through huge arrays of results to extract these samples.
To sample only times you're actually interested in, it's best to introduce this time sampling within the Gillespie algorithm itself. I do this as follows (look for the if statement below):
% in this code, we will sample the timescale at regular intervals, rather
% than obtaining information about every change in species numbers which
% occurs (where the time sampling will be irregular).
%%%%%%%%%%%%%%%%%%%%%%% in my "equation_29.m" code, the gillespie algorithm
%%%%%%%%%%%%%%%%%%%%%%% usually gives us around 41000 time points when we
%%%%%%%%%%%%%%%%%%%%%%% simulate from Y0=10, but we can ask for just 1000
%%%%%%%%%%%%%%%%%%%%%%% equally spaced timepoints from this set.
[tplot, Yplot3000] = eqn29 ( 3000, linspace(0, 5, 1000) );
figure (1);
hold on;
stairs(tplot, Yplot3000);
%%%%%%%%%%%%%%%%%%%%%%% in my "equation_29.m" code, the gillespie algorithm
%%%%%%%%%%%%%%%%%%%%%%% usually gives us around 31000 time points when we
%%%%%%%%%%%%%%%%%%%%%%% simulate from Y0=10, but we can ask for just 1000
%%%%%%%%%%%%%%%%%%%%%%% equally spaced timepoints from this set.
[tplot, Yplot10] = eqn29 ( 10, linspace(0, 5, 1000) );
stairs(tplot, Yplot10);
line ( [0, 5], [1000, 1000] );
legend( {'$Y0=3000$','$Y0=10$', 'deterministic steady state'}, 'interpreter', 'latex' );
hold off
xlim([0,5]);
xlabel('time', 'interpreter', 'latex');
ylabel('number of Y molecules', 'interpreter', 'latex');
title ( 'Gillespie algorithm with time sampling' );
%%%%%%%%%%%%%%%%%%%%%% If we ask for samples at a rate which is higher than
%%%%%%%%%%%%%%%%%%%%%% the actual rate at which species numbers change, the
%%%%%%%%%%%%%%%%%%%%%% algorithm still works correctly.
%%%%%%%%%%%%%%%%%%%%%%% in my "equation_29.m" code, the gillespie algorithm
%%%%%%%%%%%%%%%%%%%%%%% usually gives us around 41000 time points when we
%%%%%%%%%%%%%%%%%%%%%%% simulate from Y0=10. Let's oversample (ask for more
%%%%%%%%%%%%%%%%%%%%%%% than 41000 timepoints), and see if this causes
%%%%%%%%%%%%%%%%%%%%%%% problems... We can ask for 1 million time samples
[tplot, Yplot3000] = eqn29 ( 3000, linspace(0, 5, 1e6) );
figure (2);
hold on;
stairs(tplot, Yplot3000);
%%%%%%%%%%%%%%%%%%%%%%% Same as above
[tplot, Yplot10] = eqn29 ( 10, linspace(0, 5, 1e6) );
stairs(tplot, Yplot10);
line ( [0, 5], [1000, 1000] );
legend( {'$Y0=3000$','$Y0=10$', 'deterministic steady state'}, 'interpreter', 'latex' );
hold off
xlim([0,5]);
xlabel('time', 'interpreter', 'latex');
ylabel('number of Y molecules', 'interpreter', 'latex');
title ( 'Gillespie algorithm with time sampling (oversampling)' );
%%%%%%%%%%%%%%%%%%%%%% If we ask for samples at a rate which is lower than
%%%%%%%%%%%%%%%%%%%%%% the actual rate at which species numbers change, the
%%%%%%%%%%%%%%%%%%%%%% algorithm still works correctly.
%%%%%%%%%%%%%%%%%%%%%%% in my "equation_29.m" code, the gillespie algorithm
%%%%%%%%%%%%%%%%%%%%%%% usually gives us around 41000 time points when we
%%%%%%%%%%%%%%%%%%%%%%% simulate from Y0=10. Let's undersample (ask for
%%%%%%%%%%%%%%%%%%%%%%% fewer than 41000 timepoints), and see if this causes
%%%%%%%%%%%%%%%%%%%%%%% problems... We can ask for 10 time samples
[tplot, Yplot3000] = eqn29 ( 3000, linspace(0, 5, 10) );
figure (3);
hold on;
stairs(tplot, Yplot3000);
%%%%%%%%%%%%%%%%%%%%%%% Same as above
[tplot, Yplot10] = eqn29 ( 10, linspace(0, 5, 10) );
stairs(tplot, Yplot10);
line ( [0, 5], [1000, 1000] );
legend( {'$Y0=3000$','$Y0=10$', 'deterministic steady state'}, 'interpreter', 'latex' );
hold off
xlim([0,5]);
xlabel('time', 'interpreter', 'latex');
ylabel('number of Y molecules', 'interpreter', 'latex');
title ( 'Gillespie algorithm with time sampling (undersampling)' );
%%%%%%%%%%%%%%%%%%%%%% if you want to make a histogram of the distribution
%%%%%%%%%%%%%%%%%%%%%% of species numbers at a particular time, you can
%%%%%%%%%%%%%%%%%%%%%% just run this algorithm multiple times (I advise
%%%%%%%%%%%%%%%%%%%%%% using parfor to run this algorithm in parallel on
%%%%%%%%%%%%%%%%%%%%%% lots of CPU cores since the computations will each
%%%%%%%%%%%%%%%%%%%%%% be independent). I won't write parfor here since I
%%%%%%%%%%%%%%%%%%%%%% don't know what kind of computer you're using. You
%%%%%%%%%%%%%%%%%%%%%% can read about it here: https://uk.mathworks.com/help/distcomp/parfor.html
number_of_Gillespie_runs = 100;
Y_samples = zeros( 1, number_of_Gillespie_runs);
sample_time = 2;
for i=1:length(Y_samples)
[~, Y_samples(i)] = eqn29 ( 10, sample_time );
end
figure ( 4 );
hist ( Y_samples );
xlabel ( ['number of species $Y$ at time $t=', num2str(sample_time),'$'], 'interpreter', 'latex' );
ylabel ( 'frequency', 'interpreter', 'latex' );
title ( [num2str(number_of_Gillespie_runs), ' samples'], 'interpreter', 'latex' );
function [tplot, Yplot] = eqn29 ( Y0, timeSamples )
% function to simulate reaction dynamics of equation 29 from Gillespie's
% 1977 paper. The sigle argument to this function specifies the initial
% number of molecules of species Y.
% parameter | meaning
%-------------|-------------------------------------------------------------------------
% Y0 | number of species Y0 at time t=0
% timeSamples | vector of times at which we would like to know the nuber of each sepcies
%---------------------------------------------------------------------------------------
%% Equation 29 reactions
%This gives figure 6 of Gillespie's 1977 paper.
% M = 2; % Number of reaction pathways
% N = 1; % Number of molecular species considered
%% STEP 0
% Initial number of molecules of species Y(1) and X(2). In general, a
% vector of length N of initial population numbers for each species.
%------------------------
% equation | rate
%------------------|-----
% Xbar + Y -> 2*Y | c1
% Y + Y -> 2*Y | c2
%------------------------
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%% Harley's time sampling code...%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
obs_index = 1; % output index
n_times_to_observe = length ( timeSamples ); % the number of times which we will be sampling
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Y = Y0;
Xbar = 10;
% In general, a vector of length M of stochastic reaction constants.
c = [5/Xbar,0.005];
Yplot = zeros ( n_times_to_observe, length( Y ) ); % Prealocate the output array (makes it easier for the computer if we claim all of the memory we need at the beginning rather than dynamically allocating memory on the fly).
t = 0; % Time variable.
tplot = zeros(1); % For plotting.
%% STEP 1
while obs_index<=n_times_to_observe % note this used to be "t < t_stop", but we now look at the time index for our stopping condition
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%% Harley's time sampling code...%%%%%
%%%%%%%%%%%%%%%%% this if statement catches instances where the time t
%%%%%%%%%%%%%%%%% passes one of our sampling times, and records the species
%%%%%%%%%%%%%%%%% numbers when this happens. We can now sample time at any
%%%%%%%%%%%%%%%%% resolution we like, and only record the times we're
%%%%%%%%%%%%%%%%% interested in.
if(t>=timeSamples(obs_index)) % if we've exceeded an obsevation time, record the current population values up to the new time change
Yplot(obs_index,:) = Y;
tplot(obs_index) = t;
obs_index = obs_index + 1; % move our attention to next observation time
continue; % restart while loop
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Number of molecular reactant combinations available in current state.
% In general, h is a vector of length M consisting of combinatorial
% functions of molecular population numbers X (which is of length N).
h = [Xbar*Y,Y*(Y-1)/2];
% a is the propensity of the reaction pathway in current state. In
% general, a vector of length M
a = h.*c;
% a0 is total propensity that anything happens. This number emerges
% more out of mathematical necessity than physical intuition.
a0 = sum(a);
%% STEP 2
r = rand(2,1);
tau = -log(r(1))/a0;
% Note 0<=mu<=M. This decides which reaction occurs. If mu=0, no
% reaction occurs. The switch statement does nothing in this case.
mu = sum(r(2)*a0 <= cumsum(a));
%% STEP 3
t = t + tau;
% Adjust population levels based on reaction formula(s).
switch mu
case 2
Y = Y + 1;
case 1
Y = Y - 2;
end
end
end