Code covered by the BSD License  

Highlights from
Differential Evolution Monte Carlo sampling

image thumbnail
from Differential Evolution Monte Carlo sampling by Corey Yanofsky
easy Bayesian computation for real parameter spaces

demo_DEMC
function demo_DEMC

% This work was supported by grants from Genome Quebec and the National 
% Science and Engineering Research Council of Canada.


disp(' ')
disp('1 ------------------------------------')
disp('This demo demonstrates the use of differential evolution')
disp('Markov chain Monte Carlo (DE-MC). The code implements algorithms')
disp('described in:')
disp(' ')
disp('Cajo F.T. Ter Braak, "A Markov Chain Monte Carlo version of the') 
disp('genetic algorithm Differential Evolution: easy Bayesian computing')
disp('for real parameter spaces", Stat Comput (2006) 16:239249')
disp(' ')
disp('The example we will analyze is a non-linear regression with Gaussian')
disp('errors. The model function is a logistic curve.')
disp(' ')
disp('Press any key to see the function.')

xaxis = 0:0.01:1;
true_param = [0.5 0 10 50];
modelfunc = @(x,param) param(2) - param(3) + (2.*(param(3))./(1 + exp(2.*param(4).*(param(1) - x)./(param(3)))));
f = modelfunc(xaxis,true_param);

pause;
h = figure;
temp = get(h,'Position');
temp(1) = 5;
set(h,'Position',temp);
plot(xaxis,f,0.5,0,'ro');
ylim([-15 15]);

disp(' ')
disp('2 ------------------------------------')
disp('This function has four parameters. The first two are the x and y') 
disp('co-ordinates of the central point, which has been plotted in red.')
disp('The co-ordinates are (0.5, 0). The third parameter is the vertical') 
disp('distance from the central point to the asymptotes. In this case,')
disp('the distance is 10. The fourth parameter is the slope of the curve')
disp('at the central point; in this case, the slope is 50.')
disp(' ')
disp('Press any key to continue.')
pause;

disp(' ')
disp('3 ------------------------------------')
disp('We generate data as follows:')
disp(' ')
disp('x = rand(100,1);');
disp('y = modelfunc(x, [0.5, 0, 10, 50]) + randn(100,1);')

x = rand(100,1);
y = modelfunc(x,true_param) + randn(100,1);
figure(h);
plot(xaxis,f,x,y,'.');

disp(' ')
disp('Press any key to continue.')
pause;

disp(' ')
disp('4 ------------------------------------')
disp('The presence of Gaussian noise introduces a fifth parameter --')
disp('the noise standard deviation. We will carry out our analysis')
disp('using the log of the standard deviation because DE-MC requires')
disp('that the target distribution be supported over the entire real')
disp('number line. In our case the log of the standard deviation is 0.')
disp(' ');
disp('The likelihood function on its own is rather poorly-conditioned:')
disp('in any bounded region of data space, the model includes (very close') 
disp('to) constant functions and (very close to) linear functions as') 
disp('degenerate special cases for various settings of the parameters.')
disp('We can therefore expect the likelihood function to be nearly flat')
disp('in some regions of parameter space. To control this behavior, we')
disp('could use a prior distribution on the parameters, but this regression')
disp('is artificial, so there is no basis for a prior distribution. We')
disp('will make the posterior distribution proportional to the likelihood')
disp('function, and use a different strategy to control its ill-conditioning.')
disp(' ');
disp('logpos = @(param) sum(log(normpdf(y, modelfunc(x,param(1:4)),exp(param(5)))));');

logpos = @(param) sum(log(normpdf(y, modelfunc(x,param(1:4)),exp(param(5)))));

disp(' ')
disp('Press any key to continue.')
pause;

disp(' ')
disp('5 ------------------------------------')
disp('We need some initial states in order to commence the MCMC sampling.')
disp('We start by finding the maximum a posteriori (MAP) parameter estimates.')
disp('We''re going to cheat a little here -- the initial values we give to the ')
disp('optimization routine are just the true parameter values, while in')
disp('real data analysis, you''re going to have to exert yourself to find')
disp('good starting values.')
disp(' ')
disp('Press any key to run the optimization routine.')
pause
disp(' ')
disp('6 ------------------------------------')
disp('MAP_est = fminsearch(@(param) -logpos(param),[true_param 0])')
MAP_est = fminsearch(@(param) -logpos(param),[true_param 0])
disp(' ')
disp('We plot the MAP estimate in the data space, just to see how it compares')
disp('to the true function. The true function is plotted in blue, and true')
disp('2-sigma error bounds are plotted in dashed blue. The MAP-estimated')
disp('function is plotted in solid green; estimated 2-sigma error bounds are')
disp('plotted in dashed red.')

fMAP = modelfunc(xaxis,MAP_est(1:4));
s = 2.*exp(MAP_est(5))*ones(size(xaxis));
figure(h);
plot(x,y,'.',xaxis,f,'b',xaxis,f+2,'b--',xaxis,f-2,'b--',...
    xaxis,fMAP,'g',xaxis,fMAP+s,'r--',xaxis,fMAP-s,'r--');
ylim([-15 15]);
title('MAP estimate');

disp(' ')
disp('Press any key to continue.')
pause

disp(' ')
disp('7 ------------------------------------')
disp('The key to controlling the ill-conditioning of the posterior is to choose')
disp('initial states that have a covariance structure similar to that of the')
disp('posterior distribution in the neighborhood of the MAP point. This ensures')
disp('from the start that the difference vectors cannot displace states into') 
disp('regions of flat posterior density, and is equivalent to incorporating prior')
disp('information ruling out near-linear or near-constant functions. We''ll use') 
disp('the negative inverse of the hessian of the posterior at the MAP point as') 
disp('our estimate of the covariance matrix. To obtain over-dispersed starting') 
disp('points, we''ll multiply the estimated covariance matrix by 16, quadrupling') 
disp('the length of the principle axes defined by the matrix. We''ll take as our') 
disp('initial states twenty multivariate-normal random deviates centered around')
disp('the MAP points with the desired covariance structure.')
disp(' ');
disp('covariance_matrix = -16.*inv(hessian(logpos,MAP_est));');
disp('init_state = repmat(MAP_est,20,1) + ((chol(covariance_matrix)'')*randn(5,20))'';');

covariance_matrix = -16.*inv(internal_hessian(logpos,MAP_est));
init_state = repmat(MAP_est,20,1) + ((chol(covariance_matrix)')*randn(5,20))';

% Chains run from the initial states defined below are also interesting to
% watch. Comment out the above assignments and de-comment the ones below,
% and then run the demo (ignore the descriptive text in the demo, as it 
% refers to the above assignments).
%
% covariance_matrix = -inv(hessian(logpos,MAP_est));
% init_state = repmat(MAP_est,20,1) + ((chol(covariance_matrix)')*randn(5,20))';
% init_state(:,4) = init_state(:,4) + 30;


disp(' ')
disp('Press any key to continue.')
pause

disp(' ')
disp('8 ------------------------------------')
disp('The first demonstration will use DE-MC without blocks -- that is, ')
disp('proposal states will be generated by taking differences of state ')
disp('vectors in the full parameter space.  We''ll then run the sampler for')
disp('200 iterations, obtaining 4,000 samples (200 from each of the twenty')
disp('chains).')
disp(' ')
disp('Press any key to begin sampling.')
pause

disp(' ')
disp('9 ------------------------------------')
disp('tic;')
disp('full_DEMC_state = DEMC_sample(logpos, 200, init_state);')
disp('time_full_DEMC = toc')
disp(' ');

tic;
full_DEMC_state = DEMC_sample(logpos, 200, init_state);
time_full_DEMC = toc

disp(' ')
disp('Press any key to continue.')
pause


disp(' ')
disp('10 ------------------------------------')
disp('Let''s look at the chains for the slope parameter.')
disp(' ')
disp('plot(1:200,squeeze(full_DEMC_state(:,4,:)));')
disp(' ')

figure(h);
plot(1:200,squeeze(full_DEMC_state(:,4,:)));
title('parameter 4 (slope at central point)');

disp('By starting the chains from over-dispersed states, we can observe')
disp('convergence to the stationary distribution; in typical simulations,')
disp('convergence was reached by iteration 100 at the latest.')

temp = squeeze(full_DEMC_state(:,5,:)); 
[idx1 idx2] = find(temp == max(temp(:)));
idx = idx1(1);

disp(' ')
disp('Press any key to continue.')
pause

disp(' ')
disp('11 -----------------------------------')
disp('Just for fun, we can watch a chain converge in the data space. We''ll')
disp('pick the chain with the largest sampled log-standard-deviation. In the')
disp(['present case, this is chain #' num2str(idx) '. We plot the initial state of'] )
disp('that chain in the data space. The dashed red curves give 2-sigma bounds.');
disp(' ')

f = modelfunc(xaxis,init_state(idx,1:4));
s = 2.*exp(init_state(idx,5))*ones(size(xaxis));
figure(h);
plot(x,y,'.',xaxis,f,'g',xaxis,f+s,'r--',xaxis,f-s,'r--');
ylim([-15 15]);
title('initial state');

disp('Press any key to watch convergence in the data space.')
pause

figure(h);
for i = 1:200
    f = modelfunc(xaxis,full_DEMC_state(idx,1:4,i));
    s = 2.*exp(full_DEMC_state(idx,5,i))*ones(size(xaxis));
    plot(x,y,'.',xaxis,f,'g',xaxis,f+s,'r--',xaxis,f-s,'r--');
    ylim([-15 15]);
    title(['iteration ' num2str(i)]);
    pause(0.07);
end
title('iteration 200 -- finished!')

disp(' ')
disp('Press any key to continue.')
pause

disp(' ')
disp('12 -----------------------------------')
disp('Let''s line up the chains so that we can various calculate ergodic')
disp('averages easily. First we''ll discard the first 100 iterations as')
disp('burn-in.')
disp(' ')
disp('full_DEMC_state = full_DEMC_state(:,:,101:end); % discard burn-in')
disp('time_full_DEMC = 0.5 .* time_full_DEMC; % correct computation time')
disp('reshaped_state = reshape(permute(full_DEMC_state,[3 1 2]),[2000 5]); % reshape sampled states')
disp(' ')
disp('Let''s look at the estimated posterior correlation matrix.')
disp(' ')
disp('corr(reshaped_state)')

full_DEMC_state = full_DEMC_state(:,:,101:end);
time_full_DEMC = 0.5 .* time_full_DEMC;
reshaped_state = reshape(permute(full_DEMC_state,[3 1 2]),[2000 5]);
corr(reshaped_state)

disp(' ')
disp('In typical simulations, the first and second parameters are highly')
disp('correlated, and the third and fourth parameters are also highly')
disp('(negatively) correlated. Other correlations are not large in absolute')
disp('magnitude. The estimated covariance matrix we used to sample initial')
disp('states would also have shown this.') 
disp(' ')
disp('Press any key to continue.')
pause

disp(' ')
disp('13 -----------------------------------')
disp('Like the block Gibbs sampler, the DE-MC sampler works best when the') 
disp('state vectors are sampled in independent blocks. We set the blocks using') 
disp('a cell array of length b in which each cell contains the indexes of state')
disp('vector elements belonging to the b''th block. We will create three')
disp('blocks: the first and second parameters will be a block, the third')
disp('and fourth parameters with be a block, and the fifth parameter on its')
disp('own will be a block.')
disp(' ')
disp('blockindex = {[1 2]; [3 4]; 5};');
disp(' ')
disp('For initial states, we will use the last states sampled of the full')
disp('DE-MC chains.')
disp(' ')
disp('init_state = full_DEMC_state(:,:,end);')
disp(' ')
disp('Since the initial states are taken from converged chains, we can omit')
disp('the burn-in. We will collect 100 samples in each chain for a total of')
disp('2,000 samples.');
disp(' ')

blockindex = {[1 2]; [3 4]; 5};
init_state = full_DEMC_state(:,:,end);

disp('Press any key to begin sampling.')
pause

disp(' ')
disp('14 -----------------------------------')
disp('tic;')
disp('block_DEMC_state = DEMC_sample(logpos, 100, init_state, blockindex);')
disp('time_block_DEMC = toc')
disp(' ');

tic;
block_DEMC_state = DEMC_sample(logpos, 100, init_state, blockindex);
time_block_DEMC = toc

disp(' ')
disp('Press any key to continue.')
pause

disp(' ')
disp('15 -----------------------------------')
disp('With three blocks, the process takes about three times as long per')
disp('sample, but if we look at the autocorrelation functions of the two')
disp('runs, we will (probably) see that the block sampling has faster')
disp('mixing, and is therefore more efficient on a per-sample basis .')

full_ac = zeros(20,99);
block_ac = zeros(20,99);
for i = 1:20
    temp = squeeze(full_DEMC_state(i,4,:));
    temp = xcorr(temp-mean(temp),'coeff');
    full_ac(i,:) = temp(101:end);
    temp = squeeze(block_DEMC_state(i,4,:));
    temp = xcorr(temp-mean(temp),'coeff');
    block_ac(i,:) = temp(101:end);
end

figure(h)
est_full_ac = mean(full_ac);
est_block_ac = mean(block_ac);
plot(0:99,[1 est_full_ac],'.-',0:99,[1 est_block_ac],'.-',[0 30],[0 0],'k--');
title('estimated autocorrelation function (parameter 4)')
xlabel('lag');
legend('full','block');
xlim([0 30])
yl = ylim;
ylim([yl(1) 1]);

disp(' ')
disp('Press any key to continue.')
pause

disp(' ')
disp('16 -----------------------------------')
disp('The efficiency of a Markov chain is the inverse of the sum of the')
disp('autocorrelation function from negative infinity to positive infinity.')
disp('We can get a conservative estimate of this sum through some calculations')
disp('based on our sampled chains. (You can inspect the code of the demo to')
disp('see how I''ve done this.)')

efficiency_full = (1 + 2.*max(cumsum(est_full_ac))).^-1;
efficiency_block = (1 + 2.*max(cumsum(est_block_ac))).^-1;

time_efficiency_full = efficiency_full.*2000./time_full_DEMC;
time_efficiency_block = efficiency_block.*2000./time_block_DEMC;

disp(' ')
disp('Efficiency on a per-sample basis:')
disp(['full DE-MC: ~' num2str(0.1*round(10.*(efficiency_full.^-1))) ' MCMC samples is equivalent to one independent sample from the posterior'])
disp(['block DE-MC: ~' num2str(0.1*round(10.*(efficiency_block.^-1))) ' MCMC samples is equivalent to one independent sample from the posterior'])
disp(' ')
disp('efficiency on a per-unit-time basis:')
disp(['full DE-MC: ~' num2str(round(time_efficiency_full)) ' independent-equivalent samples per second'])
disp(['block DE-MC: ~' num2str(round(time_efficiency_block)) ' independent-equivalent samples per second'])
disp(' ')
disp('In typical simulations, the full sampling approach outperformed the')
disp('block sampling approach on a per-unit-time basis in spite of the fact')
disp('that block sampling led to faster mixing. In general, the trade-off')
disp('between computation time and speed of mixing will depend on the target')
disp('density being analyzed and the chosen block structure.')
disp(' ')
disp('This concludes the demo.')

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
function H = internal_hessian(func, x, increment)
% HESSIAN  
%  
%   H = hessian(func, x, increment) 
%
%   This method computes the Hessian matrix at x for a given function 
%   taking a single vector argument. The partial derivatives are computed
%   numerically, using a simple two-point central difference approximation.
%   (Since these are second derivatives, each mixed derivative actually 
%   requires 4 function evaluations.) The function is assumed to be  
%   continuous in the neighbourhood of x, which implies that the Hessian 
%   is symmetric. The default increment value is 5e-5.

% Copyright notice:
% This code is free software, as defined by the Free Software Foundation.
% Users may modify and redistribute this code; the sole restrictions are
% that (i) the source code must accompany compiled distributions of this 
% code, and (ii) this notice must remain unchanged.

%% AUTHOR    : Corey Yanofsky 
%% $DATE     : 26-Sep-2005 16:41:36 $ 
%% $Revision : 1.00 $ 
%% DEVELOPED : 7.0.4.365 (R14) Service Pack 2 
%% FILENAME  : hessian.m 
 
if(nargin < 3)
    increment = 5e-5;
end

x = x(:);

% pre-allocate arrays
fx_plus_two_inc = zeros(size(x));
fx_minus_two_inc = zeros(size(x));
I = eye(length(x));

% function evaluations for the diagonal elements
fx = func(x);
for i = 1:length(x)
    fx_plus_two_inc(i) = func(x + 2.*I(:,i).*increment);
    fx_minus_two_inc(i) = func(x - 2.*I(:,i).*increment);
end

% calculate diagonal values of the Hessian
H = diag(fx_plus_two_inc + fx_minus_two_inc - 2.*fx);

% calculate off-diagonal values
for a = 1:(length(x) - 1)
     for b = (a+1):length(x)
         H(a,b) = func(x + I(:,a).*increment + I(:,b).*increment)...
             + func(x - I(:,a).*increment - I(:,b).*increment)...
             - func(x + I(:,a).*increment - I(:,b).*increment)...
             - func(x - I(:,a).*increment + I(:,b).*increment);
         H(b,a) = H(a,b);
     end
end

% scale the matrix 
H = H./(4.*(increment.^2));
        
% Created with NEWFCN.m by Frank Gonzlez-Morphy  
% Contact...: frank.gonzalez-morphy@mathworks.de  
% ===== EOF ====== [hessian.m] ======  

Contact us at files@mathworks.com