Code covered by the BSD License  

Highlights from
BMS toolbox for Matlab: Bayesian Model Averaging (BMA)

image thumbnail

BMS toolbox for Matlab: Bayesian Model Averaging (BMA)

by

 

08 Nov 2010 (Updated )

Do Bayesian Model Averaging (BMA) via a hidden instance of R (Windows only).

bms(X_data, burn, iter, nmodel, mcmc, g, mprior, mprior_size, user_int, logfile)
function bms = bms(X_data, burn, iter, nmodel, mcmc, g, mprior, mprior_size, user_int, logfile)
% Bayesian Model Sampling and Averaging
%function bms = bms(X_data, burn, iter, nmodel, mcmc, g, mprior, mprior_size, user_int)
%
% Note that defaults of optional inputs may be obtained by passing an empty value (e.g. '') or NaN
%INPUTS:
% * X_data: 
%     A numeric matrix, with the dependent variable in the first column, followed by the covariates. 
%     Alternatively, a Matlab dataset object with numerical data (and the
%       dependent variable in the first column)
%     Alternatively, to process variable names, a structure with the
%       following fields: required '.data': a numerical matrix with the dependent variable in the first column
%       optionally either '.Properties.VarNames' and/or '.Properties.ObsNames': cell
%       string vectors holding variable and observation names (must have
%       lengths according do the dimensions of .data)
%    Note that bms automatically estimates a constant, therefore including constant terms is not necessary.
% * burn (optional): The (positive integer) number of burn-in draws for the MC3 sampler, defaults to 1000. (Not taken into account if mcmc="enumerate")
% * iter (optional): If mcmc is set to an MC3 sampler, then this is the number of iteration draws to be sampled (ex burn-ins), default 3000 draws.
%      If mcmc="enumerate", then iter is the number of models to be sampled, starting from 0 (defaults to 2^K-1) - cf. start.value.
% * nmodel (optional): the number of best models for which information is stored (default 500). Best models are used for convergence analysis between likelihoods and MCMC frequencies, as well as likelihood-based inference.
%      Note that a very high value for nmodel slows down the sampler significantly. Set nmodel=0 to speed up sampling (if best model information is not needed).
% * mcmc (optional): a character denoting the model sampler to be used.
%      The MC3 sampler mcmc="bd" corresponds to a birth/death MCMC algogrithm. mcmc="rev.jump" enacts a reversible jump algorithm adding a "swap" step to the birth / death steps from "bd".
%      Alternatively, the entire model space may be fully enumerated by setting mcmc="enumerate" which will iterate all possible regressor combinations (Note: consider that this means 2^K iterations, where K is the number of covariates.)
%      Default is full enumeration (mcmc="enumerate") with less then 15 covariates, and the birth-death MC3 sampler (mcmc="bd") with 15 covariatess or more. Cf. section 'Details' for more options.
% * g (optional): the hyperparameter on Zellner's g-prior for the regression coefficients.
%      g="UIP" corresponds to g=N, the number of observations (default);
%      g="BRIC" corresponds to the benchmark prior suggested by Fernandez, Ley and Steel (2001), i.e g=max(N, K^2), where K is the total number of covariates;
%      g="RIC" sets g=K^2 and conforms to the risk inflation criterion by George and Foster (1994)
%      g="HQ" sets g=log(N)^3 and asymptotically mimics the Hannan-Quinn criterion with C_{HQ}=3 (cf. Fernandez, Ley and Steel, 2001, p.395)
%      g="EBL" estimates a local empirical Bayes g-parameter (as in Liang et al. (2008));
%      g="hyper" takes the 'hyper-g' prior distribution (as in Liang et al., 2008) with the default hyper-parameter a set such that the prior expected shrinkage factor conforms to 'UIP';
%      This hyperparameter a can be adjusted (between 2<a<=4) by setting g="hyper=2.9", for instance.
%      Alternatively, g="hyper=UIP" sets the prior expected value of the
%      shrinkage factor equal to that of UIP (default), g="hyper=BRIC" sets it according to BRIC
% * mprior (optional): a character denoting the model prior choice, defaulting to "random":
%      mprior="fixed" denotes fixed common prior inclusion probabilities for each regressor as e.g. in Sala-i-Martin, Doppelhofer, and Miller(2004) - for their fine-tuning, cf. mprior.size. Preferable to mcmc="random" if strong prior information on model size exists;
%      mprior="random" (default) triggers the 'random theta' prior by Ley and Steel (2008), who suggest a binomial-beta hyperprior on the a priori inclusion probability;
%      mprior="uniform" employs the uniform model prior;
%      Note that the prior on models with more than N-3 regressors is automatically zero: these models will not be sampled.
% * mprior_size (optional): if mprior is "fixed" or "random", mprior.size is a
%      scalar that denotes the prior expected value of the model size prior (default K/2).
% * user_int (optional): 'interactive mode': print out results to console after ending the routine and plots a chart (default TRUE). 
% * logfile (optional): if true, print out results to the logfile 'BMSlogfile.html' in a BMS toolbox subfolder; 
%      if user_int==true, then this logfile automatically displays in the Matlab web browser
%      Default: false;
%
%OUPUT:
% A structure pointing to a 'bma' object in R, that may be displayed using
%    e.g. summary_bma  or print_bma
%
%EXAMPLE
% load datafls
% mm=bms(datafls);
% mm=bms(datafls,'','',20);
% 
%SEE ALSO
% coef_bma, density_bma, plotConv, ...
%
% Equivalent function in R: bms
% Author: Stefan Zeugner 2010 
% Url: http://bms.zeugner.eu

if not(isBMSopen(false)); end;

if nargin<1
   error('X_data is missing');    
end
 
if nargin<2; burn=NaN; end; if isempty(burn); burn=NaN; end
if nargin<3; iter=NaN; end; if isempty(iter); iter=NaN; end
if nargin<4; nmodel=NaN; end; if isempty(nmodel); nmodel=NaN; end
if nargin<5; mcmc=NaN; end; if isempty(mcmc); mcmc=NaN; end
if nargin<6; g=NaN; end; if isempty(g); g=NaN; end
if nargin<7; mprior=NaN; end; if isempty(mprior); mprior=NaN; end
if nargin<8; mprior_size=NaN; end; if isempty(mprior_size); mprior_size=NaN; end
if nargin<9; user_int=NaN; end; if isempty(user_int); user_int=NaN; end
if nargin<10; logfile=NaN; end; if isempty(logfile); logfile=NaN; end



[X_struct, burn, iter, nmodel, mcmc, g, mprior, mprior_size, user_int, logfile] = bms_checkargs(X_data, burn, iter, nmodel, mcmc, g, mprior, mprior_size, user_int, logfile);
X_data=X_struct.data;


if ~isnan(g)
if ~isstr(g); g=num2str(g); else g=['''', g, '''']; end;
end

if ~isnan(mprior_size); mprior_size=num2str(mprior_size); end;

if logfile; 
    logfiledir=BMStoolboxPath;
    logfiledir= [ logfiledir 'other/'];
    if user_int
        web(['file:///' logfiledir 'tailer_simple.html' ]);
    end
end;


nowdt=datevec(now);
bmasuffix=['_' num2str(nowdt(5)) '_' num2str(nowdt(6))];

putRdata(['data' bmasuffix],X_data);
bmscmd=['bma' bmasuffix ' = bms(data' bmasuffix ', ']; 
if ~isnan(burn); bmscmd=[bmscmd, ' burn=' num2str(burn) ',']; end
if ~isnan(iter); bmscmd=[bmscmd, ' iter=' num2str(iter) ',']; end
if ~isnan(nmodel); bmscmd=[bmscmd, ' nmodel=' num2str(nmodel) ',']; end
if ~isnan(mcmc); bmscmd=[bmscmd, ' mcmc=''' mcmc ''',']; end
if ~isnan(g); bmscmd=[bmscmd, ' g=' g ',']; end
if ~isnan(mprior); bmscmd=[bmscmd, ' mprior=''' mprior ''',']; end
if ~isnan(mprior_size); bmscmd=[bmscmd, ' mprior.size=' mprior_size ',']; end
if logfile; 
    bmscmd=[bmscmd, ' logfile=''' logfiledir 'bmsLogfile.html'' ,']; 
 end
bmscmd = [bmscmd, ' user.int=FALSE)'];   
    
evalinR(bmscmd);
evalinR(['coefbma' bmasuffix ' = estimates.bma(bma' bmasuffix ')' ]);
evalinR(['summarybma' bmasuffix ' = info.bma_matlab(bma' bmasuffix ')' ]);
coefb=getRdata(['coefbma' bmasuffix]);
sumb=getRdata(['summarybma' bmasuffix]);
if iscell(coefb); coefb=rcell2mat(coefb); end;


bms=struct;
bms.suffix = bmasuffix;
bms.coef = coefb;
bms.summary = sumb;
bms.Rcmd= bmscmd;
bms.data=X_struct;


if user_int 
  print_bma(bms);
  plotConv(bms);
  if logfile
        web(['file:///' logfiledir 'bmsLogfile.html' ]);
  end      
end



Contact us