Is there a bug in Multidimensional Input to mhsample?

9 views (last 30 days)
Robert on 24 Nov 2014
Answered: Erik Hauri on 25 Jan 2018
I was trying to sample from a multidimensional pdf using `mhsample` but was getting the error
Index exceeds matrix dimensions.
Error in mhsample (line 174)
x0(acc,:) = y(acc,:); % preserves x's shape.
However when I edited line 174 to
x0(:,acc) = y(:,acc); % preserves x's shape.
The function appears to have worked.
Some sample code to try:
start = zeros(1,40);
delta = .5;
proppdf = @(x,y) unifpdf(y-x,-delta,delta);
proprnd = @(x) x + rand(1,40)*2*delta - delta;
pdf_log = @(x) sum(log(mvnpdf(x, zeros(1,40), eye(40))));
smpl = mhsample(start,100,'logpdf',pdf_log,'proppdf',proppdf, 'proprnd',proprnd);
Is this a bug in the multidimensional implementation of mhsample or have I missed something?

Tom Lane on 17 Dec 2014
It looks like there's something that could be improved. The error message was no help. I think you want to specify a multivariate proposal distribution, not the univariate uniform one you have. Maybe this:
proppdf = @(x,y) prod(unifpdf(y-x,-delta,delta),2);
Erik Hauri on 23 Jan 2018
Its not about proppdf - Robert suggests a row-column mixup. (:,acc) instead of (acc,:) which makes sense since acc is given in the 2nd column when working in only 1 dimension (start is a scalar).

Erik Hauri on 23 Jan 2018
I have observed the same. Taking the 2nd example in the documentation (sampling data from a standard normal distribution), after making your modification to mhsample - if start and delta are provided as row vectors, then the calculation works and results in x are given as column vectors.

Don Mathis on 24 Jan 2018
In the posted code, the problem is that the proposal distribution function 'proppdf' does not return a scalar probability density as required. Instead it returns a vector.
According to the documentation (<http://www.mathworks.com/help/stats/mhsample.html)>,
" proppdf takes two arguments as inputs with the same type and size as start. ... The proposal distribution q(x,y) gives the probability density for choosing x as the next point when y is the current point."
In the posted code, ‘start’ has size [1,40]. If I pass two vectors of that size to proppdf I do not get the required scalar. Instead I get a vector:
>> rng(0);
>> start = rand(1,40);
>> proppdf(start, 2*start)
ans =
Columns 1 through 17
1 1 0 1 0 1 0 0 1 1 0 1 1 1 1 1 0
Columns 18 through 34
0 0 1 1 0 1 1 0 0 1 0 1 0 0 1 0 0
Columns 35 through 40
1 1 1 1 0 0
proppdf was defined like this:
proppdf = @(x,y) unifpdf(y-x,-delta,delta);
But unifpdf is a univariate pdf, and interprets “y-x” as a row vector of scalar x’s, not a single multivariate x.
Maybe you intended to combine the 40 components of x to get the multivariate uniform density. If the components are independent, that’s just the product. If we do that, we get valid sampling:
Here is an editied version of the code that draws 1000 samples and plots the chain:
start = zeros(1,40);
delta = .5;
proppdf = @(x,y) prod(unifpdf(y-x,-delta,delta));
proprnd = @(x) x + rand(1,40)*2*delta - delta;
pdf_log = @(x) sum(log(mvnpdf(x, zeros(1,40), eye(40))));
smpl = mhsample(start,1000,'logpdf',pdf_log,'proppdf',proppdf, 'proprnd',proprnd);
figure;
plot(smpl);
title('1000 samples');
We can also verify that the proposal distribution was used correctly. We can see that no two consecutive points differed by more than 0.5 on any component, which seems to be the intent of the proposal distribution:
>> max(max(abs(diff(smpl))))
ans =
0.5000

Erik Hauri on 24 Jan 2018
No, I think the original intention (and this was my intention) was that x and y were vectors of scalars, and that proppdf should thus produce a vector of the same dimension as x. In the above application each element of the vector 'start' would be an initial value for the MH routine, with its corresponding values of y (and proppdf, proprnd, etc). In this way, proppdf (and thus unifpdf) should not produce a single scalar, but rather a vector of scalars corresponding the value of its associated array element in 'start'.
The code below I think shows that in unifpdf(y-x,A,B) the function unifpdf properly identifies the quantity (y-x) as the difference between elements of the vectors y and x.
y=1:0.1:1.9; >> x=0.1:0.1:1; >> A=0.1*ones(1,10); >> B=2:1:11; >> unifpdf(y-x,A,B)
ans =
0.5263 0.3448 0.2564 0.2041 0.1695 0.1449 0.1266 0.1124 0.1010 0.0917
>> xx=x'; >> yy=y'; >> AA=A'; >> BB=B'; unifpdf(yy-xx,AA,BB)
ans =
0.5263
0.3448
0.2564
0.2041
0.1695
0.1449
0.1266
0.1124
0.1010
0.0917
Using prod(unifpdf(......)) produces a single scalar, not a vector as required for a multidimensional MH routine.
Don Mathis on 25 Jan 2018
Ok, are you saying that you would like to run 40 univariate chains in a single call to mhsample? You're sampling from a univariate distribution? If so, here is how you can do that:
% Run concurrent univariate chains:
nsamples = 1000;
nchain = 40; % Number of chains to run.
start = zeros(nchain,1); % A column vector of starting points.
delta = .5;
proppdf = @(x,y) unifpdf(y-x,-delta,delta); % This must return a column vector.
proprnd = @(x) x + rand(nchain,1)*2*delta - delta; % This must return a column vector.
pdf_log = @(x) log(normpdf(x, 0, 1)); % This must return a column vector.
smpl = mhsample(start,nsamples,'logpdf',pdf_log,'proppdf',proppdf, 'proprnd',proprnd, ...
'nchain', nchain); % Pass 'nchain' to run multiple concurrent chains.
% smpl is a 3D array of size [nsamples nvars nchain]
size(smpl)
% Plot chains 13 and 25 on the same plot
figure;
hold on
plot(smpl(:,1,13),'r');
plot(smpl(:,1,25),'b');
legend('Chain 13','Chain 25')
% Plot all chains on the same plot
figure;
plot(squeeze(smpl));
title('All chains');
Some notes:
1. The 'start' argument needs to have one row per chain. Since we're sampling from a univariate distribution, 'start' has only one column.
2. Each of the 3 function handles must return a column vector (one value per chain).
3. You need to pass the 'nchain' argument to mhsample.
Do I understand correctly that this is what you're looking for?

Don Mathis on 25 Jan 2018
We now have on this page examples of how to run a single multivariate chain , and how to run multiple univariate chains . Here is how you can run multiple multivariate chains:
%%Multiple multivariate chains
nsamples = 1000; % Number of samples per chain.
nchain = 50; % Number of chains to run.
dim = 3; % Dimensionality of the random variable.
start = zeros(nchain,dim); % A matrix of starting points.
delta = .5;
proppdf = @(x,y) prod(unifpdf(y-x,-delta,delta),2); % This must return a [nchain 1] vector.
proprnd = @(x) x + rand(nchain,dim)*2*delta - delta; % This must return a [nchain dim] matrix.
pdf_log = @(x) log(mvnpdf(x, zeros(1,dim), eye(dim))); % This must return a [nchain 1] vector.
smpl = mhsample(start,nsamples,'logpdf',pdf_log,'proppdf',proppdf, 'proprnd',proprnd, ...
'nchain', nchain); % Pass 'nchain' to run multiple concurrent chains.
% smpl is a 3D array of size [nsamples nvars nchain]
size(smpl)
% Plot the components of chain 13 on a single plot
figure;
hold on
plot(smpl(:,:,13));

Erik Hauri on 25 Jan 2018
Ah OK - use nchain and dimension 'start' as a column vector rather than a row vector. This makes sense, thank you!

Categories

Find more on Frequency-Domain Analysis in Help Center and File Exchange

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!