Generate a pseudo-random vector X drawn from the truncated Gaussian distribution
% function X = TruncatedGaussian(sigma, range)
% X = TruncatedGaussian(sigma, range, n)
Generate a pseudo-random vector X of size n, X are drawn from the truncated Gaussian distribution in a RANGE braket; and satisfies std(X)=sigma.
RANGE is of the form [left,right] defining the braket where X belongs. For a scalar input RANGE, the braket is [-RANGE,RANGE].
If input SIGMA is negative, X will be forced to have the same "shape" of distribution function than the unbounded Gaussian with standard deviation -SIGMA: N(0,-SIGMA). It is similar to calling RANDN and throw away values ouside RANGE. In this case, the standard deviation of the truncated Gaussian will be different than -SIGMA. The *effective* mean and the effective standard deviation can be obtained by calling:
[X meaneffective sigmaeffective] = TruncatedGaussian(...)
1.3 | Use asymptotic formula for unbalanced range to avoid round-off error issue |
Z (view profile)
@pete, there is no confusion:
the code fails to work in the tails.
Using either sigma, or -sigma both fail:
TruncatedGaussian(-1, [39,Inf]) % clearly the answer should not be Inf
ans =
Inf
TruncatedGaussian(-1, [39,50]) % the answer is wrong and not random
ans =
50
TruncatedGaussian(-1, [-Inf,-10]) % answer is not random and again wrong
ans =
-10
The problems is that you test it in a range, where the approximation to
"erfcinv" is accurate, but fail to test it in the difficult
cases in the tail, where it crashes.
This is a well known
problem, see for example the comments by Prof Robert for an R version using
the inverse transform method:
https://xianblog.wordpress.com/2013/04/09/painful-truncnorm/
Z (view profile)
pete (view profile)
I can see @Z 's confusion, but the script is correct, albeit a little confusing.
Enter -sigma to truncate an underlying normal distribution with STD=sigma (effective STD of result < sigma)
Enter sigma to produce a truncated distribution with STD=sigma (effective STD of result = sigma)
Personally I would favour the (non-default) former version, of -sigma.
Performance was evaluated using:
% init
clear all
close all
clc
% user variables
mu = 2;
n = 1000000;
sigma=2;
range=[-4 4];
% 1. crude resampling
x1 = normrnd(0,sigma,n,1);
while 1
idx = x1<range(1) | x1>range(2);
if sum(idx)==0
break;
else
x1(idx) = normrnd(0,sigma,sum(idx),1);
end
end
x1 = x1+mu;
% 2. generate UNIFORM random samples in an appropriate interval, then
% invertthe normal CDF at those points
p = normcdf(range(1),0,sigma);
U = rand(n,1)*(1-2*p) + p;
x2 = norminv(U)*sigma+mu;
% 3. FEX function
x3a = TruncatedGaussian(sigma, range, [1 n])+mu;
x3b = TruncatedGaussian(-sigma, range, [1 n])+mu;
% plot
nBins = 32;
figure();
subplot(4,1,1); hist(x1,nBins);
subplot(4,1,2); hist(x2,nBins);
subplot(4,1,3); hist(x3a,nBins);
subplot(4,1,4); hist(x3b,nBins);
% print to console
fprintf('x1: mean=%1.2f std=%1.2f\n', mean(x1), std(x1));
fprintf('x2: mean=%1.2f std=%1.2f\n', mean(x2), std(x2));
fprintf('x3a: mean=%1.2f std=%1.2f\n', mean(x3a), std(x3a));
fprintf('x3b: mean=%1.2f std=%1.2f\n', mean(x3b), std(x3b));
Z (view profile)
Code yields wrong results in the tails of the normal. For example,
sigma=2;
range=[150 160]
[X meaneff sigmaeff] = TruncatedGaussian(-sigma, range, [1 1e6]);
yields the incorrect random realizations of (exactly!) 160, when in fact a correctly drawn relization is 150.0204
Alex (view profile)
Dear Bruno,
Thanks for sharing this. I noticed that your code uses numerical methods to estimate the effective mean and std of the truncated random variable. Closed-form expressions for these quantities are given by Barr & Sherrill (1999, American Statistician, 53, 357-361), "Mean and Variance of Truncated Normal Distributions." The PDF is available for download from JSTOR: http://www.jstor.org/stable/2686057 I cannot upload the PDF here but will be happy to email it to you if you don't have access to JSTOR. The closed-form solution for the effective variance depends on the ch2cdf function in the Statistics Toolbox.
Another relevant paper is Babu & Mathew (2009, Statistics and Probability Letters, 79, 375-380) "Confidence intervals for limited moments and truncated moments in normal and lognormal moments."
Thanks again for sharing.
-- Alex Petrov
http://alexpetrov.com
mortain (view profile)
Ciao Bruno,
thanks for the code you've written. I have a similar doubt as Ricardo. I'd assume the code would divide the domain of the "range" in a number of points equal to the number of points specified in the command "n", TruncatedGaussian(sigma, [range], [n]);
However, the values in X are defined randomly, hence the mean estimated will always be different, even though the range and sigma are the same
>> range=[-3 3];
>> sigma=1;
>> [X meaneff sigmaeff] = TruncatedGaussian(sigma, range, [1 1e3]);
>> fprintf('mean(X)=%g, estimated=%g\n',meaneff, mean(X))
mean(X)=0, estimated=0.0555218
>> fprintf('sigma=%g, effective=%g, estimated=%g\n', sigma, sigmaeff, stdX)
sigma=1, effective=1, estimated=0.992506
>> [X meaneff sigmaeff] = TruncatedGaussian(sigma, range, [1 1e3]);
>> fprintf('mean(X)=%g, estimated=%g\n',meaneff, mean(X))
mean(X)=0, estimated=0.00445842
>> fprintf('sigma=%g, effective=%g, estimated=%g\n', sigma, sigmaeff, stdX)
sigma=1, effective=1, estimated=0.992506
This can be seen as well from hist, using same number of divisions.
Van I ask you if I correctly got your coding, and how to obtain an uniform division of the domain.
Thanks
Bruno Luong (view profile)
@Raymond, you still do not elaborate what is "truncated standard normal random variable". I can only guess:
x = TruncatedGaussian(-1,[2 500],[1 1e6]);
Beside that you need to read the help of TruncatedGaussian
Raymond (view profile)
Dear Bruno,
I'm very sorry not to have been more precise. I wanted to generate a truncated standard normal random variable, truncated to the interval [2,500]. Is this possible?
Raymond
Bruno Luong (view profile)
Raymond, there exist no random variable in [2 500] with mean = 0. The mean must be in [2, 5000] regardless the distribution. Thus what you ask is mathematically impossible.
Raymond (view profile)
Dear Bruno,
How do a generate from the normal distribution with mean zero and standard deviation 1 a truncated normal on the range [2 500]? I calculate the this random variable has a standard deviation of 0.3381. When I try [X meaneff sigmaeff] = TruncatedGaussian(0.3381, [2 500], 1) I get an error message.
Bruno Luong (view profile)
Please note the typo in the last line in my comment from August 26th 2011
fprintf('max(X) = %f\n', max(X));
Bruno Luong (view profile)
Mean effective, sigma effective are the real mean and standard deviation of the random variable after truncation. The word "effective" is to make the distinction with the mean and standard deviation of the full Gaussian variable BEFORE the truncation.
milad babaei (view profile)
tnx bruno,what does "meaneffective sigmaeffective "mean?FOR example: what's the definition of this:
mean(C)=623.066, estimated=624.239
sigma=147.65, effective=91.8246, estimated=90.8613
Bruno Luong (view profile)
The effective mean of the truncated Gaussian changes with the range. To get the effective mean of the date, call in two outputs form:
>> [X meaneffective] = TruncatedGaussian(sqrt(4.5), [1 inf], [1,1e6]);
>> meaneffective
meaneffective =
3.7034
>> mean(X)
ans =
3.7016
>> [X meaneffective] = TruncatedGaussian(sqrt(6.5), [1 inf], [1,1e6]);
>> meaneffective
meaneffective =
4.2673
>> mean(X)
ans =
4.2684
HINT: to get the mean you aim for, use FZERO in this function:
function deltamean = meanTG(xbar, meantarget, stdtarget, range)
[~, meaneffective] = TruncatedGaussian(stdtarget, range-xbar, 0);
deltamean = meantarget - (meaneffective + xbar);
end % meanTG
% Call fzero
meantarget = 6;
stdtarget = 3;
range = [1 Inf];
xbarstart = 10;
xbar = fzero(@(xbar) meanTG(xbar, meantarget, stdtarget, range), xbarstart)
% Create a shift TrunccatedGaussian
X = xbar + TruncatedGaussian(stdtarget, range-xbar, [1 1e6]);
% Check
fprintf('mean(X) = %f\n', mean(X));
fprintf('std(X) = %f\n', std(X));
fprintf('min(X) = %f\n', min(X));
fprintf('max(X) = %f\n', mean(X));
Ricardo Giglio (view profile)
Hi Bruno, I'd like to generate two series with same mean but different variances as described below, but I always get different means... Could you help me understanding why? Thanks
A_bar = 1.5; A_var = 4.5;
B_bar = 1.5; B_var = 6.5;
A = A_bar + TruncatedGaussian(sqrt(A_var), [1 inf], 10);
B = B_bar + TruncatedGaussian(sqrt(B_var), [1 inf], 10);
mean(A(:))
mean(B(:))
ans =
4.9641
ans =
5.6081
Bruno Luong (view profile)
Zack,
first question: see my comment #2 above.
second question: yes the function accept infinity as range.
Zack (view profile)
Hi bruno
Can you please tell me how to use your function for a truncated gaussian with mean mu non zero a standard deviation sigma and a range [-1 1]?
is it possible to extend this function for a [a b] with a<>b and maybe a or b equal to infinity?
Thanks
Bruno Luong (view profile)
Midal, the RANGE second argument of TruncatedGaussian can be used for bound.
milad babaei (view profile)
Hi Bruno,is that possible to change the bound in a code and generate random numbers??for example in this zone n=[1, 1e4].
Manu (view profile)
thanks!
Ling Xue (view profile)
what does "meaneffective sigmaeffective "mean? We need to need the mean and variance before generating Gaussian random variable, right? It seems that meaneffective is changing even the range and sigma remains the same.
Bruno Luong (view profile)
Hi Bree, not in my intention.
Bree (view profile)
Hi Bruno,
Will you generalize this code into the multivariate one?
Trying to generate this kind of numbers, but in high dimensionality. Say something n = [128, 1e6] (rather than n=[1, 1e6].
summersyu Yu (view profile)
Dear Bruno,
Thank you for your answer. Following is my understanding:
[X meaneffective sigmaeffective] = TruncatedGaussian(...)
If meaneffective is 5, but I want the mean to be 15.
Then shift=10
X = shift + TruncatedGaussian(sigma, range-shift, ...)
This is how we generate random variables from truncated normal distribution with mean 15, variance sigmaeffective and range [a,b].
But what is the CDF of a random variable from truncated normal distribution?
Let y be the cdf of a truncated normal variable x, then is the following equation right? I am not sure of the mean and variance in normcdf
y = (normcdf(x) - normcdf(a)) ./ (normcdf(b) - normcdf(a))
Ze Dagher (view profile)
Thanks Bruno. This turned out to be the issue in the second problem I described. The output now matches derivations.
Bruno Luong (view profile)
Thanks, Ze I made some correction to the code. The new version is available.
Ze Dagher (view profile)
Bruno,
your code has one problem. Enter this command and see what happens:
Y = TruncatedGaussian(-1,[10 20],[1 1000]);
All the values returned are 20s. I know that 10 to 20 is a very unlikely range with a distribution of N(0,1) but still, I am just showing here that I noticed that your code returns wrong values sometimes.
Another (possibly related) problem is when you try and generate a distribution with non-zero mean with the range being a bit far from the mean. The histogram of the returned values does not match a trusted method I used. I am not sure what this is about.
Bruno Luong (view profile)
To change the mean, you might try this:
shift = something
X = shift + TruncatedGaussian(sigma, range-shift, ...)
summersyu Yu (view profile)
How to change the mean?