how to generate random numbers with constraints?

53 views (last 30 days)
Hi, I want to generate 24 random numbers whose sum is limited between [3550 3650]. in addition each number must be between [5 800].
here is my try:
ub=800*ones(1,24);
lb=5*ones(1,24);
a=lb+rand(1,24).*(ub-lb);
while sum(a)>3650 || sum(a)<3550
a=lb+rand(1,24).*(ub-lb);
end
but it stucks in the loop forever. please help me...
  6 Comments
John D'Errico
John D'Errico on 29 Apr 2019
The data that I generated IS uniform. It is uniform over the domain of the triangular region where the data lives. It is uniform over the set that satifies the sum constraint.
The marginal distribution has a triangular PDF. It is NOT a gamma. There is no way that you can have data that is BOTH uniform over the constraint domain, as well as being uniform in any of the individual variables.
If you are looking for data that is truly uniform in each of the variables, yet has a sum that falls within some bounds? You can't have that.
Consider the example I gave. I chose two variables, x and y, uniform over the interval [0,1]. Now, look at the sum x+y. Can the variables be truly independently uniform over the domain, yet have a sum that is less than 1? So I'm not sure what you are asking to produce anymore. What is the final goal here?
alex brown
alex brown on 30 Apr 2019
I tested so many conditioned random numbers. I think you are right.
In fact when we choose the first number we limit the second number interval ,etc. Therfore the next number distribution can't be uniform
thank you

Sign in to comment.

Accepted Answer

John D'Errico
John D'Errico on 29 Apr 2019
Edited: John D'Errico on 29 Apr 2019
This is not simple, if you want to find a truly uniform set. In 24 dimensions, things get fairly complicated. And the domain the numbers can live in is actually pretty wide, thus the interval [5,800].
So, if you have all numbers at or near their mininum,
5*24
ans =
120
>> 800*24
ans =
19200
>> (5 + 800)/2*24
ans =
9660
The point being that it will be a quite rare event that 24 numbers, chosen randomly from that interval, will have a sum in the interval [3550,3650].
The law of large numbers tells us that the sum of that set will be effectively normally distributed. And, in fact, this gives us a solution to the problem!
Lets see. Each member of that set will be uniform, on the interval [5,800]. That means the mean of x_i will be
a = 5;
b = 800;
(a + b)/2
ans =
402.5
And the variance (of each number in the set) will be
(b-a)^2/12
ans =
52668.75
So now the sum of 24 of them will be fairly accurately normally distributed, with mean, variance, and standard deviation:
sumMean = (a+b)/2*24
xmean =
9660
sumVar = (b-a)^2/12*24
xvar =
1264050
sumStd = sqrt(sumVar)
xstd =
1124.29978208661
So, now what are the odds that you will find a sum in the range from [3550,3650]? That would be found from a cumulative normal.
sumLimits = [3550,3650];
normcdf(sumLimits,sumMean,sumStd)
ans =
2.74761306010529e-08 4.50716095152589e-08
diff(ans)
ans =
1.7595478914206e-08
Hmm. So you will probably need to generate roughly 1e8 sets of 24 uniform numbers before you find a single set that lives in the interval [3550,3650].
But! As I said, this actually gives you a near trivial solution! How? We break the problem down into two simpler problems. Hard problems are hard. Easy problems are way easier. So lets create two simpler problems.
First, can we sample from a truncated normal distributiuon? That part is easy.
norminv(sumNormProbLimits(1) + rand()*diff(sumNormProbLimits),sumMean,sumStd)
ans =
3637.00633112408
As you can see, that is a number that has atuncated normal distribution, with the desired parameters. So now we can just use randfixedsum. I'll put it all together here:
a = 5;
b = 800;
ndim = 24;
sumMean = (a+b)/2*ndim;
sumVar = (b-a)^2/12*ndim;
sumStd = sqrt(sumVar);
sumLimits = [3550,3650];
sumNormProbLimits = normcdf(sumLimits,sumMean,sumStd);
S = norminv(sumNormProbLimits(1) + rand()*diff(sumNormProbLimits),sumMean,sumStd);
randfixedsum(24,1,S,a,b)
ans =
32.6666351919248
226.171574450247
26.4486244673284
59.5273902733376
105.140425145937
198.085619814426
313.588691531256
125.527412419838
222.108869160278
78.6355991625287
116.832965448386
25.2315920517267
10.7762636314948
295.321051677379
339.225232904641
91.7004913938697
29.5858150543207
184.827862803465
37.7525635267701
115.866539562388
512.657889835195
372.544944227602
13.6087483202661
58.868132270981
So, a set of 24 uniformly distributed random numbers that have their sum in the desired interval. The numbers really are uniformly distributed to lie in the interval [5,800], but the condition that the sum be so relatively small, makes it unlikely that any of them are near the maximum. On this random set, the max was as large as 512. That seems reasonable to me, under the conditional sampling that was required.
sum(ans)
ans =
3592.70093432559
It was really pretty easy to do. I did need randfixedsum, which you can get from the File Exchange for free download.
  2 Comments
alex brown
alex brown on 29 Apr 2019
thank you for your time John D'Errico, it was really helpful. I had doubt about randfixedsum because I didn't know about the solutions distribution. I need the answers to be uniformly distributed. I have to run the code for 100 times and analyze the distribution of the 24*100 answers to find out the distribution of them.
John D'Errico
John D'Errico on 29 Apr 2019
Ok, then you are a little better off using my solution, though the difference is not that huge.
For example, in my solution, you should expect to see more samples with a sum near 3650 than near 3550. The differential (based on the PDF at those points) should be not quite twice as often. In Rik's solution, the two ends of that interval on the sum will be equally likely. Is this s huge difference? Perhaps not. It very much depends on what you are doing with the results. Since you are actually testing things to look at the distributions, you will prefer the more technically correct solution.
A very important thing to note however, is that a sample size of only 100 such sets is tiny. TINY. MINISCULE. In fact, when you are looking at things like this, I would be looking at millions of such samples at the least.

Sign in to comment.

More Answers (1)

Rik
Rik on 29 Apr 2019
Edited: Rik on 29 Apr 2019
The reason for your infinite loop is that you need an average random value of 150, which is far from the expected value of 402.5. That means you need quite an extreme situation to land on your required vector.
You can use randfixedsum from the FEX. This function takes a more mathematical approach, which means that it doesn't need to reject vectors that don't satisfy the constraints.
%output matrix size
n=24;m=1;
%generate a random sum between [3550 3650]
s_bounds=[3550 3650];s=rand*diff(s_bounds)+s_bounds(1);
%bounds of values themselves
a=5;b=800;
%generate the values
x = randfixedsum(n,m,s,a,b);
%transpose to move from 24x1 to 1x24
x=x';
  6 Comments
Rik
Rik on 29 Apr 2019
Thank you for clarifying. Statistics aren't my strong suit, so questions like this reach my limit.
alex brown
alex brown on 29 Apr 2019
Thank you Rik for the answer. Now I am looking for the distribution of the 24 answers generated by the randfixedsum. I think its distribution is not uniform. And it is a problem.

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!