http://www.mathworks.com/matlabcentral/newsreader/view_thread/312244
MATLAB Central Newsreader  Augmenting a sample from unbounded distribution, until no values are above or below threshold.

Tue, 06 Sep 2011 00:09:10 +0000
Augmenting a sample from unbounded distribution, until no values are above or below threshold.
http://www.mathworks.com/matlabcentral/newsreader/view_thread/312244#851406
Ulrik Nash
Hi Everyone,

I am working on a problem where I require a sample of numbers drawn from a distribution that has no bounds. The issue is that some values that may be drawn from such a distribution do not make any sense for the task I am working on.

I require a specific number of data points, so I can't just delete values that lie outside the 'threshold of realism'.

Also, I cannot just set any number lying outside the 'threshold of realism' to the threshold value, because that would not be realistic either.

So, what I wish to achieve is to (1) remove the outliers, (2) 'resample' from the distribution, with sample size equal to the number of outliers, and (3) add these new data points to the 'main sample', repeating until there are no outliers, at which point I have the required sample.

I have made an attempt at a function. It is extremely inefficient, I am sure, and not only because it is incomplete:

function [augmentedSample] = augmentedSample(min_threshold,max_threshold,required_number_in_sample)

% for example: min_threshold = 1;
% for example: max_threshold = 10;
% for example: required_number_in_sample = 10;

sample = randn(1,required_number_in_sample)*20; %This is just an example. The general problem concerns distributions without bounds.
numbers_greater = sum(sample >= min_threshold);
A = sort(sample,'descend');
B = A(1:numbers_greater);
numbers_smaller = sum(B <= max_threshold);
C = sort(B,'ascend');
D = C(1:numbers_smaller);
number_additions = required_number_in_sample - numel(D);
new_additions = randn(1,number_additions)*20;

% now I can update sample and start again ....
sample = [D new_additions];

% and continue until ....
% .... number_additions = 0, at which point ....

augmentedSample = sample;

end

I would appreciate any suggestions on how to achieve the aim I have described.

Best regards,

Ulrik.
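
The three steps described in the post can also be written without sorting, by redrawing only the out-of-range entries in place. The following is a minimal sketch and not the poster's function: it is written as a script, `draw` is a hypothetical function handle standing in for whatever unbounded sampler is needed, and the threshold values are the example values from the post.

```matlab
% Sketch: redraw only the out-of-range entries until none remain.
min_threshold = 1;                 % example value from the post
max_threshold = 10;                % example value from the post
required_number_in_sample = 10;    % example value from the post

% draw is a hypothetical handle; here it is the post's example distribution.
draw = @(n) randn(1, n) * 20;

sample = draw(required_number_in_sample);
outliers = sample < min_threshold | sample > max_threshold;
while any(outliers)
    % Replace just the rejected positions with fresh draws.
    sample(outliers) = draw(nnz(outliers));
    outliers = sample < min_threshold | sample > max_threshold;
end
```

When the loop exits, `sample` has the required length and every entry lies between the two thresholds. The accepted values are draws from the original distribution restricted to that interval, and their order is not changed by sorting.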

Tue, 06 Sep 2011 11:51:28 +0000
Re: Augmenting a sample from unbounded distribution, until no values are above or below threshold.
http://www.mathworks.com/matlabcentral/newsreader/view_thread/312244#851453
Ulrik Nash
This function, though probably very inefficient, seems to do the trick:

function [augmentedCauchySample] = augmentedCauchySample(t,s,min_threshold,max_threshold,size)

% This function deals with the problem of sampling from the Cauchy
% Distribution, which is unbounded. Its unboundedness creates a positive
% probability that the sample drawn includes points that are not realistic
% for the purpose. Instead of deleting unrealistic values or setting
% unrealistic values to thresholds, this function essentially resamples
% outliers until no outliers remain.

% size : The required number of data points in the sample.
% max_threshold : The highest allowed value in the sample.
% min_threshold : The lowest allowed value in the sample.
% s > 0 : The scale parameter of the Cauchy Distribution. Larger s makes the distribution LESS peaked.
% t : The location parameter of the Cauchy Distribution. This is the median (and peak) of the distribution.

number_additions = 1;
sample = randraw('cauchy',[t, s],size);
while number_additions > 0
    % keep the values at or above the lower threshold
    numbers_greater = sum(sample >= min_threshold);
    A = sort(sample,'descend');
    B = A(1:numbers_greater);
    % of those, keep the values at or below the upper threshold
    numbers_smaller = sum(B <= max_threshold);
    C = sort(B,'ascend');
    D = C(1:numbers_smaller);
    % draw as many new points as were discarded and append them
    number_additions = size - numel(D);
    new_additions = randraw('cauchy',[t, s],number_additions);
    sample = [D new_additions];
end
augmentedCauchySample = sample;

end
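
For the Cauchy case specifically, the loop can be avoided altogether. Resampling outliers until none remain yields draws from the Cauchy distribution truncated to the two thresholds, and that truncated distribution can be sampled directly through the inverse CDF. The sketch below is a different technique from the function above, not a rewrite of it: it uses only the built-in `rand` rather than `randraw`, and the parameter values and the handle names `cauchy_cdf` and `cauchy_inv` are illustrative assumptions.

```matlab
% Sketch: truncated Cauchy sample via the inverse CDF, no loop needed.
t = 5;                 % location parameter (illustrative value)
s = 2;                 % scale parameter, s > 0 (illustrative value)
min_threshold = 1;     % lowest allowed value
max_threshold = 10;    % highest allowed value
n = 1000;              % required number of data points

% Cauchy CDF and its inverse (hypothetical helper handles).
cauchy_cdf = @(x) 0.5 + atan((x - t) / s) / pi;
cauchy_inv = @(u) t + s * tan(pi * (u - 0.5));

% Uniform draws restricted to the CDF values of the allowed interval.
u_lo = cauchy_cdf(min_threshold);
u_hi = cauchy_cdf(max_threshold);
u = u_lo + (u_hi - u_lo) * rand(1, n);

% Mapping back through the inverse CDF gives values inside the thresholds.
sample = cauchy_inv(u);
```

Every draw is used, so the cost does not depend on how much probability mass lies outside the thresholds, which matters for a heavy-tailed distribution with a narrow allowed interval.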

Thu, 02 Aug 2012 14:29:02 +0000
Re: Augmenting a sample from unbounded distribution, until no values
http://www.mathworks.com/matlabcentral/newsreader/view_thread/312244#884442
divergent.tseries@gmail.com
Can you say a little more about the problem? I work regularly with data that naturally come from a Cauchy distribution, and real-world data do fall outside the threshold of reason. You need to be careful with real-world modelling that your "threshold of reason" is also nature's threshold of reason. Processes like cancer or the stock market produce quite unreasonable values as data sets become large.

However, if you do think you have a real stochastic boundary condition, and you don't mind a little bit of skew, one solution is the following:

Draw x from a Cauchy distribution subject to x <= y, where y is drawn from a Cauchy distribution whose center is to the right of the center for x.

This creates a distribution whose density is that of x times (1 - CDF(y)), up to normalization, with the CDF of y evaluated at x. This allows the data to increase without bound, but does create a right-side stochastic budget constraint. That constraint is your boundary of reason, but it still allows nature to be "unreasonable."

It's important to remember that just because you cannot divide by zero, this does not prevent nature from doing so.

Sorry I didn't post MATLAB code, but mine is rusty.
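
One possible reading of the suggestion above, as a rough MATLAB sketch: draw a fresh y for every candidate x and keep x only when x <= y. The post does not say whether y is drawn once or per candidate, so the per-candidate draw is an assumption, as are the parameter values and the helper handle `cauchy`.

```matlab
% Sketch: Cauchy draws of x kept only when x <= y, with y a second
% Cauchy variable centered to the right of x (one reading of the post).
t_x = 0;     % center of x (illustrative value)
t_y = 3;     % center of y, to the right of t_x (illustrative value)
s = 1;       % common scale parameter (assumed equal for x and y)
n = 1000;    % required number of data points

% Hypothetical helper: k Cauchy draws via the inverse CDF.
cauchy = @(t, s, k) t + s * tan(pi * (rand(1, k) - 0.5));

x = zeros(1, 0);
while numel(x) < n
    k = n - numel(x);              % how many points are still missing
    x_new = cauchy(t_x, s, k);
    y_new = cauchy(t_y, s, k);     % a fresh random ceiling per candidate
    x = [x, x_new(x_new <= y_new)];   % keep only candidates under their ceiling
end
```

Because the ceiling is itself random and unbounded, large values of x remain possible but become less likely, which gives the slight skew mentioned in the post instead of a hard cutoff.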