
Thread Subject:
Augmenting a sample from unbounded distribution, until no values are above or below threshold.


From: Ulrik Nash

Date: 6 Sep, 2011 00:09:10

Message: 1 of 3

Hi Everyone,

I am working on a problem that requires a sample of numbers drawn from a distribution with unbounded support. The issue is that some values that can be drawn from such distributions make no sense for the task I am working on.

I need a specific number of data points, so I can't simply delete values that lie outside the 'threshold of realism'.

Nor can I simply set any number lying outside the 'threshold of realism' to the threshold value, because that would not be realistic either.

So what I want to do is: (1) remove the outliers; (2) 're-sample' from the distribution, with sample size equal to the number of outliers; (3) add these new data points to the 'main sample'; and repeat until there are no outliers, at which point I have the required sample.
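Steps (1) to (3) can be sketched as a loop in base MATLAB. This is a minimal sketch, assuming the randn*20 example distribution and the example thresholds (1 and 10) used in this post; the handle name `sampler` is hypothetical, and any function returning k draws as a column could be substituted. Logical indexing replaces the sort-and-truncate bookkeeping.

```matlab
% Minimal sketch of steps (1)-(3). The distribution (randn*20) and the
% thresholds are the example values from the post; 'sampler' is a
% hypothetical handle that returns k draws as a column vector.
min_threshold = 1;
max_threshold = 10;
n = 10;
sampler = @(k) randn(k, 1) * 20;

sample = sampler(n);
in_range = sample >= min_threshold & sample <= max_threshold;
while ~all(in_range)
    kept = sample(in_range);              % (1) drop the outliers
    redraw = sampler(n - numel(kept));    % (2) one new draw per outlier
    sample = [kept; redraw];              % (3) merge, then check again
    in_range = sample >= min_threshold & sample <= max_threshold;
end
```

Because every slot is redrawn independently until it lands in range, the final values are draws from the original distribution restricted to [min_threshold, max_threshold].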

I have made an attempt at a function. It is extremely inefficient, I am sure, and not only because it is incomplete:

function [augmentedSample] = augmentedSample(min_threshold,max_threshold,required_number_in_sample)

% for example: min_threshold = 1;
% for example: max_threshold = 10;
% for example: required_number_in_sample = 10;

sample = randn(1,required_number_in_sample)*20; %This is just an example. The general problem concerns distributions without bounds.
numbers_greater = sum(sample >= min_threshold);
A = sort(sample,'descend');
B = A(1:numbers_greater);
numbers_smaller = sum(B<= max_threshold);
C = sort(B,'ascend');
D = C(1:numbers_smaller);
number_additions = required_number_in_sample - numel(D);
new_additions = randn(1,number_additions)*20;

% now I can update sample and start again ....
sample = [D new_additions];

% and continue until ....
% .... number_additions = 0, at which point ....

augmentedSample = sample;

end


I would appreciate any suggestions on how to achieve the aim I have described.

Best regards,

Ulrik.


From: Ulrik Nash

Date: 6 Sep, 2011 11:51:28

Message: 2 of 3


This function, probably very inefficient, seems to do the trick:

function augmentedCauchySample = augmentedCauchySample(t, s, min_threshold, max_threshold, n)

% This function deals with the problem of sampling from the Cauchy
% distribution, which is unbounded. Its unboundedness creates a positive
% probability that the sample drawn includes points that are not realistic
% for the purpose. Instead of deleting unrealistic values or setting
% unrealistic values to thresholds, this function resamples outliers
% until no outliers remain.
%
% n             : The required number of data points in the sample.
% min_threshold : The lowest allowed value in the sample.
% max_threshold : The highest allowed value in the sample.
% s > 0         : The scale parameter of the Cauchy distribution. Larger s makes the distribution LESS peaked.
% t             : The location parameter of the Cauchy distribution (its median).
%
% Requires randraw, a third-party function from the MATLAB File Exchange.

sample = randraw('cauchy', [t, s], n);
sample = sample(:);                           % use a column so concatenation is shape-safe
in_range = sample >= min_threshold & sample <= max_threshold;
while ~all(in_range)
    kept = sample(in_range);                  % discard the outliers
    number_additions = n - numel(kept);       % one new draw per outlier
    new_additions = randraw('cauchy', [t, s], number_additions);
    sample = [kept; new_additions(:)];
    in_range = sample >= min_threshold & sample <= max_threshold;
end
augmentedCauchySample = sample;

end
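For the Cauchy case the loop and the randraw dependency can both be avoided. The Cauchy CDF is F(x) = 1/2 + atan((x - t)/s)/pi, with inverse t + s*tan(pi*(u - 1/2)). Resampling outliers until none remain gives draws from the Cauchy restricted to [min_threshold, max_threshold], and that restricted distribution can be sampled directly by drawing u uniformly on [F(min_threshold), F(max_threshold)] and applying the inverse CDF. This is a swapped-in technique (inverse-transform sampling), not the method posted above, and the values of t, s, the thresholds and n are illustrative assumptions.

```matlab
% Truncated Cauchy by inverse-transform sampling: no loop, base MATLAB only.
% t, s, the thresholds and n are illustrative example values.
t = 0;  s = 2;                              % location and scale
min_threshold = -5;  max_threshold = 5;
n = 1000;

cauchy_cdf = @(x) 0.5 + atan((x - t) / s) / pi;
cauchy_inv = @(u) t + s * tan(pi * (u - 0.5));

u_lo = cauchy_cdf(min_threshold);
u_hi = cauchy_cdf(max_threshold);
u = u_lo + (u_hi - u_lo) * rand(n, 1);      % uniform on [F(min), F(max)]
sample = cauchy_inv(u);                     % lands in [min, max] up to rounding
```

The cost is fixed at n draws regardless of how much probability mass lies outside the thresholds, whereas the resampling loop slows down as that mass grows.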


From: divergent.tseries@gmail.com

Date: 2 Aug, 2012 14:29:02

Message: 3 of 3

Can you say a little more about the problem? I work regularly with data that naturally come from a Cauchy distribution, and real-world data do fall outside the threshold of reason. In real-world modelling you need to be careful that your "threshold of reason" is nature's threshold of reason. Processes like cancer or the stock market produce quite unreasonable values as data sets grow large.

However, if you really do have a stochastic boundary condition, and you don't mind a little skew, one solution is the following:

Draw x from a Cauchy distribution subject to x <= y, where y is drawn from a Cauchy distribution whose center is to the right of the center for x.

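With x and y independent, the accepted values of x have a density proportional to f_x(x) * (1 - F_y(x)), where f_x is the density of x and F_y is the CDF of y. This allows the data to grow without bound, but it creates a stochastic budget constraint on the right side. That constraint is your boundary of reason, yet it still allows nature to be "unreasonable."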
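A minimal sketch of this accept/reject scheme, assuming independent draws and redrawing both x and y whenever x exceeds its ceiling. The centres tx and ty (with ty > tx), the scales sx and sy, and n are illustrative assumptions; Cauchy draws come from the inverse CDF rather than any toolbox function.

```matlab
% Stochastic "budget constraint": keep x ~ Cauchy(tx, sx) only when it lies
% below an independent ceiling y ~ Cauchy(ty, sy), with ty > tx.
% All parameter values are illustrative assumptions.
tx = 0;  sx = 1;                      % centre and scale of x
ty = 5;  sy = 1;                      % centre (to the right of tx) and scale of y
n = 1000;

cauchy_draw = @(k, t, s) t + s * tan(pi * (rand(k, 1) - 0.5));

x = zeros(0, 1);
while numel(x) < n
    k = n - numel(x);                 % draws still needed
    xk = cauchy_draw(k, tx, sx);
    yk = cauchy_draw(k, ty, sy);
    x = [x; xk(xk <= yk)];            % accept x only where x <= its ceiling
end
```

Since y itself is unbounded, no fixed upper limit is imposed on x; large values just become rarer than under the unconstrained Cauchy.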

It's important to remember that just because you cannot divide by zero does not mean nature cannot.

Sorry I didn't post MATLAB code; mine is rusty.
