From: "Ulrik Nash" <uwn@sam.sdu.dk>
Newsgroups: comp.soft-sys.matlab
Subject: Augmenting a sample from unbounded distribution, until no values are above or below threshold.
Date: Tue, 6 Sep 2011 00:09:10 +0000 (UTC)
Organization: The MathWorks, Inc.

Hi Everyone,

I am working on a problem that requires a sample of numbers drawn from a distribution that has no bounds. The issue is that some of the values such a distribution can produce do not make any sense for the task I am working on.

I require a specific number of data points, so I can't just delete the values that lie outside the 'threshold of realism'.

Nor can I set any number lying outside the 'threshold of realism' to the nearest threshold value, because that would not be realistic either.

So, what I wish to achieve is to (1) remove the outliers, (2) re-sample from the distribution, with sample size equal to the number of outliers removed, and (3) add these new data points to the 'main sample', repeating until there are no outliers, at which point I have the required sample.

I have made an attempt at a function. It is extremely inefficient, I am sure, and not only because it is incomplete:

function sample = augmentedSample(min_threshold,max_threshold,required_number_in_sample)

% for example: min_threshold = 1;
% for example: max_threshold = 10;
% for example: required_number_in_sample = 10;

% Initial draw. This is just an example. The general problem concerns distributions without bounds.
sample = randn(1,required_number_in_sample)*20;

% Keep only the values that are >= min_threshold
numbers_greater = sum(sample >= min_threshold);
A = sort(sample,'descend');
B = A(1:numbers_greater);

% Of those, keep only the values that are <= max_threshold
numbers_smaller = sum(B <= max_threshold);
C = sort(B,'ascend');
D = C(1:numbers_smaller);

% Draw one replacement for each value that was removed
number_additions = required_number_in_sample - numel(D);
new_additions = randn(1,number_additions)*20;

% now I can update sample and start again ....
sample = [D new_additions];

% and continue until ....
% .... number_additions = 0, at which point sample is the required output

end
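
For reference, the three steps described above (remove, re-sample, repeat) could be sketched as a loop along the following lines. This is only a minimal sketch, not a definitive implementation: it assumes the same example distribution as the attempt (randn scaled by 20), it treats the thresholds as inclusive just as the attempt does, and the names lo, hi, n and draw are hypothetical, chosen here for illustration. It uses a logical mask in place of the sort steps, and redraws the out-of-range entries in place.

```matlab
% Minimal sketch of the remove / re-sample / repeat idea.
% All names here (lo, hi, n, draw) are illustrative assumptions.
lo = 1;                        % lower threshold (example value)
hi = 10;                       % upper threshold (example value)
n  = 10;                       % required number of data points
draw = @(k) randn(1, k) * 20;  % example unbounded distribution

sample = draw(n);                     % initial sample
bad = sample < lo | sample > hi;      % logical mask of out-of-range values
while any(bad)
    sample(bad) = draw(nnz(bad));     % redraw only the out-of-range entries
    bad = sample < lo | sample > hi;  % re-check the whole sample
end
% sample now holds n values, all within [lo, hi]
```

Repeating the draw until every value is accepted amounts to rejection sampling, so the result follows the original distribution truncated to [lo, hi]. If the interval carries very little probability under the distribution, the loop may need many passes.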


I would appreciate any suggestions on how to achieve the aim I have described.

Best regards,

Ulrik.