"Ulrik Nash" <uwn@sam.sdu.dk> wrote in message <j43ob6$7dj$1@newscl01ah.mathworks.com>...
> Hi Everyone,
>
> I am working on a problem where I require a sample of numbers drawn from a distribution that has no bounds. The issue is that some values that may be drawn from these distributions do not make any sense for the task I am working on.
>
> I require a specific number of data points drawn, so I can't just delete values that lie outside the 'threshold of realism'.
>
> Also, any number lying outside the 'threshold of realism' I cannot just set to threshold values, because that would not be realistic either.
>
> So, what I wish to achieve is (1) remove outliers (2) to 're-sample' from the distribution once again, with sample size equal to the number of outliers (3) add these new data points to the 'main sample' and repeat until there are no outliers, at which point I have the required sample.
>
> I have made an attempt at a function. It is extremely inefficient, I am sure, and not only because it is incomplete:
>
> function [augmentedSample] = augmentedSample(min_threshold,max_threshold,required_number_in_sample)
>
> % for example: min_threshold = 1;
> % for example: max_threshold = 10;
> % for example: required_number_in_sample = 10;
>
> sample = randn(1,required_number_in_sample)*20; %This is just an example. The general problem concerns distributions without bounds.
> numbers_greater = sum(sample >= min_threshold);
> A = sort(sample,'descend');
> B = A(1:numbers_greater);
> numbers_smaller = sum(B<= max_threshold);
> C = sort(B,'ascend');
> D = C(1:numbers_smaller);
> number_additions = required_number_in_sample - numel(D);
> new_additions = randn(1,number_additions)*20;
>
> % now I can update sample and start again ....
> sample = [D new_additions];
>
> % and continue until ....
> % .... number_additions = 0, at which point ....
>
> augmentedSample = sample;
>
> end
>
>
> I would appreciate any suggestions on how to achieve the aim I have described.
>
> Best regards,
>
> Ulrik.
This function is probably very inefficient, but it seems to do the trick:
function [augmentedCauchySample] = augmentedCauchySample(t,s,min_threshold,max_threshold,size)
% This function deals with the problem of sampling from the Cauchy
% Distribution, which is unbounded. Its unboundedness creates a positive
% probability that the sample drawn includes points that are not realistic
% for the purpose. Instead of deleting unrealistic values or setting
% unrealistic values to thresholds, this function essentially resamples
% outliers until no outliers remain.
% size : The required number of data points in the sample.
% max_threshold : The highest allowed value in the sample.
% min_threshold : The lowest allowed value in the sample.
% s > 0 : The scale parameter of the Cauchy Distribution. Larger s makes the distribution less peaked.
% t : The location parameter of the Cauchy Distribution. This is its median and mode.
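% Note: the input name "size" shadows MATLAB's built-in size function inside this function.
number_additions = 1; % dummy value so that the loop body runs at least once
sample = randraw('cauchy',[t, s],size); % initial draw
while number_additions > 0
    numbers_greater = sum(sample >= min_threshold); % count of values at or above the lower threshold
    A = sort(sample,'descend');
    B = A(1:numbers_greater); % keep only values >= min_threshold
    numbers_smaller = sum(B <= max_threshold); % count of remaining values at or below the upper threshold
    C = sort(B,'ascend');
    D = C(1:numbers_smaller); % keep only values <= max_threshold
    number_additions = size - numel(D); % number of outliers that were removed
    new_additions = randraw('cauchy',[t, s],number_additions); % fresh draws to replace them
    sample = [D new_additions];
end
augmentedCauchySample = sample;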
end
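
The sorting steps are probably not needed: a logical mask can pick out the outliers directly, and only those entries need to be redrawn. Below is a minimal sketch of that idea, not a tested replacement for the function above. It assumes the Cauchy draws can be generated with the inverse CDF, t + s*tan(pi*(U - 0.5)) with U uniform on (0,1), rather than with randraw, and all parameter values are made-up examples.

```matlab
% Sketch: resample only the out-of-range entries until none remain.
% Assumption: Cauchy draws come from the inverse CDF instead of randraw.
t = 0;                % location parameter (example value)
s = 2;                % scale parameter, s > 0 (example value)
min_threshold = -10;  % lowest allowed value (example value)
max_threshold = 10;   % highest allowed value (example value)
n = 1000;             % required number of data points (example value)

% Initial draw: 1-by-n row vector of Cauchy(t, s) values.
sample = t + s*tan(pi*(rand(1,n) - 0.5));

% Logical mask marking the entries outside [min_threshold, max_threshold].
bad = sample < min_threshold | sample > max_threshold;

while any(bad)
    % Redraw only the offending entries; accepted values stay where they are.
    sample(bad) = t + s*tan(pi*(rand(1,nnz(bad)) - 0.5));
    bad = sample < min_threshold | sample > max_threshold;
end
```

Unlike the function above, this leaves the accepted values in their original positions instead of sorting them, and it never calls the generator with a sample size of zero.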