My goal is to compute the 95% confidence interval of a random variable, say X, distributed according to the normal distribution with mean 2 and variance 9/100. The code I have so far is as follows
BEGINNING OF CODE
conventionalCI = zeros(2,1000);
%reset values for next round, i+1 rr = zeros(1,200);
%Here we are drawing directly from the distrution N(2,9/100), which is the %distribution of the sameple means of a random variable from N(2,9).
rr = 2 + (3/10).*normrnd(0,1,1,200);
%The quantiles for the data above is recorded
conventionalCI(1,i) = quantile(sort(rr),.975); conventionalCI(2,i) = quantile(sort(rr),.025);
%Proportion (for conventional confidence intervals)
propCI = zeros(1,1000); for i=1:1000 if (2>=conventionalCI(2,i)) & (conventionalCI(1,i)>=2) propCI(i)=1; else propCI(i)=0; end
proportionCI = sum(propCI)/1000
END OF CODE
In essence what I'm doing is drawing from the distribution N(2,9/100) with a sample of size 200, saving the 97.5 and 2.5 percentiles in the variable conventionalCI and calculating the proportion of times the true mean, 2, falls in the intervals in conventionalCI.
The problem is I keep getting the 1 for the proportion of times 2 falls in the interval which doesn't make sense given that I'm trying to construct a 95% confidence interval. Can anyone figure out why this is the case?
I appreciate your help
No products are associated with this question.
Suppose your friend gives you a sample (drawn from a parent distribution that you do not know). Maybe that sample is
>> x = [1 1.5 1.8 1.9 1.9 1.9 1.9 2 2 2 2 2 2.1 2.1 2.1 2.1 2.2 2.5 3];
It happens to have a sample mean of exactly 2. (If I were not lazy, I could have given you lots more data, and told you the confidence interval as well.) You do not know what the parent mean is. The parent distribution might have a mean of 2.04, or 1.93. The most likely parent mean is 2, and parent means further and further from 2 are statistically less likely. Under normal distribution assumptions, you can calculate how likely each value of the parent mean is.
Now -- and this is the point that is specifically important for your code -- you cannot test that CI by drawing over and over again from a parent distribution with mean 2. That would just be assuming the exact thing you are attempting to test! You will almost always ( way more than 95%) have a sample mean within your CI.
Instead, you would need to test by drawing from the entire range of possible parent distributions, in proportion to their probability of having generated the sample data that your friend gave you.
This is a very subtle but important distinction. I hope I've helped your understanding a little.
OK, so I finally found some time to sit down and write out some code that illustrates what you can do. I tried to comment it richly and use explicit variable names to help you understand what's going on. The core of what you tried to do is still there, which I hope will guide you.
function proportionCI = howOftenConfidenceIntervalContainsMean()
% Set the random number seed. [Older MATLAB version don't have the rng() function. Modify accordingly.] rng(1)
SAMPLESIZE = 200;
% Create a "random" sample. For the rest of the code, you have to pretend that you never % knew the true parameters of the parent distribution. rr = 2 + 3*normrnd(0,1,1,SAMPLESIZE);
sampleMean = mean(rr); sampleStandardDeviation = std(rr); sampleStandardErrorOfMean = sampleStandardDeviation/sqrt(SAMPLESIZE);
confidenceIntervalOfMean = sampleMean + 1.96*sampleStandardErrorOfMean*[1; -1];
% Now, here's the slightly tricky part. You want the distribution of the possible parent means % that might have resulted in your sample. But you don't know what those are. (You actually % can derive it theoretically, but we are going to get it here by simulation.)
% The trick that enable us to do this by sampling is that we have an APPROXIMATION to the % parent distribution ... our one sample! So, because we can't sample the parent, instead % we SAMPLE THE SAMPLE! (This trick is known as resampling.) We will resample many times, % and each one of those resamples gives us an approximation to the parent mean. We check those % approximate parent means to see if they lie within the confidence interval as often as we % expect.
NUMBER_RESAMPLE_TRIALS = 100000;
% Do the resampling (lots of times), and see what proportion of the (approximate) parent means % lie within the confidence intervals propCI = zeros(1,NUMBER_RESAMPLE_TRIALS);
if mod(ni,1000)==0 disp(['Trial ',num2str(ni),' of ',num2str(NUMBER_RESAMPLE_TRIALS)]) end
% For each trial we generate a (re)sample. resampledSample = rr(randsample(SAMPLESIZE,SAMPLESIZE,true));
resampledMean = mean(resampledSample); % Each trials gives one approximation of the parent mean
% Test to see if the parent mean lies within the confidence interval if (resampledMean>=confidenceIntervalOfMean(2)) && (confidenceIntervalOfMean(1)>=resampledMean) propCI(ni)=1; else propCI(ni)=0; end end
proportionCI = sum(propCI)/NUMBER_RESAMPLE_TRIALS
Try this modified version, :
%KHMOU Youssef, Statistics :exmpl1 30/01/2013 5:00 AM
conventionalCI = zeros(2,N);
%sample=sort(sample); %p=length(sample); %n=p/10;
if (2.00000>=b) && (a>=2.00000)
proportionCI = sum(propCI)/N
After each iteration i, the RANDOM sample increases in length, Does it give the result you want ? anyway try to change the number of samples per example :
As the the number of samples increases, th mean tends to 1.
i Hope that helps :