How can I find the CI for non-normal data?

Dear all
I need to find the one-tailed 90% confidence interval (actually, just the lower limit of the 90% confidence interval) of my attached non-normal data. (I tested normality using adtest and lillietest.)
First, I transformed the data toward normality using boxcox, then computed the CI, and back-transformed the lower confidence limit. But the resulting value wLB looks very high to me.
My knowledge of statistics is limited. Could you please check my code and give me some feedback? Is this the correct way of doing it? Here is my code:
load('dData.mat');
[dDataB,lambda] = boxcox(dData'); % normalize
% compute confidence interval:
pdB = fitdist(dDataB, 'Normal');
ciB = paramci(pdB, 'Alpha', 0.10); % for 1-tailed 90%
lclB = ciB(1,1); % get the lower confidence limit
wLB = (1 + lclB * lambda)^(1 / lambda); % back-transform (inverse Box-Cox)
Thanks

3 Comments

Deciding which statistics to use depends heavily on the questions you're asking and how you will use the results. I usually prefer as few levels of abstraction as possible, so transforming the data, computing CIs, and then back-transforming those CIs seems quite abstract to me (again, depending on your questions and how you will use the results).
A common way to compute confidence intervals for non-normally distributed data is the bootstrap method. If you're unfamiliar with it, there are plenty of resources out there that can get you started. Briefly, this method resamples your data with replacement many times and computes a statistic (say, the mean) on each resample. So if you bootstrap 100 times you'll have 100 means. By the central limit theorem, the distribution of those means tends toward a normal distribution as the sample size grows (for data with finite variance), and the CI limits can be read off from that distribution, for example as its percentiles.
Fortunately, MATLAB does all that for you with the bootci() function. Here's an example of performing 1000 bootstraps on your dData at 90% confidence. Because bootstrapping relies on random resampling, your results may differ unless you set the seed as I've demonstrated with rng().
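The resampling steps described above can be sketched by hand as follows. This is only a minimal illustration of the percentile bootstrap, not a description of what any toolbox function does internally. The vector x is a hypothetical synthetic stand-in for dData (the attached file is not available here), and nBoot, bootMeans and ciManual are names made up for this sketch. It uses only base MATLAB functions.

```matlab
% Percentile bootstrap of the mean, written out by hand (illustrative sketch).
rng(1);                                   % fix the seed so the sketch is repeatable
x = 0.35 * exp(0.25 * randn(500, 1));     % ASSUMPTION: skewed synthetic stand-in for dData

nBoot = 1000;                             % number of bootstrap resamples
n = numel(x);
bootMeans = zeros(nBoot, 1);
for k = 1:nBoot
    idx = randi(n, n, 1);                 % draw n indices with replacement
    bootMeans(k) = mean(x(idx));          % statistic computed on this resample
end

bootMeans = sort(bootMeans);
alpha = 0.10;                             % two-sided 90% interval
lowerLimit = bootMeans(max(1, round(nBoot * alpha / 2)));  % approx. 5th percentile
upperLimit = bootMeans(round(nBoot * (1 - alpha / 2)));    % approx. 95th percentile
ciManual = [lowerLimit; upperLimit]
```

With real data, x would simply be replaced by dData. The percentile indices are rounded to the nearest sorted bootstrap value, which is a simplification.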
%rng(166) %if you want to reproduce these results
ci = bootci(1000, {@mean, dData}, 'type', 'per', 'alpha', .1)
ci =
0.34581
0.36217
NOTE: On 18 Nov 2021 I discovered an error in my code above and fixed it. The alpha value for a 90% CI was incorrectly set to 0.90 and has been corrected to 0.10. The ci values were updated accordingly.
Just like in your calculations, the CI is tiny.
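Since the question asks for a one-tailed limit, here is a hedged sketch of one way to get it with the percentile method. With percentile intervals, the lower limit of a two-sided 80% interval is the 10th percentile of the bootstrap means, which can be read as a one-sided 90% lower bound on the mean. The vector x is a hypothetical synthetic stand-in for dData, and ci80 and lowerBound90 are names invented for this sketch.

```matlab
% One-sided 90% lower bound on the mean via a two-sided 80% percentile interval.
rng(1);                                   % fix the seed so the sketch is repeatable
x = 0.35 * exp(0.25 * randn(500, 1));     % ASSUMPTION: synthetic stand-in for dData

% alpha = 0.20 leaves 10% in each tail, so ci80(1) is the 10th percentile
% of the bootstrap distribution of the mean.
ci80 = bootci(1000, {@mean, x}, 'type', 'per', 'alpha', 0.20);
lowerBound90 = ci80(1)                    % one-sided 90% lower confidence limit (of the mean)
```

Note that this is still a bound on the mean, not on individual data points.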
Lastly, always plot your data to see whether the results make sense. The code below plots the distribution of your data, with a vertical line at the mean and a horizontal error bar showing the CI. Remember that this is the CI of the mean, not the standard deviation or standard error of your distribution.
figure
hold on
hist(dData,20) %plot data
axis tight
plot([mean(dData),mean(dData)], [min(ylim),max(ylim)],'m') %plot mean
errorbar(mean(dData), max(ylim)*.9, mean(dData)-ci(1), ci(2)-mean(dData), 'Horizontal', 'DisplayName', '90%CI') %plot CI
title('raw data')
legend('raw data', 'mean', '90% CI')
Dear Adam Danz, thank you for your kind response
Your solution perfectly gives the CI for the mean. But I want to compute something else, which I probably couldn't explain clearly.
Assume that my data is sorted in ascending order. What I want to find is the smallest data point that satisfies the 90% confidence level (i.e. one-tailed).
Since the data is non-normal, an empirical rule such as the three-sigma rule would not apply.
Regards. TS
The script below sorts your dData and then finds the first value that exceeds the lower confidence limit (which is hard coded in the script).
% Assumes dData is loaded in workspace
CI = [0.34581 0.36217]; % CI of the mean (hard coded from the corrected bootci output above)
[dDataSort, sortIdx] = sort(dData); % dData sorted
%index of first sorted point greater than lower CI
idx = find(dDataSort > CI(1), 1);
% The first data point past the lower CI is...
firstAccepted = dDataSort(idx);
%... and its location in dData is....
firstAcceptedIdx = sortIdx(idx);
% Plot results, circle first accepted
figure
subplot(2,1,1)
plot(dDataSort, 'b.'); %plot your sorted data
hold on
refline(0, CI(1)); %plot the lower CI
plot(idx, firstAccepted, 'ro') %circle the first point above the lower CI limit
title('dData sorted')
subplot(2,1,2)
plot(dDataSort, 'b.');
hold on
refline(0, CI(1));
plot(idx, firstAccepted, 'ro')
xlim([idx-10, idx+10])
ylim([firstAccepted-.01, firstAccepted+.01])
title('Zoomed in')
legend('dData', 'lower CI', 'First in CI', 'location', 'NorthWest')
In the figure generated by the script above, the first subplot shows your raw dData sorted, and the second subplot shows the same data zoomed in so you can see that the first point above the lower CI limit is chosen.
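Another possible reading of "the smallest data point which satisfies the 90% confidence level" is the value that roughly 90% of the data lie above, i.e. the empirical 10th percentile of the data itself rather than a limit on the mean. This is an assumption about the intent, not something confirmed in the thread. A minimal sketch, using a hypothetical synthetic stand-in x for dData and made-up names p10, fracAbove and ciP10:

```matlab
% Empirical 10th percentile of the data, with a bootstrap CI for that percentile.
rng(1);                                   % fix the seed so the sketch is repeatable
x = 0.35 * exp(0.25 * randn(500, 1));     % ASSUMPTION: synthetic stand-in for dData

p10 = prctile(x, 10);                     % value that about 90% of the data exceed
fracAbove = mean(x >= p10);               % sanity check: should be close to 0.90

% Uncertainty of the percentile itself (two-sided 90% percentile bootstrap).
ciP10 = bootci(1000, {@(d) prctile(d, 10), x}, 'type', 'per', 'alpha', 0.10);
```

Unlike the three-sigma rule, the empirical percentile makes no assumption about the shape of the distribution.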

Asked: 14 Jun 2018
Edited: 18 Nov 2021
