function boothomvart(x,s,alpha)
%BOOTHOMVART Bootstrap Homogeneity of Variances Test T Analytical Approach.
% The bootstrap estimates the variability of a statistic from a single
% data set by resampling it independently and with equal probabilities
% (Monte Carlo resampling). It allows the estimation of measures when the
% underlying distribution is unknown or when sample sizes are small. Its
% results are consistent with the statistical properties of the
% corresponding analytical methods (Efron and Tibshirani, 1993).
%
% The name 'bootstrap' originates from the expression 'pulling yourself
% up by your own bootstraps' and refers to the basic idea of the
% bootstrap, sampling with replacement from the data. In this way a
% large number of 'bootstrap samples' is generated, each of the same size
% as the original data set. From each bootstrap sample the statistical
% parameter of interest is calculated (Wehrens and Van der Linden, 1997).
%
% Here we use the non-parametric bootstrap, which is the simpler variant.
% It does not use the structure of a model to construct artificial data.
% Instead, the data are resampled directly with replacement.
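%
% A minimal sketch of one such resample, for illustration only (y, nb and
% yb are hypothetical names, not variables of this function; the index
% idiom mirrors the one used in the code of this file):
%
%    y  = [36 39 43 38 37];          % original sample
%    nb = numel(y);                  % sample size
%    yb = y(ceil(rand(1,nb)*nb));    % same size, drawn with replacement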
%
% The homogeneity of variances test is a useful tool in many scientific
% applications. Boos and Brownie (2004) and Conover et al. (1981) give a
% broad review.
%
% Cahoy (2010) proposed a variance-based statistic that leads to a
% bootstrap test for heterogeneity of variances, for any distribution and
% with a slight modification of the Alam and Cahoy (1999) test. This
% procedure, which uses a generalized box-type acceptance region, is shown
% to be more sensitive to slight deviations from the null specifications.
% Cahoy (2010) remarks that the properties of the test may change when
% more than four populations are involved and these populations are not
% from a location-scale family and may have different kurtosis. This means
% that experimenters should exercise caution when using this method in
% practice. Within the boundaries of his study, he generally recommends
% the test T under most conditions.
%
% As Cahoy (2010) did, here we develop an m-file analytical procedure
% using the bootstrap method as an alternative to the homogeneity of
% variances test.
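%
% As a reading aid, the statistic as computed in the code of this file
% (not a quotation of the paper) is, for samples i = 1,...,k with sizes
% n(i) and variances v(i):
%
%    nu(i)  = log(v(i)/prod(v)^(1/k))
%    vln(i) = (k2 - (n(i)-3)/n(i))/(n(i)-1)
%    lam(i) = sqrt((1-2/k)*vln(i) + sum(vln)/k^2)
%    t(i)   = nu(i)/lam(i)
%
% where k2 is the kurtosis of the pooled data after centering each sample
% at its own mean. The observed t(i) are then compared, in absolute value,
% with a critical value taken from the bootstrap distribution of t.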
%
% BOOTHOMVART treats NaN values as missing values, and removes them.
%
% Syntax: boothomvart(x,s,alpha)
%
% Inputs:
% x - data n-by-2 matrix (Col 1 = data; Col 2 = sample code)
% s - number of bootstrap simulations (resamplings)
% alpha - significance level (default=0.05)
%
% Outputs:
% - Summary statistics from the samples
% - Decision on the null-hypothesis tested
%
% --We would appreciate any suggestions to improve this m-code in order
% to reduce the elapsed time. The example below, with 3000 resamplings,
% took about 38 seconds.--
%
% Example: we take the numerical example given by Dr. Burt B. Gerstman,
% Department of Health Science, San Jose State University, on his website
% [http://www.sjsu.edu/faculty/gerstman/StatPrimer/anova-b.pdf], from a
% study on skin pigmentation in four families from the same 'racial
% group'. The dependent variable is a measure of skin pigmentation.
% The data are:
%
%            ---------------------------------------
%                            Family
%            ---------------------------------------
%               1         2         3         4
%            ---------------------------------------
%              36        46        40        45
%              39        47        50        53
%              43        47        44        56
%              38        47        48        52
%              37        43        50        56
%            ---------------------------------------
%
% We wish to test whether the four families have the same variance, using
% 3000 resamplings and a significance level of 0.05.
%
% Input data:
%
% X = [36 1;39 1;43 1;38 1;37 1;46 2;47 2;47 2;47 2;43 2;40 3;50 3;
% 44 3;48 3;50 3;45 4;53 4;56 4;52 4;56 4];
%
% Calling the function in MATLAB:
% boothomvart(X,3000,0.05)
%
% Answer is:
%
% Summary statistics from the samples.
% --------------------------------------------------
%  Sample       Size          Mean        Variance
% --------------------------------------------------
%    1            5         38.6000        7.3000
%    2            5         46.0000        3.0000
%    3            5         46.4000       18.8000
%    4            5         52.4000       20.3000
% --------------------------------------------------
%
% Critical value = 2.8233;
% No. of test statistic values at least equal to critical value = 0
%
% After 3000 resamplings and with a significance of 0.05
% We accept the null hypothesis that the variances are homogeneous.
%
% Created by A. Trujillo-Ortiz and R. Hernandez-Walls
% Facultad de Ciencias Marinas
% Universidad Autonoma de Baja California
% Apdo. Postal 453
% Ensenada, Baja California
% Mexico.
% atrujo@uabc.edu.mx
%
% Copyright (C) August 15, 2011.
%
% ---We thank Dr. Dexter O. Cahoy (Program of Mathematics and Statistics,
% College of Engineering and Science, Louisiana Tech University, Ruston, LA)
% for providing us with a hard copy of the paper, which made this work
% possible.---
%
% To cite this file, this would be an appropriate format:
% Trujillo-Ortiz, A. and R. Hernandez-Walls. (2011). boothomvart:
% Bootstrap Homogeinity of Variance Test T Analytical Approach.
% [WWW document]. URL http://www.mathworks.com/matlabcentral/fileexchange/
% 32646-boothomvart
%
% References:
% Alam, K. and Cahoy, D. O. (1999), A test for equality of variances.
% Journal of Mathematical Sciences, Philippines, 2(1):1-19.
% Boos, D. and Brownie, C. (2004), Comparing variances and other measures
% of dispersion. Stat. Sci., 19(4):571-578.
% Cahoy, D. O. (2010), A bootstrap test for equality of variances. Comp.
% Stat. and Data Analysis, 54(10):2306-2316.
% Conover, W. J., Johnson, M. E. and Johnson, M. M. (1981), A comparative
% study of tests for homogeneity of variances, with applications to the
% outer continental shelf bidding data. Technometrics, 23:351-361.
% Efron, B. and Tibshirani, R. J. (1993), An Introduction to the Bootstrap.
% Chapman and Hall: New York.
% Wehrens, R. and Van der Linden, W. E. (1997), Bootstrapping Principal
% Component Regression Models. Journal of Chemometrics, 11:157-171.
%
if nargin < 2
    error('boothomvart:TooFewInputs', ...
        'BOOTHOMVART requires at least two input arguments.');
end
if nargin < 3 || isempty(alpha)
alpha = 0.05; %default
elseif numel(alpha) ~= 1 || alpha <= 0 || alpha >= 1
error('boothomvart:BadAlpha','ALPHA must be a scalar between 0 and 1.');
end
X = x;
c = size(X,2);
if c ~= 2
error('stats:boothomvart:BadData','X must have two columns.');
end
%Remove NaN values, if any
X = X(~any(isnan(X),2),:);
k = max(X(:,2));
indice = X(:,2);
for i = 1:k
Xe = indice == i;
d(i).X = X(Xe,1);
d(i).m = mean(d(i).X);
d(i).vo = var(d(i).X);
d(i).e = d(i).X - d(i).m;
d(i).v = var(d(i).e);
d(i).n = length(d(i).e);
end
m=cat(1,d.m);vo=cat(1,d.vo);e=cat(1,d.e);v=cat(1,d.v);n=cat(1,d.n);
disp(' ')
disp('Summary statistics from the samples.')
disp('--------------------------------------------------')
disp(' Sample Size Mean Variance ')
disp('--------------------------------------------------')
for i = 1:k
fprintf(' %d %i %7.4f %7.4f\n',i,n(i),m(i),vo(i))
end
disp('--------------------------------------------------')
disp(' ')
k2 = kurtosis(e);
for i = 1:k
d(i).vlnc = ((1/(d(i).n - 1)))*(k2-(d(i).n - 3)/d(i).n);
end
varlns2c=cat(1,d.vlnc);
for i = 1:k
d(i).nuc = (log(d(i).v/prod(v)^(1/(k))));
d(i).lamc = sqrt((1-2/k)*d(i).vlnc + (1/(k^2)*sum(varlns2c)));
d(i).tc = d(i).nuc/d(i).lamc;
end
nuc=cat(1,d.nuc);lamc=cat(1,d.lamc);tc=cat(1,d.tc);
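ws = warning('off','all'); %silence warnings, remembering the caller's settings
restoreWarn = onCleanup(@() warning(ws)); %put them back on exit, even after an error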
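%Draw all s bootstrap resamples in a single pass: column i of bd holds the
%i-th resample, drawn with replacement within each sample. (Each call to
%rand already yields s columns, so no outer loop over s is needed.)
for j = 1:k
    Xe = indice == j;
    d(j).X = X(Xe,1);
    d(j).n = length(d(j).X);
    d(j).id = ceil(rand(d(j).n,s)*d(j).n); %random row indices in 1..n
    d(j).bd = d(j).X(d(j).id);
end
bd=cat(1,d.bd);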
xx = sort(X(:,2)); %sample codes in grouped order, matching the rows of bd
NU=[];LAM=[];T=[];
for i = 1:s
X = [bd(:,i) xx];
indice = X(:,2);
for j = 1:k
Xe = indice == j;
d(j).X = X(Xe,1);
d(j).e = d(j).X - mean(d(j).X);
d(j).v = var(d(j).e);
d(j).n = length(d(j).e);
end
e=cat(1,d.e);v=cat(1,d.v);n=cat(1,d.n);
k2 = kurtosis(e);
for j = 1:k
d(j).vln = (1/(d(j).n - 1))*(k2-(d(j).n - 3)/d(j).n);
end
varlns2=cat(1,d.vln);
for j = 1:k
d(j).nu = log(d(j).v/prod(v)^(1/(k)));
d(j).lam = sqrt((1-2/k)*d(j).vln + (1/(k^2)*sum(varlns2)));
d(j).t = d(j).nu/d(j).lam;
end
nu=cat(1,d.nu);lam=cat(1,d.lam);t=cat(1,d.t);
NU = [NU;nu];
LAM = [LAM;lam];
T = [T;t];
end
%Set the non-finite (NaN or Inf) elements of T to 0; this flags their
%resamples for removal below
T(~isfinite(T)) = 0;
T = reshape(T,k,s);
T(:,T(1,:) == 0) = [];
T = T - repmat(mean(T,2),1,size(T,2));
if k == 2
S = sort(abs(T(1,:)),'descend');
cv = S(ceil(s*alpha)+1);
else
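S = sort(abs(T(:)),'descend');
st = size(T,2); %number of retained bootstrap resamples
r = round(k*.3*st); %candidate critical values: the largest 30% of |T|
D = zeros(r,k);
for i = 1:r
    %per sample, count the resampled statistics lying within [-S(i),S(i)]
    D(i,:) = sum(abs(T') <= S(i),1);
end
%Critical value: the first candidate whose box contains exactly
%floor(st*(1-alpha)) statistics for at least one sample (the last
%candidate is used if none does)
for v = 1:size(D,1)
    if any(D(v,:) == (floor(st*(1-alpha))))
        break
    end
end
cv = S(v);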
end
ct = length(find(abs(tc) >= cv));
fprintf('Critical value = %3.4f\n',cv);
fprintf('No. of test statistic values at least equal to critical value = %g\n',ct);
disp(' ')
if any(abs(tc) >= cv);
fprintf('After %g resamplings and with a significance of %3.2f\n',s,alpha);
fprintf('We reject the null hypothesis that the variances are homogeneous.\n');
else
fprintf('After %g resamplings and with a significance of %3.2f\n',s,alpha);
fprintf('We accept the null hypothesis that the variances are homogeneous.\n');
end
warning on
return,