Avoiding for loop with anova1

3 views (last 30 days)
Zach
Zach on 5 Apr 2015
Answered: Zach on 14 Apr 2015
I have 2 large 3D matrices, and I've noticed that passing a DIM argument for the ttest2 function decreases the computational time on my computer substantially as opposed to looping through each index in the first 2 dimensions (near 60-fold for the data I'm working with):
[~,~,~,stats]=ttest2(A,B,[],[],[],3);
vs.
stats=cell(rmax,cmax);
for r=1:rmax
for c=1:cmax
[~,~,~,stats{r,c}]=ttest2(A(r,c,:),B(r,c,:));
end
end
Now I would like to do the same thing, but adding another matrix C and using anova1 instead of ttest2. The documentation for anova1 suggests that it won't accept a similar DIM argument, but I'm wondering if there's a similar way to cut down on time and avoid looping through each element.
Thanks!
Zach
  4 Comments
Star Strider
Star Strider on 5 Apr 2015
I still have no idea what you’re doing, but if you’re doing clustering, you might want to look through the Statistics Toolbox for cluster algorithms such as clusterdata. There are several others, and several related functions, such as those listed at the end of that webpage.
Zach
Zach on 6 Apr 2015
Simply put, I am not using the ttest or the anova1 to test for significance -- I do so later on in my analysis and correct for multiple comparisons then. That issue aside, I still have several billion ANOVAs to run, and I want to cut out as much time as possible. Something akin to the DIM argument of ttest seems to be a good candidate for this, but I can't find anything like it for anova1, so I want to know if there's somewhere else I should be looking or another approach I could try.

Sign in to comment.

Accepted Answer

Zach
Zach on 14 Apr 2015
I tooled around a bit with the anova1 function and reacquainted myself with the math. The below function appears to return the same values as looping through anova1, but a fair bit faster (I didn't time it), as I originally wanted. It could probably be improved substantially by vectorizing, etc., but it gets the job done. As Star Strider pointed out, it should never be used for significance testing unless coupled with a correction for multiple comparisons.
GDIM is the dimension along which groups are defined (columns in anova1) and MDIM is the dimension along which within groups measures are defined (rows in anova1). All F-statistics are calculated independently.
function [F,DFR,DFE]=NDfstat(X,GDIM,MDIM)
if exist('GDIM','var') && exist('MDIM','var')
D=size(X);
X=permute(X,[GDIM MDIM setdiff(1:numel(D),[GDIM MDIM])]);
end
D=size(X);
DFE = D(1)-1;
DFR = D(1)*(D(2)-1);
% X = X-repmat(mean(X,2),[1 D(2) ones(1,numel(D)-2)]);
% anova1 does this, supposedly to improve accuracy, but it
% seems to destroy calculation. Not sure what the deal is.
xm = squeeze(mean(X,2));
gm = squeeze(mean(xm));
SSE = squeeze(D(2)*sum((xm-repmat(shiftdim(gm,-1),[D(1) ones(1,numel(D)-1)])).^2));
SST = squeeze(sum(sum((X-repmat(shiftdim(gm,-2),[D(1:2) ones(1,numel(D)-1)])).^2)));
SSR = SST-SSE;
F = (SSE/DFE)./(SSR/DFR);
end

More Answers (0)

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!