Avoiding for loop with anova1

Question

0 votes

I have 2 large 3D matrices, and I've noticed that passing a DIM argument for the ttest2 function decreases the computational time on my computer substantially as opposed to looping through each index in the first 2 dimensions (near 60-fold for the data I'm working with):

[~,~,~,stats]=ttest2(A,B,[],[],[],3);

vs.

 stats=cell(rmax,cmax);
 for r=1:rmax
    for c=1:cmax
        [~,~,~,stats{r,c}]=ttest2(A(r,c,:),B(r,c,:));
    end
 end

Now I would like to do the same thing, but adding another matrix C and using anova1 instead of ttest2. The documentation for anova1 suggests that it won't accept a similar DIM argument, but I'm wondering if there's a similar way to cut down on time and avoid looping through each element.

Thanks!

Zach

4 Comments
Show 2 older comments Hide 2 older comments

Star Strider on 5 Apr 2015

I still have no idea what you’re doing, but if you’re doing clustering, you might want to look through the Statistics Toolbox for cluster algorithms such as clusterdata. There are several others, and several related functions, such as those listed at the end of that webpage.

Zach on 6 Apr 2015

Simply put, I am not using the ttest or the anova1 to test for significance -- I do so later on in my analysis and correct for multiple comparisons then. That issue aside, I still have several billion ANOVAs to run, and I want to cut out as much time as possible. Something akin to the DIM argument of ttest seems to be a good candidate for this, but I can't find anything like it for anova1, so I want to know if there's somewhere else I should be looking or another approach I could try.

Sign in to comment.

Sign in to answer this question.

Follow Question

Answer 1

Zach on 14 Apr 2015

Open in MATLAB Online

0 votes

I tooled around a bit with the anova1 function and reacquainted myself with the math. The below function appears to return the same values as looping through anova1, but a fair bit faster (I didn't time it), as I originally wanted. It could probably be improved substantially by vectorizing, etc., but it gets the job done. As Star Strider pointed out, it should never be used for significance testing unless coupled with a correction for multiple comparisons.

GDIM is the dimension along which groups are defined (columns in anova1) and MDIM is the dimension along which within groups measures are defined (rows in anova1). All F-statistics are calculated independently.

function [F,DFR,DFE]=NDfstat(X,GDIM,MDIM)
if exist('GDIM','var') && exist('MDIM','var')
   D=size(X);
   X=permute(X,[GDIM MDIM setdiff(1:numel(D),[GDIM MDIM])]);
end
D=size(X);
DFE = D(1)-1;
DFR = D(1)*(D(2)-1);
% X = X-repmat(mean(X,2),[1 D(2) ones(1,numel(D)-2)]);
   % anova1 does this, supposedly to improve accuracy, but it
   % seems to destroy calculation. Not sure what the deal is.
xm = squeeze(mean(X,2));
gm = squeeze(mean(xm));
SSE = squeeze(D(2)*sum((xm-repmat(shiftdim(gm,-1),[D(1) ones(1,numel(D)-1)])).^2));
SST = squeeze(sum(sum((X-repmat(shiftdim(gm,-2),[D(1:2) ones(1,numel(D)-1)])).^2)));
SSR = SST-SSE;
F = (SSE/DFE)./(SSR/DFR);
end

0 Comments
Show -2 older comments Hide -2 older comments

Sign in to comment.

Avoiding for loop with anova1

4 Comments
Show 2 older comments Hide 2 older comments

Accepted Answer

0 Comments
Show -2 older comments Hide -2 older comments

More Answers (0)

Categories

Tags

Community Treasure Hunt

Avoiding for loop with anova1

4 Comments Show 2 older comments Hide 2 older comments

Accepted Answer

0 Comments Show -2 older comments Hide -2 older comments

More Answers (0)

Categories

Tags

See Also

Community Treasure Hunt

4 Comments
Show 2 older comments Hide 2 older comments

0 Comments
Show -2 older comments Hide -2 older comments