Accumarray for two functions?

I'm trying to compute both the mean and SD of a set of values by group, and I can either do it with for-loop:
M = zeros(1,ngroup);
SD = zeros(1,ngroup);
for i = 1:ngroup
M(i) = mean(data(ind==i));
SD(i) = std(data(ind==i));
end
Or, alternatively use `accumarray` twice.
M = accumarray(ind,data,[],@mean);
SD = accumarray(ind,data,[],@std);
But is there a way to just use accumarray once and compute both quantities? Since accumarray is faster than for-loop, but calling it twice will be slow. Is it possible to do something like:
[M, SD] = accumarray(ind,data,[],{@mean,@std})

 Accepted Answer

Matt J
Matt J on 19 Oct 2022
Edited: Matt J on 19 Oct 2022
Since accumarray is faster than for-loop, but calling it twice will be slow.
I would argue that 3 calls to accumarray would be the most optimal.
data = rand(25e5,1);
ind=randi(100,size(data));
tic;
M0 = accumarray(ind,data,[],@mean);
SD0 = accumarray(ind,data,[],@std);
toc
Elapsed time is 0.499977 seconds.
tic;
Out = accumarray(ind, data, [], @(x){{mean(x) std(x)}});
toc;
Elapsed time is 0.235066 seconds.
tic;
N=accumarray(ind,1);
S = accumarray(ind,data);
S2=accumarray(ind,data.^2);
M=S./N;
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
toc
Elapsed time is 0.047581 seconds.

4 Comments

Wow that's interesting, many thanks!
Accumarray is optimized when doing sum.
Using function handle has great penalty.
I should mention though that there might be some sacrifice in numerical accuracy using the expansion,
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
The subtraction of S2 and 2*S.*M can give large floating point residuals.
data = 10000+rand(25e5,1);
ind=randi(100,size(data));
tic;
M0 = accumarray(ind,data,[],@mean);
SD0 = accumarray(ind,data,[],@std);
toc
Elapsed time is 0.574901 seconds.
tic;
N=accumarray(ind,1);
S = accumarray(ind,data);
S2=accumarray(ind,data.^2);
M=S./N;
SD = sqrt((S2 - 2*S.*M + N.*M.^2)./(N-1));
toc
Elapsed time is 0.048677 seconds.
errorMean=norm(M-M0)/norm(M0)
errorMean = 4.4567e-15
errorSTD=norm(SD-SD0)/norm(SD0)
errorSTD = 5.3453e-06
Small simplification
N=accumarray(ind,1);
S = accumarray(ind,data);
M = S./N;
S2=accumarray(ind,data.^2);
SD = sqrt((S2 - S.*M)./(N-1));

Sign in to comment.

More Answers (1)

The accumarray approach is certainly possible —
v = randn(250,1)
v = 250×1
0.0501 -0.2469 1.0253 1.1484 -0.9196 0.4947 -0.7221 1.2164 -0.3162 0.0461
Out = accumarray(ones(size(v)), v, [], @(x){{mean(x) std(x)}})
Out = 1×1 cell array
{1×2 cell}
meanv = Out{:}{1}
meanv = -0.0333
stdv = Out{:}{2}
stdv = 0.9419
Make appropriate changes to work with your data.
.

2 Comments

that's a nice idea! thank you!
My pleasure!

Sign in to comment.

Categories

Asked:

on 19 Oct 2022

Commented:

on 19 Oct 2022

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!