Efficient and compact code for summing each diagonal (or antidiagonal) of a matrix without using a for-loop.
Works well for large matrices.
For a 3D matrix input A, the sums of the diagonals of A(:,:,k) are returned in
sumMat(:,k). The script is typically faster than a for-loop based approach when A is 3D.
The code is most efficient for wide or tall matrices. Inline the code when it is used as part of an iterative algorithm, to avoid recomputing the constant indexing matrices.
Note that a for-loop implementation using the diag() function can be faster, and has a lower memory requirement, especially in the square 2D case.
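A minimal usage sketch (the exact call signature and output ordering of sumDiags are assumptions based on the description above, checked here against a plain diag() reference):

```matlab
% Hypothetical usage; signature and ordering convention are assumptions.
A = magic(4);                            % 4x4 test matrix
s = sumDiags(A);                         % sums of all 2*4-1 = 7 diagonals

% Reference computed with diag(); flip if the function uses the
% opposite ordering convention:
d = 1-size(A,1):size(A,2)-1;             % diagonal offsets, lower-left first
ref = arrayfun(@(k) sum(diag(A,k)), d)';
```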
1.4  Added support for 3D matrices as input. 

1.1  Added comment on the speed of the implementation compared to using a for-loop approach. 
Ton D (view profile)
For 2D matrices, if you want the antidiagonal sums, the following code is much faster (and not hard to adapt for ND matrices or diagonal sums):
m = size(A,1);
n = size(A,2);
tmp = [A; zeros(n)];   % pad with n rows of zeros
tmp = tmp(:);
res = sum(reshape(tmp(1:end-n), m+n-1, n), 2);
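To see why the zero-padding aligns the antidiagonals, here is a tiny worked example (my own illustration, not part of the original comment):

```matlab
% Worked 2x2 example of the zero-padding trick (illustration only).
A = [1 2; 3 4];               % m = 2, n = 2
tmp = [A; zeros(2)];          % 4x2: each column gets n rows of padding
tmp = tmp(:);                 % column-major: 1 3 0 0 2 4 0 0
% Dropping the last n entries and reshaping to (m+n-1)-by-n shifts
% column j down by j-1 rows, so each row holds one antidiagonal:
reshape(tmp(1:end-2), 3, 2)   % [1 0; 3 2; 0 4]
res = sum(reshape(tmp(1:end-2), 3, 2), 2)   % [1; 5; 4]
```

The row sums 1, 2+3 = 5, and 4 are exactly the antidiagonal sums of A.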
Marcus Björk (view profile)
That's true Sven. I added 3D matrix support now, if anyone has an application for that.
In my case the code was used in an iterative algorithm, in which case the matrix is not known beforehand and cannot be stored in a 3D matrix. I figured the code was nice enough to make a function out of it and upload it here.
Sven (view profile)
Ha, but if you used the sumDiags() function itself multiple times you wouldn't actually get your speedup :)
I'd say if that's the goal then perhaps you can rewrite sumDiags to take in a 3D matrix and give the output as the sum of diags of each sheet.
In the little tests I ran I found that the sum(diag()) loop was about twice as fast for square matrices and about equal for really wide (10x10000) matrices.
Oh, and yep you're right that rot90() is the way to go for antidiagonals :)
Marcus Björk (view profile)
Forgot to mention: if you need to compute the sums of diagonals for several matrices of the same size (in an outer loop), which was the application at hand, this for-loop-free implementation is also faster (well, the inlined version is). See below:
Test code:
%Input
K = 1000;
A = randn(1000,100);
%Common
[N,M] = size(A);
%% Method 1: for-loop with diag()
tic;
d = 1-N:M-1;
out = zeros(M+N-1,1,class(A));
for k = 1:K
    for p = 1:length(d)
        out(p) = sum(diag(A,d(p)));
    end
    A = randn(N,M);
end
toc
%% Method 2: inlined loop-free version
tic;
Amod = zeros(N+M-1,M);
logVec = [false(M-1,1); true(N,1); false(M-1,1)];
indMat = bsxfun(@plus, (1:M+N-1)', 0:M-1);
logMat = logVec(indMat);
for k = 1:K
    Amod(logMat) = A;
    sumVec = sum(Amod,2);
    A = randn(N,M);
end
toc
>>
Elapsed time is 2.632392 seconds.
Elapsed time is 1.307457 seconds.
Marcus Björk (view profile)
Thanks for your comment Sven!
I already had the for-loop implementation, which is trivial, but was posed with the problem of solving it without a for-loop. Hence, I wrote this code. (Which could perhaps be optimized further?)
I didn't actually check which was faster, and I should probably make a note of this in the description.
Furthermore, you don't get the antidiagonals by transposing. But rot90(A) works and is almost as fast as using the extra input.
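For the record, a minimal sketch of the rot90 approach (my own illustration; a diag() loop stands in for any diagonal-summing routine):

```matlab
% rot90 turns antidiagonals into diagonals (illustration only).
A = [1 2; 3 4];
B = rot90(A);                               % [2 4; 1 3]
d = 1-size(B,1):size(B,2)-1;
antiSums = arrayfun(@(k) sum(diag(B,k)), d) % [1 5 4]
% These are the antidiagonal sums of A, top-left corner first:
% 1, then 2+3, then 4. A plain transpose A' would instead give the
% ordinary diagonal sums in reversed order, not the antidiagonals.
```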
Sven (view profile)
Hi Marcus,
Nice entry, but sometimes a for-loop can be useful. The function below is simpler, uses less memory (no extra matrices or masks needed), and is actually faster.
function out = sumDiag(A)
d = 1-size(A,1):size(A,2)-1;
out = zeros(size(d),class(A));
for k = 1:length(d)
    out(k) = sum(diag(A,d(k)));
end
I chose the opposite convention, whereby the first element of the output is the lower left rather than upper right. I think this fits in with MATLAB's convention of numbering diagonals as lower to the left and higher to the right, but you can change it easily if you'd like on the first line.
Also, I think that the antidiagonals case should really just be up to the user who can call sumDiag(A) or sumDiag(A'), as that is simpler for the same result than (possibly) providing an extra argument.