How to speed up this for loop containing kronecker product?

I want to speed up the following for loop.
% x is n by N. S is k by N and nonnegative.
% Use random matrices for testing.
% Elapsed time of the following code is around 19 seconds.
% So, when N is large, it's very slow.
n= 800; N =100; k =10;
x = rand(n,N); S = rand(k,N); H = 0;
for i = 1: size(x,2)
X = x(:,i)*x(:,i)' ;
DW = diag( S(:,i) ) - S(:,i)*S(:,i)' ;
H = H + kron(X,DW);
end
My attempt:
kron(X, DW) = kron(x(:,i)*x(:,i)' ,diag(S(:,i))) - kron(x(:,i)*x(:,i)', S(:,i)*S(:,i)');
We can use and to rewrite the above equation.
kron(x(:,i)*x(:,i)',diag(S(:,i))) = kron(x(:,i), sqrt( diag(S(:,i))))* kron(x(:,i)*x(:,i)',diag(S(:,i)))' ; (since S is nonnegative then we can take sqrt )
kron(x(:,i)*x(:,i)', S(:,i)*S(:,i)') = kron( x(:,i), S(:,i))*kron( x(:,i), S(:,i))'.
Therefore, we only need to compute kron(x(:,i), sqrt( diag(S(:,i)))) and kron(x(:,i), S(:,i)).
Here are the codes:
n = 800; N = 100; k = 10;
x = rand(n,N); S = rand(k,N);
H1= 0; K_D= zeros(n*k, k*1, N); K_S = zeros(n*k,N);
%K_D records kron( x(:,i), sqrt (diag(S(:,i)) ) ) ,
% K_S records kron(x(:,i), S(:,i));
for i = 1:N
D_half = sqrt( diag(S(:,i)));
K_D(:,:,i) = kron( x(:,i),D_half);
K_S(:,i) = reshape (S(:,i)*x(:,i)',[],1);
end
K_D = reshape(K_D,n*k,[]);
H = K_D*K_D' - K_S*K_S';
The new codes save much time. Elapsed time is around 1 second. But I still want to speed up it.
Can someone help me speed up the above code (my attempt)? Or do someone have a new idea/method to speed up my original problem?
Thank you very much!

Answers (2)

I get a measureable speedup with reducing the number of sqrt() calls:
% Replace
D_half = sqrt( diag(S(:,i)));
% by
D_half = diag(sqrt(S(:, i)));

5 Comments

I played with your efficient version without getting remarkably more speed.
Are you using a Matlab version before 2013b? Not likely, but if so: https://www.mathworks.com/matlabcentral/fileexchange/24499-kronecker
The main part of the computing time happens in this line now:
H = K_D * K_D' - K_S * K_S';
This allocates memory for 2 or (I'm afraid) 3 arrays. Maybe an efficient C-mex function can beat this. But the actual matrix mutliplication is performed efficiently in the BLAS/MKL-library already.
I am using 2020b. You said you did not get remarkably more speed. That's strange. I am going to try on another computer with different MATLAB version.
I assume, this is a missunderstanding. I did not get significant improvements compared with your last version including the SQRT reordering:
n = 800;
N = 100;
k = 10;
x = rand(n,N);
S = rand(k,N);
K_D = zeros(n*k, k*1, N);
K_S = zeros(n*k, N);
for i = 1:N
D_half = diag(sqrt(S(:,i)));
K_D(:, :, i) = kron(x(:,i), D_half);
K_S(:, i) = reshape(S(:, i) * x(:, i).', [], 1);
end
K_D = reshape(K_D, n*k, []);
H = K_D * K_D' - K_S * K_S'; % <- Bottleneck
Replacing kron() by an inlined version or a BSXFUN appraoch does not change the speed sufficiently. It should be possible to squeeze some time here, because D_half is almost empty. But the bottleneck is in the last line.
Yes. I use this line
H = K_D * K_D' - K_S * K_S';
to replace the following sum. The sum is much slower.
for i = 1:N
DW = diag(S(:,i)) - S(:,i)*S(:,i)';
H = H + kron(x(:,i)*x(:,i)',DW);
end

Sign in to comment.

Yesterday I profile your code and the majority of time is eaten by
H = K_D*K_D' - K_S*K_S';
not by Kronecker.
The question is what is your intention of using H? If it can reduce to (matrix-vector) product (including some EIG,SVD algorithms), then you should leave the low rank decomposition (K_D/K_S), rather than storing explicitly H.
Just like using BFGS formula, no one stores the matrix B, they rather store a sequence of vectors (y,s).
Similarly, most of the time Kroneker matrix can be avoid to group the state vectors, rather using linear operators.

5 Comments

Yes. In the efficient version the majority of time is eaten by that line. But in my original codes, calculating kron is very time-consuming. So, I am trying to speed it up. The "efficient version" is one of my attempts, which is the best I can have.
H represents the Hessian. I would like to implement the exact Newton's method. Also, I want to implement a "modified" Newton's method, i.e. using approximation to the Hessian. Since I need to create a new approximation to the Hessian, I need to calculate the (approximate) Hessian.
For this, "Similarly, most of the time Kroneker matrix can be avoid to group the state vectors, rather using linear operators.", can you give me some examples? That may give me some idea. Thank you very much!
You H is sum of
kron(X,DW)
When you request to get H*v for a given v, you simply make a sum of
kron(X,DW)*v
meaniing you compute
V = reshape(v,size(DW,2),size(X,2));
Y = DW*V*X.'
the sum Y(:) to get H*v.
Instead of compute H, just do the above calulation when H*y is request, It migh be faster.
Thank you for your suggestion. Actually, I need to compute the inverse of H, or get H^-1*v for a given v. So, in this case, can we get rid of calculating H?
Your H has rank < size(H) so there exists no inverse.
For newton method you perhaps need to compute
-pinv(H)*g
meaning solving
H*y = g
projected on the image of H. You do not need to compute H, you can use the projection or in some iterative methode you only required to evaluate H*v.
Just wonder if you are aware of quasi-Newton method, BFGS formula and limited memory variant of such formula.
Those are intensively studied and well known, why reinvent the wheel?
Thank you. The Hessian is nonsingular because H is not the "final Hessian" and the approximation is SPD.
Because I want to try something new.

Sign in to comment.

Categories

Find more on Function Creation in Help Center and File Exchange

Asked:

on 8 Apr 2021

Commented:

on 9 Apr 2021

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!