Discover MakerZone

MATLAB and Simulink resources for Arduino, LEGO, and Raspberry Pi

Learn more

Discover what MATLAB® can do for your career.

Opportunities for recent engineering grads.

Apply Today

Thread Subject:
Efficiently computing large numbers of vector norms

Subject: Efficiently computing large numbers of vector norms

From: Harry Commin

Date: 7 Nov, 2012 21:58:11

Message: 1 of 17

I am frequently faced with problems where I need to find the norm (squared) of row vectors having the following structure:

A(:,i)'*X

where A is (N x Q) complex, and solutions must be found for all i = 1,2,...,Q. Doing this efficiently turns out to be extraordinarily important in my code. In this simple case, I find the following to be pretty swift (compared to a for loop):

sum(abs(A'*X).^2,2)

However, I become stumped with a slightly more complicated case:

kron(A1(:,i),A2(:,j))'*X2

where now a fast solution is needed for all i = 1,2,...,Q1 and j = 1,2,...,Q2. Here, I resort to:

Z = zeros(Q1,Q2);
for j = 1:Q2
    Z(:,j) = sum(abs(kron(A1,A2(:,j))'*X2).^2,2);
end

Is there a neat way to speed this up? Is there a way to further speed up my sum(abs(A'*X).^2,2)? Efficiency really is critical here!

Subject: Efficiently computing large numbers of vector norms

From: Harry Commin

Date: 7 Nov, 2012 22:03:14

Message: 2 of 17

I should add that X is always square complex.

Subject: Efficiently computing large numbers of vector norms

From: Teja Muppirala

Date: 8 Nov, 2012 06:11:13

Message: 3 of 17

I believe that instead of taking absolute values of complex numbers and squaring, it is faster to multiply by the conjugate.

Instead of this:

M = something;
Z = sum(abs(M).^2),2)

Do this:

M = something;
Z = sum(M.*conj(M),2)

Subject: Efficiently computing large numbers of vector norms

From: Teja Muppirala

Date: 8 Nov, 2012 06:17:24

Message: 4 of 17

In fact I think this might even be better:

M = something;
Z = sum(real(M).^2 + imag(M).^2,2);

Subject: Efficiently computing large numbers of vector norms

From: Bruno Luong

Date: 8 Nov, 2012 12:55:09

Message: 5 of 17

"Harry Commin" wrote in message <k7elhj$as1$1@newscl01ah.mathworks.com>...

> for j = 1:Q2
> Z(:,j) = sum(abs(kron(A1,A2(:,j))'*X2).^2,2);
> end
>

This intermediate result
kron(A1,A2(:,j))'*X2

can be better computed using kronecker's property. See when Matt pickup this thread.

Bruno

Subject: Efficiently computing large numbers of vector norms

From: Matt J

Date: 9 Nov, 2012 01:05:20

Message: 6 of 17

"Harry Commin" wrote in message <k7elhj$as1$1@newscl01ah.mathworks.com>...
> I am frequently faced with problems where I need to find the norm (squared) of row vectors having the following structure:
>
> A(:,i)'*X
>
> where A is (N x Q) complex, and solutions must be found for all i = 1,2,...,Q. Doing this efficiently turns out to be extraordinarily important in my code. In this simple case, I find the following to be pretty swift (compared to a for loop):
>
> sum(abs(A'*X).^2,2)
>
> However, I become stumped with a slightly more complicated case:
>
> kron(A1(:,i),A2(:,j))'*X2

So first, you could do this simultaneously for all i and j without looping by reorganizing as follows


result = sum( abs( kron(A1',A2')*X2 ).^2 , 2) ;

As Bruno says, though, efficient algorithms are available for matrix multiplications with a Kronecker product and my KronProd class implements them,

http://www.mathworks.com/matlabcentral/fileexchange/25969-efficient-object-oriented-kronecker-product-manipulation

In your case, this would lead to

result = sum( abs( KronProd(A2',A1')*X2 ).^2 , 2) ;

Note also that if X also happens to be a Kronecker product,

  X2= kron(XX2,XX1);

then the whole thing gets even simpler/faster

  rownorms = @(M) sum(abs(M).^2);

  result=kron(rownorms(A2'*XX2) , rownorms(A1'*XX1));









>
> where now a fast solution is needed for all i = 1,2,...,Q1 and j = 1,2,...,Q2. Here, I resort to:
>
> Z = zeros(Q1,Q2);
> for j = 1:Q2
> Z(:,j) = sum(abs(kron(A1,A2(:,j))'*X2).^2,2);
> end
>
> Is there a neat way to speed this up? Is there a way to further speed up my sum(abs(A'*X).^2,2)? Efficiency really is critical here!

Subject: Efficiently computing large numbers of vector norms

From: Matt J

Date: 9 Nov, 2012 14:49:18

Message: 7 of 17

"Matt J" wrote in message <k7hksg$etp$1@newscl01ah.mathworks.com>...
>
> In your case, this would lead to
>
> result = sum( abs( KronProd(A2',A1')*X2 ).^2 , 2) ;

Make that

 result = sum( abs( KronProd({A2',A1'})*X2 ).^2 , 2) ;



> then the whole thing gets even simpler/faster
>
> rownorms = @(M) sum(abs(M).^2);
>
> result=kron(rownorms(A2'*XX2) , rownorms(A1'*XX1));


And make that

 rownorms = @(M) sum(abs(M).^2 , 2);

Subject: Efficiently computing large numbers of vector norms

From: Harry Commin

Date: 12 Nov, 2012 19:48:21

Message: 8 of 17

Thanks very much for these suggestions. Perhaps I am doing something wrong -- while I am getting the correct results numerically, I am not getting the speed boost I had hoped for.

> result = sum( abs( kron(A1',A2')*X2 ).^2 , 2) ;

For some simple test data, I find this to perform a little faster (including the necessary reshape operation). Approximately 0.032secs compared to 0.037secs.

> result = sum( abs( KronProd({A2',A1'})*X2 ).^2 , 2) ;

In this case, my performance is much slower on the same test data (approx. 0.13secs).

My requirement is to execute this calculation many times on data similar in dimension to my test data (rather than once on enormous matrices). Perhaps this is the issue? (I'm running R2010b, but have a licence for R2012b which I can install if necessary).

Subject: Efficiently computing large numbers of vector norms

From: Harry Commin

Date: 12 Nov, 2012 21:32:05

Message: 9 of 17

Sorry, please disregard my last post. I repeated for 1000 iterations instead of 1 and I now get:

My original loop: 29.3665 secs
Loop-less kron(): 25.209 secs
Loop-less KronProd(): 20.8823 secs

This looks like a saving of around a third, which could measure in days for me. I guess things settle down after the first iteration(?)

Subject: Efficiently computing large numbers of vector norms

From: Matt J

Date: 13 Nov, 2012 16:28:13

Message: 10 of 17

"Harry Commin" wrote in message <k7rpsl$ohp$1@newscl01ah.mathworks.com>...
> Sorry, please disregard my last post. I repeated for 1000 iterations instead of 1 and I now get:
>
> My original loop: 29.3665 secs
> Loop-less kron(): 25.209 secs
> Loop-less KronProd(): 20.8823 secs
>
> This looks like a saving of around a third, which could measure in days for me. I guess things settle down after the first iteration(?)
===========

Yes, things settle down after the first iteration. I would have expected a lot more speed-up, though. It would be interesting to know the data dimensions you're working with.

Note also that if A1 and A2 are fixed and X2 is changing in a loop, you should compute K=KronProd({A2',A1'}) only once prior to your loop and reuse it. The same goes for kron(A1.',A2.'), although I still expect KronProd to be faster.

Subject: Efficiently computing large numbers of vector norms

From: Harry Commin

Date: 13 Nov, 2012 18:22:15

Message: 11 of 17

> It would be interesting to know the data dimensions you're working with.
 
For my tests, I had:

N1 = 3;
N2 = 4;
Q1 = 180;
Q2 = 63;

If I increase N1 and/or N2, the improvement (due to KronProd) decreases.

For more complicated cases, I can't see how to use KronProd at all. In particular, when there is element-by-element multiplication, e.g.:

kron(kron(A1(:,i),A2(:,i).*A3(:,j)),A4(:,j))'*X3

That is, A1 and A2 'belong' to the same loop and, similarly, A3 and A4 belong together to a different loop. However, A2(:,i).*A3(:,j) mixes the two and I can't see how to deal with it!

Subject: Efficiently computing large numbers of vector norms

From: Matt J

Date: 13 Nov, 2012 18:45:14

Message: 12 of 17

"Harry Commin" wrote in message <k7u34n$7fs$1@newscl01ah.mathworks.com>...
> > It would be interesting to know the data dimensions you're working with.
>
> For my tests, I had:
>
> N1 = 3;
> N2 = 4;
> Q1 = 180;
> Q2 = 63;
>
> If I increase N1 and/or N2, the improvement (due to KronProd) decreases.
==============


I would definitely expect increasing improvement with increasing N1 and N2 and my tests below confirm this.

N1 = 100;
N2 = 100;
Q1 = 180;
Q2 = 63;


At1=rand(Q1,N1)+i*rand(Q1,N1);
At2=rand(Q2,N2)+i*rand(Q2,N2);

K=kron(At1,At2);
Kop=KronProd({At2,At1});

nx=size(K,2);
X2=rand(nx,nx);

tic;
 result = sum( abs( K*X2 ).^2 , 2) ;
toc;
%Elapsed time is 124.422905 seconds.

tic;
 result = sum( abs( Kop*X2 ).^2 , 2) ;
toc;
%Elapsed time is 10.947491 seconds.

Subject: Efficiently computing large numbers of vector norms

From: Harry Commin

Date: 13 Nov, 2012 19:23:10

Message: 13 of 17

I cannot repeat your test, as I get an "Out of Memory" error. Furthermore, I require that A1 and A2 change on each iteration. Therefore, the KronProd() operation must also go inside the tic/toc.

Perhaps you could copy and paste my code (for which I get comparable performance using a simple loop):




Ntrials = 100;

% Data Dimensions
N1 = 9; N2 = 7;
Q1 = 180; Q2 = 63;

% Data
A1 = randn(N1,Q1) + 1j*randn(N1,Q1);
A2 = randn(N2,Q2) + 1j*randn(N2,Q2);
X2 = randn(N1*N2,N1*N2) + 1j*randn(N1*N2,N1*N2);

tic
Z2b = zeros(Q1,Q2);
for trial = 1:Ntrials
    for j = 1:Q2
        Z2b(:,j) = sum(abs(kron(A1,A2(:,j))'*X2).^2,2);
    end
end
t = toc;
disp(['Single Loop = ',num2str(t), ' secs'])

tic
for trial = 1:Ntrials
    Z2c = reshape(sum(abs(kron(A1,A2)'*X2).^2,2),Q2,Q1).';
end
t = toc;
disp(['No Loop = ',num2str(t), ' secs'])

tic
for trial = 1:Ntrials
    Z2d = reshape(sum(abs(KronProd({A2',A1'})*X2).^2,2),Q2,Q1).';
end
t = toc;
disp(['Fast KronProd = ',num2str(t), ' secs'])

Subject: Efficiently computing large numbers of vector norms

From: Matt J

Date: 14 Nov, 2012 00:03:15

Message: 14 of 17

"Harry Commin" wrote in message <k7u6mu$ltc$1@newscl01ah.mathworks.com>...
> I cannot repeat your test, as I get an "Out of Memory" error. Furthermore, I require that A1 and A2 change on each iteration. Therefore, the KronProd() operation must also go inside the tic/toc.
>
> Perhaps you could copy and paste my code (for which I get comparable performance using a simple loop):
==============

OK. Well, if A1 and A2 have to change, then that will obviously increase overhead. However, I still find that the gap between KronProd and the other approaches increases with N1, N2. Here are the results of my tests


 Single Loop = 38.3979 secs
 No Loop = 31.6648 secs
 Fast KronProd = 7.1126 secs

but with the modified code below. Note that I removed all of the ctranspose operations from the loops, because in reality they are avoidable and contribute unnecessary overhead. Note that kron(A1,A2)' is implemented much faster as kron(A1',A2') and you could construct A1 and A2 as QixNi to avoid the need to transpose. Also, you should not need to transpose the final result, either. If you want the result to be Q2xQ1 instead of Q1xQ2, then you should apply kron(A2,A1) instead of kron(A1,A2). I've incorporated all of these ideas into the modified code.

Ntrials = 5;

% Data Dimensions
N1 = 9*5; N2 = 7*5;
Q1 = 180; Q2 = 63;

% Data
A1 = randn(N1,Q1) + 1j*randn(N1,Q1);
A2 = randn(N2,Q2) + 1j*randn(N2,Q2);
X2 = randn(N1*N2,N1*N2) + 1j*randn(N1*N2,N1*N2);

A1=A1';
A2=A2';

tic
Z2b = zeros(Q1,Q2);
for trial = 1:Ntrials
    for j = 1:Q2
        Z2b(:,j) = sum(abs(kron(A1,A2(j,:))*X2).^2,2);
    end
end
t = toc;
disp(['Single Loop = ',num2str(t), ' secs'])

tic
for trial = 1:Ntrials
    Z2c = reshape(sum(abs(kron(A2,A1)*X2).^2,2),Q2,Q1);
end
t = toc;
disp(['No Loop = ',num2str(t), ' secs'])

tic
for trial = 1:Ntrials
    Z2d = reshape(sum(abs(KronProd({A1,A2})*X2).^2,2),Q2,Q1);
end
t = toc;
disp(['Fast KronProd = ',num2str(t), ' secs'])

Subject: Efficiently computing large numbers of vector norms

From: Harry Commin

Date: 14 Nov, 2012 11:49:19

Message: 15 of 17

> Note that I removed all of the ctranspose operations from the loops, because in reality they are avoidable and contribute unnecessary overhead.

Thanks, I had no idea that ctranspose would be such an important issue. I'll look back at my problem now and will try to convert it in the ways you suggested.

Subject: Efficiently computing large numbers of vector norms

From: Harry Commin

Date: 15 Nov, 2012 12:11:16

Message: 16 of 17

I copied and pasted your code (but had to remove the *5 on the dimensions to avoid Out of Memory errors) and KronProd performs slower than the simple loop.

How can my results be so different?

Subject: Efficiently computing large numbers of vector norms

From: Matt J

Date: 15 Nov, 2012 13:32:19

Message: 17 of 17

"Harry Commin" wrote in message <k82m54$9m2$1@newscl01ah.mathworks.com>...
> I copied and pasted your code (but had to remove the *5 on the dimensions to avoid Out of Memory errors) and KronProd performs slower than the simple loop.
>
> How can my results be so different?
===============

Without the *5, it is slower for me, too. The data sizes are too small.

Tags for this Thread

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.

Contact us