Multiplying a square lower triangular matrix times a n by m matrix where m >> n OPTIMIZATION

3 views (last 30 days)
I thought the second chunk of code %Slow would be the fastest out of the three but the third is much much much faster than the first two. Does anyone have any suggestions on how to multiply a square lower triangular matrix times a n by m matrix where m >> n? Below is the time results I obtain from the console...
Elapsed time is 21.379955 seconds.
cpu_average_1 =
0.3269
Elapsed time is 18.679921 seconds.
cpu_average_2 =
0.3183
Elapsed time is 4.811853 seconds.
cpu_average_3 =
0.0482
To reproduce these results just copy and paste into a script file and run.
Any resources or suggestions would be much appreciated at this point. Thank you.
H=2;
Y = ones(1,7776).*3;
A=magic(7);
% Slowest
tic
cpu_elapsed = zeros(1,100);
for i=1:100
cpu_start = cputime;
for j=1:100
K = zeros(7,7776);
Z = zeros(7,7776);
for s=1:7
Z(s,:) = H*A(s,:)*K + Y;
K(s,:) = Z(s,:); % Dummy variable...do not delete
end
end
cpu_elapsed(i) = cputime-cpu_start;
end
toc
cpu_average_1 = sum(cpu_elapsed)/100
figure;
plot(1:100,cpu_elapsed);
% Slow
tic
cpu_elapsed = zeros(1,100);
for i=1:100
cpu_start = cputime;
for j=1:100
K = zeros(7,7776);
Z = zeros(7,7776);
for s=1:7
Z(s,:) = H*A(s,1:s)*K(1:s,:) + Y;
K(s,:) = Z(s,:); % Dummy variable...do not delete
end
end
cpu_elapsed(i) = cputime-cpu_start;
end
toc
cpu_average_2 = sum(cpu_elapsed)/100
figure;
plot(1:100,cpu_elapsed);
% Fastest
tic
Y = Y';
cpu_elapsed = zeros(1,100);
for i=1:100
cpu_start = cputime;
for inner=1:100
K = zeros(7776,7);
Z = zeros(7776,7);
for s = 1:7
if ( s > 1 )
for j = 1:s-1
Z(:,s) = Z(:,s) + H*A(s, j)*K(:,j);
end
end
Z(:,s) = Z(:,s) + Y;
K(:,s) = Z(:,s); % Dummy variable,...do not delete
end
end
cpu_elapsed(i) = cputime-cpu_start;
end
toc
cpu_average_3 = sum(cpu_elapsed)/100
figure;
plot(1:100,cpu_elapsed);

Answers (1)

Matt J
Matt J on 17 Oct 2013
Edited: Matt J on 17 Oct 2013
My suggestion would be to do the multiplication explicitly, taking advantage of MATLAB's built-in mult-threading. The time I get
A=tril(magic(7));
K = zeros(7,7776);
tic;Z=A*K;toc
Elapsed time is 0.004754 seconds.
is much much faster than the 3 other schemes.
  3 Comments
Tony
Tony on 17 Oct 2013
Matt,
I agree with your suggestion given the above example, but I think I over simplified my question. Instead of K(:,s) = Z(:,s), K(:,s) is actually an output of a function call that depends on Z. The input to the function needs to be the dot product for s number of rows of H*A(s,:)*K + Y. For example,
2*[ 1 0 0; 4 5 0; 7 8 9 ]*[1 2; 3 4; 5 6] + [ 1 2 ] = [ 3 6 ] when the for loop variable (s) equals 1. So [ 3 6 ] should then be the input to the function_call. Please feel free to ask more questions to clarify my question.
H=2;
Y = ones(1,7776).*3;
A=magic(7);
% Slowest
tic
cpu_elapsed = zeros(1,100);
for i=1:100
cpu_start = cputime;
for j=1:100
K = zeros(7,7776);
Z = zeros(7,7776);
for s=1:7
Z(s,:) = H*A(s,:)*K + Y;
K(s,:) = Function_Call(Z(s,:)); % Dummy variable...do not delete
end
end
cpu_elapsed(i) = cputime-cpu_start;
end
toc
cpu_average_1 = sum(cpu_elapsed)/100
figure;
plot(1:100,cpu_elapsed);
% Slow
tic
cpu_elapsed = zeros(1,100);
for i=1:100
cpu_start = cputime;
for j=1:100
K = zeros(7,7776);
Z = zeros(7,7776);
for s=1:7
Z(s,:) = H*A(s,1:s)*K(1:s,:) + Y;
K(s,:) = Function_Call(Z(s,:)); % Dummy variable...do not delete
end
end
cpu_elapsed(i) = cputime-cpu_start;
end
toc
cpu_average_2 = sum(cpu_elapsed)/100
figure;
plot(1:100,cpu_elapsed);
% Fastest
tic
Y = Y';
cpu_elapsed = zeros(1,100);
for i=1:100
cpu_start = cputime;
for inner=1:100
K = zeros(7776,7);
Z = zeros(7776,7);
for s = 1:7
if ( s > 1 )
for j = 1:s-1
Z(:,s) = Z(:,s) + H*A(s, j)*K(:,j);
end
end
Z(:,s) = Z(:,s) + Y;
K(:,s) = Function_Call(Z(:,s)); % Dummy variable,...do not delete
end
end
cpu_elapsed(i) = cputime-cpu_start;
end
toc
cpu_average_3 = sum(cpu_elapsed)/100
figure;
plot(1:100,cpu_elapsed);
Matt J
Matt J on 18 Oct 2013
Edited: Matt J on 18 Oct 2013
I see. Well, the main reason version 3 of your tests runs so much faster is because there, you are extracting data column-wise instead of row-wise, e.g., K(:,s) instead of K(s,:), which is always faster due to memory contiguity. You can modify the other version in a similar way and reach the same order of magnitude of speed, e.g.,
A=H*A;
Yt=Y.';
for j=1:100
K = zeros(7776,7);
Z = zeros(7776,7);
for s=1:7
Z(:,s) = Yt+K*A(:,s);
K(:,s) = Z(:,s);
end
end

Sign in to comment.

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!