Multiplying a square lower triangular matrix times a n by m matrix where m >> n OPTIMIZATION
3 views (last 30 days)
Show older comments
I thought the second chunk of code %Slow would be the fastest out of the three but the third is much much much faster than the first two. Does anyone have any suggestions on how to multiply a square lower triangular matrix times a n by m matrix where m >> n? Below is the time results I obtain from the console...
Elapsed time is 21.379955 seconds.
cpu_average_1 =
0.3269
Elapsed time is 18.679921 seconds.
cpu_average_2 =
0.3183
Elapsed time is 4.811853 seconds.
cpu_average_3 =
0.0482
To reproduce these results just copy and paste into a script file and run.
Any resources or suggestions would be much appreciated at this point. Thank you.
H=2;
Y = ones(1,7776).*3;
A=magic(7);
% Slowest
tic
cpu_elapsed = zeros(1,100);
for i=1:100
cpu_start = cputime;
for j=1:100
K = zeros(7,7776);
Z = zeros(7,7776);
for s=1:7
Z(s,:) = H*A(s,:)*K + Y;
K(s,:) = Z(s,:); % Dummy variable...do not delete
end
end
cpu_elapsed(i) = cputime-cpu_start;
end
toc
cpu_average_1 = sum(cpu_elapsed)/100
figure;
plot(1:100,cpu_elapsed);
% Slow
tic
cpu_elapsed = zeros(1,100);
for i=1:100
cpu_start = cputime;
for j=1:100
K = zeros(7,7776);
Z = zeros(7,7776);
for s=1:7
Z(s,:) = H*A(s,1:s)*K(1:s,:) + Y;
K(s,:) = Z(s,:); % Dummy variable...do not delete
end
end
cpu_elapsed(i) = cputime-cpu_start;
end
toc
cpu_average_2 = sum(cpu_elapsed)/100
figure;
plot(1:100,cpu_elapsed);
% Fastest
tic
Y = Y';
cpu_elapsed = zeros(1,100);
for i=1:100
cpu_start = cputime;
for inner=1:100
K = zeros(7776,7);
Z = zeros(7776,7);
for s = 1:7
if ( s > 1 )
for j = 1:s-1
Z(:,s) = Z(:,s) + H*A(s, j)*K(:,j);
end
end
Z(:,s) = Z(:,s) + Y;
K(:,s) = Z(:,s); % Dummy variable,...do not delete
end
end
cpu_elapsed(i) = cputime-cpu_start;
end
toc
cpu_average_3 = sum(cpu_elapsed)/100
figure;
plot(1:100,cpu_elapsed);
0 Comments
Answers (1)
Matt J
on 17 Oct 2013
Edited: Matt J
on 17 Oct 2013
My suggestion would be to do the multiplication explicitly, taking advantage of MATLAB's built-in mult-threading. The time I get
A=tril(magic(7));
K = zeros(7,7776);
tic;Z=A*K;toc
Elapsed time is 0.004754 seconds.
is much much faster than the 3 other schemes.
3 Comments
Matt J
on 18 Oct 2013
Edited: Matt J
on 18 Oct 2013
I see. Well, the main reason version 3 of your tests runs so much faster is because there, you are extracting data column-wise instead of row-wise, e.g., K(:,s) instead of K(s,:), which is always faster due to memory contiguity. You can modify the other version in a similar way and reach the same order of magnitude of speed, e.g.,
A=H*A;
Yt=Y.';
for j=1:100
K = zeros(7776,7);
Z = zeros(7776,7);
for s=1:7
Z(:,s) = Yt+K*A(:,s);
K(:,s) = Z(:,s);
end
end
See Also
Categories
Find more on Matrix Indexing in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!