How can I do matrix multiplication on sets of data in an array without looping?
Show older comments
I'm calculating forward kinematics for a robotic maniupulator running at 500 hz. This means that I have millions of sets of joint angles that I'm doing forwad kinematics on and I'm attempting to do this mathematics as quickly as possible. I've been able to do everything quickly outside of a loop except for the final matrix multiplication. My loop takes multiple arrays that are Nx4, it's broken down into 4x4 matrices that are multiplied by eachother to get from base frame to the final end effector frame. Here's a sample of the data (I can't share as much info as I would hope), it's 3 arrays each with 3 sets of 4x4 matrices.
Tc0 = [1,0,0,0; 0,1,0,0; 0,0,1,0; 0,0,0,1; 1,0,0,0; 0,1,0,0; 0,0,1,0; 0,0,0,1; 1,0,0,0; 0,1,0,0; 0,0,1,0; 0,0,0,1];
T01 = [0.74,-0.66,0,0.0012; 0.66,0.74,0.002,0.0007; -0.0013,-0.0015,0.999,0.35; 0,0,0,1; 0.746,-0.66,0,0.0012; 0.66,0.74,0.002,0.0007; -0.0013,-0.0015,0.99,0.35; 0,0,0,1; 0.746,-0.66,0,0.0012; 0.66,0.74,0.002,0.0007; -0.0013,-0.0015,0.99,0.35; 0,0,0,1];
T12 = [0.82,0.57,0,0.003; 0.0009,-0.0012,-0.999,-0.25; -0.57,0.82,-0.0015,-0.0004; 0,0,0,1; 0.82,0.57,0,0.003; 0.0009,-0.0012,-0.999,-0.25; -0.57,0.82,-0.0015,-0.0004; 0,0,0,1; 0.82,0.57,0,0.003; 0.0009,-0.0012,-0.999,-0.25; -0.57,0.82,-0.0015,-0.0004; 0,0,0,1];
q = size(T12,1);
Tc3 = zeros(q,4);
for i=1:4:q
Tc3(i:i+3,1:4) = ((Tc0(i:i+3,1:4)*T01(i:i+3,1:4))*T12(i:i+3,1:4));
end
So that takes about 2 seconds for every 100,000 loops, which isn't bad. But over millions of data points it really adds up. I'd like to avoid using a loop but I can't just multiply the entire arrays together (not that I know of since elementwise multiplication isn't correct).
Any ideas?
Thanks
EDIT: I'm wondering if maybe there would be a way to do this using 3D arrays? There Tc0 = reshape(Tc0,4,4,3); T01=reshape(T01,4,4,3); T12=reshape(T01,4,4,3) and then somehow multiply each immediately with respect to their 3rd dimension index?
Accepted Answer
More Answers (1)
I don't know if it would be any more efficient than your loop (you could check with some timing tests) but here is an alternative approach without loops. The idea is to put each of your three 4x4 matrices that are currently stacked in a 12x4 matrices into a block diagonal matrix with each of the 4x4 matrices on the main diagonal. Then you can directly multiply each of these block diagonal matrices to get your result. If you need to get it back into stacked form you can do this through indexing at the end. I'm not sure from the description you gave whether you always have just three stacked matrices or whether that was just for the example. If you typically have many more than three stacked matrices, then this approach will not scale well as you would get huge 4*N x 4*N block diagonal matrices where N is the number of stacked matrices.
Anyhow in case it is helpful here is some code that illustrates the approach
% put stacked matrices into block diagonal matrices and multiply
Tc3diag = blkdiag(Tc0(1:4,:),Tc0(5:8,:),Tc0(9:12,:))*...
blkdiag(T01(1:4,:),T01(5:8,:),T01(9:12,:))*...
blkdiag(T12(1:4,:),T12(5:8,:),T12(9:12,:));
% collapse the output block diagonal matrix into a stacked matrix
Tc3= [Tc3diag(1:4,1:4);Tc3diag(5:8,5:8);Tc3diag(9:12,9:12)];
3 Comments
John Kinch
on 16 Apr 2019
I was also thinking along the lines of storing the values in a 3d array rather than a stacked 2d array. It still seems that you would have to have a loop, but I made the following script to compare the timing for the two approaches:
% timing comparison test on block matrix multiplication for large matrices
N = 1e6; % number of individual matrices
% looping through stacked matrices (current approach)
Tc0 = rand(4*N,4);
T01 = rand(4*N,4);
T12 = rand(4*N,4);
q = size(T12,1);
Tc3 = zeros(q,4);
disp(' ')
disp('elapsed time for current approach')
tic
for i=1:4:q
Tc3(i:i+3,1:4) = ((Tc0(i:i+3,1:4)*T01(i:i+3,1:4))*T12(i:i+3,1:4));
end
toc
% looping through 3d array (alternate approach)
Tc0 = rand(4,4,N);
T01 = rand(4,4,N);
T12 = rand(4,4,N);
Tc3 = zeros(4,4,N);
disp(' ')
disp('elapsed time for alternate approach')
tic
for i = 1:N
TC3 = Tc0(:,:,i)*T01(:,:,i)*T12(:,:,i);
end
toc
I found that the 3d approach was in fact faster typically it took approximately 60% as long as the current approach. So not a huge savings, but something. I think to use this though you would want to always store the data in the 3d arrays. Otherwise, I think you'd probably use up some of your savings with any reshaping.
By the way, if you needed to do the reshaping, you can do this using
Tc0_3d = permute(reshape(Tc0',4,4,N),[2,1,3])
Note that in the above you first transpose the matrix , and then permute the matrix dimensions to deal with the fact that the reshape acts columnwise.
Jon
on 16 Apr 2019
Actually, going back to the original idea of keeping the data in block diagonal matrices, this may still be a viable approach if you use sparse block diagonal matrices. In this case I think the memory requirements would be similar to what you are doing already. In the attached code I did a timing test for this approach with N = 1e6 individual blocks. I found that the matrix multiply, which is now just one line of code, takes only 0.8 seconds compared to 2.5 seconds to 4 seconds for the other approaches. This seems to be a substantial speedup. I don't know whether you have the freedom to rewrite the underlying code so that you can just stay with sparse block diagonal matrices. Otherwise going back and forth from stacked to sparse block diagonal would probably eat up any time savings.
%% sparse block matrix multiplication
N = 1e6
% make sparse block diagonal matrices
Tc0blocks = cell(N,1);
T01blocks = cell(N,1);
T12blocks = cell(N,1);
% loop to assign individual block elements
for k = 1:N
Tc0blocks{k} = sparse(rand(4,4));
T01blocks{k}= sparse(rand(4,4));
T12blocks{k} = sparse(rand(4,4));
end
% combine into block diagonal matrices
% They will be sparse because elements are sparse
Tc0 = blkdiag(Tc0blocks{:});
T01 = blkdiag(T01blocks{:});
T12 = blkdiag(T12blocks{:});
disp(' ')
disp('elapsed time for sparse block diagonal matrix approach')
tic
Tc3 = Tc0*T01*T12;
toc
Note in my timing I only count the part where the multiplication if done. The other part is just setting up to do the test
Categories
Find more on Matrix Operations in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!