speed up run time of mldivide function.

Hi,
I am calling mldivide function about 10^7 times:
my syntax looks this way:
C=mldivide(A,B);
where both A,B are sparse matrices. size(A)=2^10*2^10 and size(B)=(1x2^10) (Assume that my matrix A has ~2^12 non zero data points ~95% of them close to center diagonal.A is a lower traingular matrix)
After using the profile command ,when i run the code, i found it takes 60-70% of total time(~2000 secs) in that particular line.(My computer specs are i5, 8gb ram, matlab 2018b)
can i make mldivide compute any faster? (assume that I can only use sparse matrices, and assume that i want to run it on my computer.).
Unfortunately, GPU computing of mldivide using sparse matrice does not seem to work so well. If you can, please comment on using 'GPU computing with (mldivide + sparse matrices)' as well..

 Accepted Answer

If you're solving the system A*x = b with the same A every time (or a small number of different A matrices many times each) try creating a decomposition and reusing that decomposition each time.

8 Comments

Thanks Steven.. that might work.
pb
pb on 26 Nov 2018
Edited: pb on 26 Nov 2018
Hi Steven, i tried and decomposition did not help in my case. I have written an example code below to show:
%test 2 decompoistion
n = 1e3;
rng default % for reproducibility
A = speye(n);
A(2,1)=4;
A(3,2)=1;
A(10,9)=5.6;
x1 = randn(n,1);
x2 = x1;
tic
for ii=1:100
x1 = A \ x1;
x1 = x1 / norm(x1);
end
toc
Elapsed time is 0.001742 seconds.
dA = decomposition(A);
tic
for ii=1:100
x2 = dA \ x2;
x2 = x2 / norm(x2);
end
toc
Elapsed time is 0.013727 seconds.
Taking another look, 2000 seconds may seem like a long time (and it is, speaking absolutely.) But that's 2000 seconds to perform 1e7 solves. On average that's about 0.0002 seconds per solve. That's pretty quick: 500 to 750 solves in the blink of an eye.
You could try tuning some of the parameters used by sparse backslash using the spparms function but I'm not sure how much faster you could make that computation. Your computer will need to do some work to solve the system so there's some minimum amount of time the computation will take.
About the only other way to speed this up I could think of offhand (which may not be applicable for your use case) would be to combine the right-hand sides into a matrix and solve matrix\matrix instead of repeatedly solving matrix\vector.
Hi pb,
You're running into a known issue of the decomposition object: The numerical computation is faster than the overhead from calling a method of an object. You could try computing the LU decomposition in advance instead:
[L, U, P] = lu(A);
% Inside the iteration:
x = U\(L\(P*b));
Another thing you might try: The matrix you are constructing has 1000 entries, but after the first block A(1:10, 1:10), it's just the identity matrix. You could only compute
Ablock = A(1:10, 1:10);
x1(1:10) = Ablock \ x1(1:10);
if the matrix A always has this form.
Thank you for the answers Steven , Christine. I will try both methods.
Hi Steven, Christine,
You are right ,there seems to be a minimum amount of time it takes.(~0.002 per call)
Also, block implemenetation is not helping to speed up my code either. This because my A matrix is lower traiangular-(1024*1024--and i had non zero and non 1 elements in almost every row)
Thank you for the suggestions. I had lot of fun trying them out.
Bruno Luong
Bruno Luong on 28 Nov 2018
Edited: Bruno Luong on 28 Nov 2018
Matrix upper/lower triangular does not need to de factorized since back-substution can work directly with original matrix. So it's normal decomposition, lu, chol or friends won't help (they actually do nothing more than factorizing a general matrix of two or more triangulars/permutation matrices, which in your case reduce nothing).
You might want to program your own back-substitution in MEX if the structure of your matrix is special. But I doubt you'll beat MATLAB.
I think what you ask is impossible,just buy a faster machine.
Thank you Bruno for the comment.
I was thinking about MEX too.but it might comsume asignificant amount of time.So i am avoiding that.
I dont want to spend money. So faster machine is not possible either.

Sign in to comment.

More Answers (2)

You might want to use iterative method.

7 Comments

thank you for the answer,i will try it.
Some test of three methods with a more "realistic" matrix
A = delsq(numgrid('S',1000));
b = randn(length(A),1);
ntest = 1;
tic
for ii=1:ntest
x = A \ b;
end
toc % Elapsed time is 2.133235 seconds.
tic % iteratibe method
for ii=1:ntest
[x,flag,relres,iter,resvec] = cgs(A,b);
end
toc % Elapsed time is 0.639087 seconds.
dA = decomposition(A);
tic
for ii=1:ntest
x = dA \ b;
end
toc % Elapsed time is 0.171485 seconds.
pb
pb on 26 Nov 2018
Edited: pb on 27 Nov 2018
it is similar to matrix in example on page under the section 'Solve Linear System with Several Right-Hand Sides'. I am aware of this example.
However, in my question, i start with saying my A is quite sparse. In this case would it be possible to make the line below faster?
C=mldivide(A,B);
Thank you.
Dense? Here is the density of my matrix
>> nnz(A)/numel(A)
ans =
5.0160e-06
There are about 5 non-zero elements per row/column and the matrix size is 996004 x 996004.
You need to quantify what is "quite sparse" for us or give a way to realisticly simulate your matrix.
pb
pb on 27 Nov 2018
Edited: pb on 27 Nov 2018
I misunderstood delsq(numgrid())-function to be some thing else. I see your point of view.
but i tried decomposition and it did not speed up my code.
Thank you for the comment though.
pb
pb on 27 Nov 2018
Edited: pb on 27 Nov 2018
Hi Bruno, I have taken your code and made minor changes: the results are below:
A = sparse(eye(10000));
b = randn(length(A),1);
A(:,1)=A(:,1)+b;A(1,1)=1;
A(:,2)=A(:,2)+b;A(2,2)=1;A(1,2)=0;
A(:,3)=A(:,3)+b;A(3,3)=1;A(1,3)=0;A(2,3)=0;
ntest = 10000;
x1=b;
tic
for ii=1:ntest
x11 = A \ x1;
end
toc
Elapsed time is 0.889389 seconds.
x2=b;
tic % iteratibe method
for ii=1:ntest
[x22,flag,relres,iter,resvec] = cgs(A,x2);
end
toc
Elapsed time is 11.631365 seconds
x3=b;
dA = decomposition(A);
tic
for ii=1:ntest
x33 = dA \ x3;
end
toc
Elapsed time is 0.888435 seconds.
It seems that amount of time reduced by decomposition()-function when A is already lower traingular (& probably even when A is upper traiangular) seems very minor when compared with '\'.
And when we solve a slightly different problem ( a loop of x2=A\x2 instead of x22=A\x2 (code and example below)),decomposition and cgs interative method seem even less effective than '\'.
A = sparse(eye(10000));
b = randn(length(A),1);
A(:,1)=A(:,1)+b;A(1,1)=1;
A(:,2)=A(:,2)+b;A(2,2)=1;A(1,2)=0;
A(:,3)=A(:,3)+b;A(3,3)=1;A(1,3)=0;A(2,3)=0;
ntest = 10000;
x1=b;
tic
for ii=1:ntest
x1 = A \ x1;
end
toc
Elapsed time is 0.876455 seconds.
x2=b;
tic % iteratibe method
for ii=1:ntest
[x2,flag,relres,iter,resvec] = cgs(A,x2);
end
toc
Elapsed time is 5.783371 seconds.
x3=b;
dA = decomposition(A);
tic
for ii=1:ntest
x3 = dA \ x3;
end
toc
Elapsed time is 0.884621 seconds.
It seems Matlab '\' operator more powerful in itself than with any additional commands in the above examples.
A = sparse(eye(10000));
b = randn(length(A),1);
A(:,1)=A(:,1)+b;A(1,1)=1;
A(:,2)=A(:,2)+b;A(2,2)=1;A(1,2)=0;
A(:,3)=A(:,3)+b;A(3,3)=1;A(1,3)=0;A(2,3)=0;
Such matrix is special, you can just the solution by splitting in blocks of
A(1:3,1:3), A(4:end,1:3) and A(4:end,4:end)
Solutions are reduce to 3x3 inversion.
Again if you are not willing to tell us more about your matrix, we can only guess the best way to make the resolution. This can take forever.

Sign in to comment.

James Tursa
James Tursa on 26 Nov 2018
Edited: James Tursa on 26 Nov 2018

1 Comment

pb
pb on 27 Nov 2018
Edited: pb on 27 Nov 2018
Thank you for the answer James. I will try it.

Sign in to comment.

Categories

Products

Tags

Asked:

pb
on 23 Nov 2018

Commented:

pb
on 28 Nov 2018

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!