Matrix inversion doesn't fully utilize CPU

Hi all,
I have a 16-core CPU that I've been running some FDM CFD code on. The algorithm involves 3 matrix inversions, which take the most time, and since the matrices only need to be built once, I have already decomposed them. They are also sparse. I assumed the code would parallelize the inversion efficiently, but when I run it the CPU is never fully utilized, and what's surprising is that if I increase the number of grid points (make the matrix larger), CPU utilization actually goes down. There seems to be a specific matrix size that is the most efficient to invert.
For example, here are some data points.
20000x20000 matrix - 30% CPU util
10000x10000 matrix - 37% CPU util
5000x5000 matrix - 45% CPU util
2211x2211 matrix - 67% CPU util
1250x1250 matrix - 17% CPU util
800x800 matrix - 22% CPU util
Is this normal, or should I be doing something differently? I want to note that although utilization is lower for the smaller matrices, they still solve faster overall than the larger ones. However, I would like my computer to use its full potential and run at full utilization for matrices of all sizes. Thanks!
Best,
Brandon

Answers (1)

They are also sparse matrices. I assume the code should parallelize the inversion efficiently
No, in general the inverse of a sparse matrix is a dense matrix, so processing a sparse matrix adds overhead and prevents passing regular subsections of the matrix to different cores for processing.
In most cases you should be using the \ operator instead of multiplying by a matrix inverse. You can also factor the matrix once into a decomposition object and reuse that object with the \ operator, without ever forming the inverse.
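A minimal sketch of that pattern, with an illustrative random SPD matrix standing in for the actual system (the matrix and right-hand sides here are placeholders, not the asker's setup):

```matlab
% Factor once, solve many times, instead of calling inv().
n  = 5000;
A  = sprandsym(n, 1e-3, 0.1, 1);   % illustrative sparse SPD matrix
b1 = rand(n, 1);
b2 = rand(n, 1);

dA = decomposition(A);             % factorization computed once (R2017b+)
x1 = dA \ b1;                      % each solve reuses the stored factors;
x2 = dA \ b2;                      % no explicit inverse is ever formed
```

Every subsequent `dA \ b` is just a pair of triangular solves against the stored factors, which is both faster and numerically better conditioned than multiplying by `inv(A)`.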

4 Comments

Actually, I have been using the \ operator - that is what I meant by matrix inversions. Sorry for the confusion. I also decomposed the matrix beforehand. So what else could be the problem here?
For smaller matrices the utilization might make sense, since I would expect the bottleneck to be elsewhere, rather than in the matrix inversion or \ line.
However, for larger matrices I would expect the bottleneck to be the \ operator itself, and that MATLAB would be able to fully parallelize it.
Are your sparse matrices band-limited, such as being tri-diagonal? There are possibly special routines for that. But in general, sparse matrices are only parallelizable for element-wise operations such as .*, .^, +, or exp(); in some cases they can also be faster for *. For more complicated operations such as \, sparse matrices cannot readily be processed in parallel, since you cannot do efficient block operations when you have to make a decision about whether each element is included in the block or not.
There might possibly be some ways of reducing overheads on recursive block operations... but in a lot of cases, if you can afford the memory, you can get significant speed-ups by working with full() of the matrices.
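To see whether full() helps in a given case, a quick timing comparison on the same system is easy to run. This is a sketch with a made-up tridiagonal test matrix; which side wins depends on size, bandwidth, and memory:

```matlab
% Compare sparse \ against full() \ on the same (illustrative) system.
n = 4000;
A = spdiags(rand(n, 3), -1:1, n, n) + n*speye(n);   % tridiagonal, diagonally dominant
b = rand(n, 1);

tic; xs = A \ b;       ts = toc;    % sparse banded solve
Af = full(A);
tic; xf = Af \ b;      tf = toc;    % dense solve (multithreaded LAPACK)
fprintf('sparse: %.3fs  full: %.3fs  diff: %g\n', ts, tf, norm(xs - xf));
```

The dense solve is the one that exercises the multithreaded LAPACK routines, which is where the full-CPU utilization you see with full matrices comes from.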
It is sort of like the situation with
f = @(x) 1 ./ x;   % stand-in for some long expression involving a division by x
for K = 1 : N
    if X(K) == 0
        Y(K) = 0;
    else
        Y(K) = f(X(K));
    end
end
compared to
f = @(x) 1 ./ x;   % stand-in for some long expression involving a division by x
Y = f(X);
Y(X == 0) = 0;
in the sense that if you have to test each location to see whether it is eligible for a mathematically correct result, that can take a lot more time than simply going ahead and fixing up the results afterwards. Likewise, the process of deciding whether each element in a sparse matrix is present can end up being more costly than having used a full matrix in the first place (if you can afford the memory).
Yes, they are block tridiagonal. I'll try to see if there are special routines for solving systems of this form.
I tried using full() and then decomposing the matrices, but it was significantly slower. I also tried taking the inverse of the sparse matrices first, so that we only have to multiply the inverted matrix with a vector, but that was also slow. The fastest I've seen is taking the decomposition of a sparse matrix and then using \. But even that doesn't seem to use my CPU fully (unless maybe Task Manager is wrong?)
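One quick sanity check on the utilization question: MATLAB's maxNumCompThreads reports how many computational threads the BLAS/LAPACK routines are allowed to use. Note that even with all threads available, sparse factorizations may run largely single-threaded, which would match the Task Manager readings above.

```matlab
% Report the computational thread count (defaults to the number of
% physical cores); this bounds how parallel the dense solvers can be.
nThreads = maxNumCompThreads

% maxNumCompThreads(16);   % optionally force a specific count
```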
Thanks for all the help so far Walter!


Release: R2022a

Asked: 17 Oct 2022
Commented: 18 Oct 2022
