Matrix inversion doesn't fully utilize the CPU
Hi all,
I have a 16-core CPU that I've been trying to run some FDM CFD code on. The algorithm involves three matrix inversions, which take most of the time; since the matrices only need to be built once, I have already decomposed them. They are also sparse. I assumed the code would parallelize the inversion efficiently, but when I run it the CPU is never fully utilized, and what's surprising is that if I increase the number of grid points (make the matrices larger), CPU utilization actually goes down. There seems to be a specific matrix size that is the most efficient to invert.
For example, here are some data points.
20000x20000 matrix - 30% CPU util
10000x10000 matrix - 37% CPU util
5000x5000 matrix - 45% CPU util
2211x2211 matrix - 67% CPU util
1250x1250 matrix - 17% CPU util
800x800 matrix - 22% CPU util
Is this normal, or is there something I should be doing differently? I want to note that although utilization is lower for the smaller matrices, they still solve faster overall than the larger ones. However, I would like my computer to use its full potential and run at full utilization for all matrix sizes. Thanks!
Best,
Brandon
Answers (1)
Walter Roberson
on 17 Oct 2022
They are also sparse matrices. I assume the code should parallelize the inversion efficiently
No. In general the inverse of a sparse matrix is a dense matrix, so processing a sparse matrix adds overhead and prevents passing regular subsections of the matrix to different cores for processing.
In most cases you should be using the \ operator instead of multiplying by a matrix inverse. You can compute a decomposition once and use the resulting object with the \ operator, without ever forming the inverse.
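Something along these lines (an untested sketch; A, b1, and b2 stand in for your own system matrix and right-hand sides):

```matlab
% Factor A once, then reuse the factorization for every solve.
dA = decomposition(A);   % stores a factorization; never forms inv(A)
x1 = dA \ b1;            % each solve reuses the stored factors
x2 = dA \ b2;
```

Repeated solves against the same decomposition object avoid refactoring the matrix each time.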
4 Comments
Brandon Li
on 17 Oct 2022
Walter Roberson
on 17 Oct 2022
Are your sparse matrices band-limited, for example tri-diagonal? There may be special routines for that. But in general, sparse matrices are only parallelizable for element-wise operations such as .* or .^ or + or exp(); in some cases they can also be faster for *. For more complicated operations such as \, sparse storage cannot readily be processed in parallel, since you cannot do efficient block operations when you have to decide whether each element is included in the block or not.
There might possibly be some ways of reducing overheads on recursive block operations... but in a lot of cases, if you can afford the memory, you can get significant speed-ups by working with full() of the matrices.
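One rough way to check this on your own machine is to time the same solve in sparse and full form (a hedged sketch; the size, density, and diagonal shift here are arbitrary test values, and the full copy needs memory for an N-by-N dense array):

```matlab
N = 5000;
A = sprandsym(N, 0.001) + 10*speye(N);   % arbitrary well-conditioned sparse test matrix
b = rand(N, 1);
tic; xs = A \ b;   tsparse = toc;        % sparse solve: little multithreading
Af = full(A);
tic; xf = Af \ b;  tfull = toc;          % dense solve: multithreaded LAPACK
```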
Walter Roberson
on 18 Oct 2022
Edited: Walter Roberson
on 18 Oct 2022
It is sort of like the situation with
f = @(x) some long expression involving a division by x
for K = 1 : N
    if X(K) == 0
        Y(K) = 0;
    else
        Y(K) = f(X(K));
    end
end
compared to
f = @(x) some long expression involving a division by x
Y = f(X);
Y(X == 0) = 0;
in the sense that if you have to test each location to see whether it is eligible for a mathematically correct result, that can take a lot more time than simply going ahead and fixing up the results afterwards. Likewise, the process of deciding whether each element of a sparse matrix is present can end up being more costly than having used a full matrix in the first place (if you can afford the memory).
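To make the fix-up pattern concrete (the particular f here is just an arbitrary stand-in for an expression that divides by x):

```matlab
f = @(x) sin(x) ./ x;   % placeholder expression; produces NaN at x == 0
X = [-2 0 3 0 5];
Y = f(X);               % vectorized over all elements at once
Y(X == 0) = 0;          % then patch the invalid locations afterwards
```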
Brandon Li
on 18 Oct 2022