Matrix inversion doesn't fully utilize the CPU
Hi all,
I have a 16-core CPU that I've been trying to run some FDM CFD code on. The algorithm involves three matrix inversions, which take most of the time; since the matrices only need to be built once, I have already decomposed them. They are also sparse. I assumed the code would parallelize the inversion efficiently, but when I run it the CPU is never fully utilized, and what's surprising is that if I increase the number of grid points (make the matrices larger), CPU utilization actually goes down. There seems to be a specific matrix size that is the most efficient to invert.
For example, here are some data points.
20000x20000 matrix - 30% CPU util
10000x10000 matrix - 37% CPU util
5000x5000 matrix - 45% CPU util
2211x2211 matrix - 67% CPU util
1250x1250 matrix - 17% CPU util
800x800 matrix - 22% CPU util
Is this normal, or is there something I should be doing differently? I want to note that although utilization is lower for the smaller matrices, they still solve faster overall than the larger ones. However, I would like my computer to use its full potential and run at full utilization for all matrix sizes. Thanks!
Best,
Brandon
Answers (1)
Walter Roberson
on 17 Oct 2022
They are also sparse matrices. I assume the code should parallelize the inversion efficiently
No. In general the inverse of a sparse matrix is a dense matrix, so processing a sparse matrix adds overhead and prevents passing regular subsections of the matrix to different cores for processing.
In most cases you should be using the \ operator instead of multiplying by a matrix inverse. You can compute a decomposition once and use the resulting object with the \ operator, without ever forming the inverse.
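Something along these lines (an untested sketch; A, b1, and b2 stand in for your own system matrix and right-hand sides):

```matlab
% Factor A once, then reuse the factorization for every solve.
dA = decomposition(A);   % stores a factorization; never forms inv(A)
x1 = dA \ b1;            % each solve reuses the stored factors
x2 = dA \ b2;
```

Repeated solves against the same decomposition object avoid refactoring the matrix each time.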
4 Comments
Brandon Li
on 17 Oct 2022
Walter Roberson
on 17 Oct 2022
Are your sparse matrices band-limited, for example tri-diagonal? There may be special routines for that. But in general, sparse matrices are only parallelizable for element-wise operations such as .* or .^ or + or exp(); in some cases they can also be faster for *. For more complicated operations such as \, sparse storage cannot readily be processed in parallel, since you cannot do efficient block operations when you have to decide whether each element is included in the block or not.
There might possibly be some ways of reducing overheads on recursive block operations... but in a lot of cases, if you can afford the memory, you can get significant speed-ups by working with full() of the matrices.
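One rough way to check this on your own machine is to time the same solve in sparse and full form (a hedged sketch; the size, density, and diagonal shift here are arbitrary test values, and the full copy needs memory for an N-by-N dense array):

```matlab
N = 5000;
A = sprandsym(N, 0.001) + 10*speye(N);   % arbitrary well-conditioned sparse test matrix
b = rand(N, 1);
tic; xs = A \ b;   tsparse = toc;        % sparse solve: little multithreading
Af = full(A);
tic; xf = Af \ b;  tfull = toc;          % dense solve: multithreaded LAPACK
```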
Walter Roberson
on 18 Oct 2022
Edited: Walter Roberson
on 18 Oct 2022
It is sort of like the situation with
f = @(x) some long expression involving a division by x
for K = 1 : N
    if X(K) == 0
        Y(K) = 0;
    else
        Y(K) = f(X(K));
    end
end
compared to
f = @(x) some long expression involving a division by x
Y = f(X);
Y(X == 0) = 0;
in the sense that if you have to test each location to see whether it is eligible for a mathematically correct result, that can take a lot more time than simply going ahead and fixing up the results afterwards. Likewise, the process of deciding whether each element of a sparse matrix is present can end up being more costly than having used a full matrix in the first place (if you can afford the memory).
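To make the fix-up pattern concrete (the particular f here is just an arbitrary stand-in for an expression that divides by x):

```matlab
f = @(x) sin(x) ./ x;   % placeholder expression; produces NaN at x == 0
X = [-2 0 3 0 5];
Y = f(X);               % vectorized over all elements at once
Y(X == 0) = 0;          % then patch the invalid locations afterwards
```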
Brandon Li
on 18 Oct 2022