How to speed up LU decomposition in ode15s solver?

7 views (last 30 days)
Rasoul Sheikhi on 27 Apr 2018
Edited: Rasoul Sheikhi on 30 Apr 2018
Hello, I am trying to solve a System of PDEs (time dependent and 1-d in space) using ode15s built-in solver. The Code runs quite effiiciently as I use limited number of cells for semi discritization of equations in space, say for example 200 cells. As I furthur increase the number of cells to say 900, computation time increases with a factor of 20-25. As far As I could realize most of computation time goes to the LU factorization which is employed inside ode15s . I am writing to ask if there is a way to Speed it up?:) Regards Rasoul

Bill Greene on 27 Apr 2018
You didn't provide any details of exactly how you are using ode15s but I suspect you can substantially improve the performance.
I assume you are using some kind of finite difference, finite element, or finite volume method for discretizing in space. These methods lead to a very sparse Jacobian matrix in the equations to be solved. If you are not explicitly providing this matrix to ode15s, it will calculate a dense Jacobian using a finite difference method. However, you can calculate this Jacobian yourself and provide it to ode15s using the Jacobian option.
Often, especially for nonlinear equations, it is cumbersome to explicitly calculate the Jacobian matrix. But it is relatively straightforward to predict where the non-zeros will be in the sparse Jacobian matrix (this follows directly from the connectivity of the cells in the mesh). You can provide this information to ode15s using the JPattern option. This reduces the computational cost of calculating the Jacobian by finite differences and the subsequent factorization of that Jacobian.
Rasoul Sheikhi on 29 Apr 2018
I remember once I tried to use this function but since it is a built-in function for solving elliptic and parabolic type PDEs I could not make use of it since my problem is hyperbolic type (transport equation). So, What I am doing now is discretizing the equations in space and storing all unknowns of all cells in a vector which is then used as input argument for ode15s. I hope this describes what I am doing in my code:) Do you think I can still somehow bind Jpattern with ode15s? Regards Rasoul

John D'Errico on 27 Apr 2018
Edited: John D'Errico on 27 Apr 2018
Nope. Not really. Do you suppose there are many ways to write an LU, but they purposely used the really slow version? Actually, the computations used in ODE tools are quite well written, including the LU, which will have been compiled and is highly optimized.
What you need to recognize is that these tools require the solution of a linear system of equations, and this will be done at every time step.
Next, the complexity of solving a linear system of equations grows with the cube of the size of the problem. So if you go from 200 variables, to 900, then that part of the solution will grow by a factor of
4.5^3
ans =
91.125
So it will begin to dominate the solution time, because most of the other aspects of that code will not grow as rapidly with problems size.
There is one more factor you might hope will help, and that is, could the LU be parallelized? Thus can LU be multi-threaded? Of course. But in fact, MATLAB does that already. If your computer has the ability to run multiple threads at once, so it has at least two CPUs under the hood. (A single CPU that is hyper threaded into 2 virtual CPUs does not count here, as that give you no speed boost.)
Make sure that MATLAB knows how many actual CPUs are present for your computer. On mine, that is
ans =
4
The more CPUs you have, the better speedup you will get. But as I said, unless maxNumCompThreads is set incorrectly for your computer, then you are already getting the benefit of multi-threading if and when it will help. For small problems, starting up multiple threads costs time.
I ran two two quick tests on my computer, VERY CAREFULLY watching the activity monitor on my machine, set to watch the number of CPUS that were currently active, and applied to the MATLAB process.
A = rand(200);
for n = 1:5000,B = lu(A);end
A = rand(900);
for n = 1:1000,B = lu(A);end
In both cases, MATLAB actually fired up all 4 processors on my system, but the first one was done so rapidly that I had to increase the loop counter to 5000, just to see MATLAB jump up to 400% activity for a second or so. So even the 200 variable problem was fully multi-threaded.
Essentially, this is a fact of life with problems of this sort. They grow in computational complexity with the cube of the system size. But MATLAB already does everything it can. If you want this to run faster, you could find a computer that is faster, and most importantly, has a few extra CPUs under the hood. There are 12 CPU systems out there to be bought reasonably. (My system was pretty good 6 or 7 years ago, but just average now.) If this is school work, then of course, you won't be buying a computer just to run your code faster. And again, hyper-threading does not count. I have 4 CPUs on my computer, and hyper-threading them into 8 virtual CPUs won't speed things up, if all 4 are already running flat out.
Rasoul Sheikhi on 29 Apr 2018
Hello John, Thank you for your comment. I did test the examples you provided and seems that it is using all available threads. For the following code snippet Elapsed time using 16 Threads is 4 times less than elapsed time using 1 Thread (I used maxNumCompThreads(1) to set number of computing threads to 1).
tic A = rand(900); for n = 1:1000,B = lu(A);end toc
So this means my code is already running at fastest speed possible?