I want to solve the linear system Ax=B for x, computing x=A\B, with B having m column-vector. This is one of the operations in my function, but it is the most expensive. In order to obtain a speed up, I thought about splitting the matrix B into m columns (let's call them B_i, with i=1,2,..,m), and computing in parallel the x_i column-vectors (calling matlabpool), with the code below:
Let assume that for my purpose is not important the order in which the x_i are concatenated per column. I compared the time simulation obtained from x=A\B (with the whole B) and those obtained with the parallel code, for 3 different data sets, all with A non-singular and unsymmetric, and B sparse and having full rank: the first one has A dense with dimension n=2600 and number of columns of B m=30; the second one has A sparse, but with "enough dense" subblocks, with dimension n=10000 and m=9; the third one has A "very" sparse (like it is "diagonal" with 2 other diagonals starting from position (1,1) and ending in (n,n/4) and (n,3n/4), respectively), with n=25000 and m=300.Well, for the first 2 cases I cannot obtain a speed up in time simulation, indeed, the time of the parallel code is greater than x=A\B; for the last data set I obtained a time simulation 5-times lower than x=A\B, with a pool size of 2 and 4 GB of RAM, and 8-times lower with a pool size of 8 and 64 GB of RAM. So, I understand that the backslash operator works depending on the structure (and the size) of the matrix A (in the cases with A sparse, matlab tells me it used Unsymmetric MultiFrontal PACKage with automatic reordering, then UMFPACK's factorization and solver). My question are: 1) Why for dense matrices it's not useful to split B? 2) What are the factors that have determined the speed up for the last sparse case (which has no dense subblock)?