# Better parallelization than parfor?

1 view (last 30 days)
schlapp on 27 Jan 2020
Edited: Matt J on 28 Jan 2020
Hello,
I have a function fun(vec(1:n),Nmax) with two outputs: (i) a matrix g(1:n,1:Nmax) and (ii) a vector tau(1:Nmax), as Nmax eigenfunctions and eigenvalues of a matrix constructed within fun. Now I want to get this output for all values withing a vector vec2(1:n). The simplest way is a for-loop
for i=1:n
[g(:,:,i),tau(i,:)] = fun(vec,Nmax,vec2(i));
end
This is, however, slow. Replacing it with a parfor-loop
parfor i=1:n
[g(:,:,i),tau(i,:)] = fun(vec(:),Nmax,vec2(i))
end
is quite a bit faster. I was wondering if there is a way to accelerate this even further by threading over the vector vec2? Somehow I cannot find a way to do this without rewriting the function fun. Maybe the problem is that g is already a matrix, so fun(vec,Nmax,vec2') does not evaluate things along the correct dimension?

Matt J on 27 Jan 2020
I was wondering if there is a way to accelerate this even further by threading over the vector vec2?
Not sure what you mean. The code as you've presented it already does divide vec2 into parallel pieces. Nothing in what you've shown makes inefficient use of parfor that I can see. Anything slow would be in the details of how fun is implemented.
schlapp on 27 Jan 2020
Well, I am not sure if there will be any improvement, but the help for parfor says that vectorization is usually the fastest way to operate over a vector. Maybe this example helps to clarify my question.
If I write sin(vec+vec'), the output is an n×n matrix, where n is the length of vec. However, with this notation I am limited to automatic expansion over only two dimensions, and I was wondering if there is a direct way to extend this to three or more (instead of a for-loop for the third dimension). I guess that if I construct an n×n×n matrix M beforehand and call fun(M), this might be faster (if at all), but that requires adjusting the code of fun to accept matrices as input.
The reason why I think it could be improved is that MATLAB tells me that vec in my above example is a 'broadcast' variable and might slow things down.
Matt J on 28 Jan 2020
If I write sin(vec+vec'), the output is an n×n matrix, where n is the length of vec. However within this notation I am limited to automatic threading over only two dimensions.
The extension would be
sin(vec+vec.'+ reshape(vec,1,1,[]));
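For concreteness, here is a minimal sketch of that three-way implicit expansion (a hypothetical small test vector; requires R2016b or later for implicit expansion):

% vec is n-by-1; the three addends have sizes n-by-1, 1-by-n, and
% 1-by-1-by-n, so the sum expands to an n-by-n-by-n array.
vec = (1:4).';                              % n = 4, column vector
M   = sin(vec + vec.' + reshape(vec,1,1,[]));
size(M)                                     % [4 4 4]
% Element M(i,j,k) equals sin(vec(i) + vec(j) + vec(k)).

On releases older than R2016b, the same result can be obtained with bsxfun applied twice.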

Matt J on 27 Jan 2020
The reason why I think it could be improved is that MATLAB tells me that vec in my above example is a 'broadcast' variable and might slow things down.
But you said vec2 was the problem, not vec.
In any case, if each call to fun requires all of vec then you have no choice but to broadcast it.

schlapp on 28 Jan 2020
Well, I was wondering if I could replace the (par)for-loop over vec2 entirely by calling fun with the vectors oriented along the correct dimensions, e.g. fun(vec,vec2',...).
However, when I use the parfor loop, I get the message that the loop may be slow because the whole vector vec needs to be broadcast. Since there is no way to evaluate fun without the whole vector vec, I conclude that there is no simpler way to improve performance than the parfor loop, short of rewriting my function fun to also handle matrix inputs.
In any case, thank you for your effort! I am not very knowledgeable when it comes to good coding.
Edric Ellis on 28 Jan 2020
As @Matt J pointed out, if each iteration needs all of vec, then there's not much you can do. That warning is shown for all broadcast variables, regardless of whether or not they actually cause a performance degradation.
You can use ticBytes and tocBytes to see how much data is actually being transmitted to/from your parfor loop. If vec is a numeric vector of length n, then my suspicion is that it has no discernible effect on the loop time.
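As a hedged sketch, the measurement would wrap the loop from the question (fun, vec, vec2, Nmax, g, and tau are the variables defined there; gcp assumes a parallel pool is already open or can be started):

ticBytes(gcp);
parfor i = 1:n
    [g(:,:,i), tau(i,:)] = fun(vec, Nmax, vec2(i));
end
tocBytes(gcp)   % reports bytes sent to and received from each worker

Comparing the reported byte counts with the size of g and tau shows whether broadcasting vec contributes meaningfully to the transfer cost.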

Philippe Lebel on 27 Jan 2020