How to distribute computation on GPU vector-wise?

Question

0 votes

Hi,

I am trying to accelerate a specific function by assigning each row of a matrix to one GPU core, having that core process the row and return a new matrix. Let's say my input matrix is n by m. I want the computation to be distributed over n cores, with each of the n cores returning a matrix of size k by m. The computation applied to each row is quite complicated, but only functions supported by the GPU are required.

As I understand it, arrayfun can only be used for single-element operations, not arrays. The individual elements in one row of the input matrix, however, cannot be computed independently. I think pagefun and bsxfun also won't work, because they do not support user-written functions. Is there any way to proceed like this in MATLAB without having to implement the entire code in CUDA?

Thanks!


Answer 1

Joss Knight on 20 Apr 2017

0 votes

You can loop over and read multiple entries in an input array (as an up-value variable) inside arrayfun, but you can't loop over and assign to elements of an output array. There is no general way to do this in MATLAB code.

Your best bet is to tell us what you're trying to do, and we can show how a combination of vectorized MATLAB functions and possibly pagefun can give you what you want without you having to write custom CUDA.
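To illustrate the restriction described above, here is a minimal sketch of the read-only up-value pattern: a nested function passed to arrayfun loops over one row of a gpuArray and returns a single scalar per call. The names rowSumsOnGpu and sumRow are hypothetical, and the sketch assumes Parallel Computing Toolbox, a supported GPU, and a double-precision gpuArray input. It only reads from A; writing a whole k-by-m block per row from inside the nested function is exactly what is not possible.

```matlab
function rowSums = rowSumsOnGpu(A)
% ROWSUMSONGPU  Hypothetical example: one scalar result per row of A.
% A is assumed to be an n-by-m double gpuArray.
    [n, m] = size(A);
    rowIdx = gpuArray.colon(1, n)';       % one arrayfun call per row index
    rowSums = arrayfun(@sumRow, rowIdx);  % n-by-1 gpuArray of scalars

    function s = sumRow(i)
        % A and m are up-value variables from the parent function.
        % They can be read and indexed here, but not assigned to.
        s = 0;
        for j = 1:m
            s = s + A(i, j);
        end
    end
end
```

Calling it could look like r = gather(rowSumsOnGpu(gpuArray(rand(100, 50)))). Each call still produces only one scalar, so this does not by itself solve the case where every row must yield a matrix.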


Answer 2

Hans-Martin Schwab on 20 Apr 2017

Edited: Hans-Martin Schwab on 20 Apr 2017


0 votes

Hi Joss,

thank you for your answer! It is actually not easy for me to explain. I will try to break it down as much as I can:

What I am trying to do is to compute a matrix M_out that is (k)x(m) from a matrix M_in that is (n)x(m). In this computation, each of the n rows of M_in produces a matrix M_out_i that is (k)x(m). In the end, M_out is the sum of all n M_out_i matrices.

Each M_out_i matrix is computed from one row of M_in by a recursion formula. Each step of the recursion consists of a multiplication with the vector v1 and a convolution with the vector v2, which yields the next row of M_out_i. The multiplication and convolution are then applied again to obtain the following row of M_out_i, and so on.

The convolution can be processed as multiplication in the frequency domain. Hence, my code looks like this:

% V2 is presumably fft(v2), the spectrum of the convolution vector v2
M_out = zeros(k, m);
for i = 1:n
    % --- to be executed independently(?) ---
    v_temp = M_in(i, :);
    M_out_i = zeros(k, m);
    for j = 1:k
        v_temp = v_temp .* v1;                 % multiplication
        v_temp = ifft(fft(v_temp) .* V2);      % convolution
        M_out_i(j, :) = v_temp;
    end
    % ----------------------------------------
    M_out = M_out + M_out_i;
end

This is pretty much the function I want to execute. I believe the loop over i=1:n can run in parallel, so that I only need to add up the resulting M_out_i matrices at the end. I am not very experienced in GPU processing yet, though. It is, however, clear that the inner loop j=1:k cannot be parallelized, due to its recursive nature.

I hope this is not too confusing.
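One possible way to remove the outer loop, sketched under assumptions: because the multiplication is element-wise and fft/ifft accept a dimension argument, all n rows can be carried through the recursion together as one n-by-m matrix, with only the loop over j remaining. The sizes and random data below are made up for illustration, and V2 is assumed to be fft(v2). The script runs on the CPU as written and compares the result against the original double loop.

```matlab
% Hypothetical example sizes and data (not from the original post)
n = 4; m = 8; k = 3;
rng(0);
M_in = rand(n, m);
v1 = rand(1, m);
v2 = rand(1, m);
V2 = fft(v2);                      % assumed: spectrum of v2

% Reference result: the double loop from the post, accumulating row by row
M_ref = zeros(k, m);
for i = 1:n
    v_temp = M_in(i, :);
    for j = 1:k
        v_temp = v_temp .* v1;
        v_temp = ifft(fft(v_temp) .* V2);
        M_ref(j, :) = M_ref(j, :) + v_temp;
    end
end

% Vectorized over rows: every row of V is one v_temp
V = M_in;                          % on a GPU: V = gpuArray(M_in); likewise v1, V2
M_out = zeros(k, m);               % on a GPU: zeros(k, m, 'gpuArray')
for j = 1:k
    V = bsxfun(@times, V, v1);                           % multiply each row by v1
    V = ifft(bsxfun(@times, fft(V, [], 2), V2), [], 2);  % convolve each row with v2
    M_out(j, :) = sum(V, 1);                             % sum over the n rows
end

maxErr = max(abs(M_out(:) - M_ref(:)));   % difference at rounding level
```

To try this on a GPU, the inputs would be converted with gpuArray and the result fetched with gather. For the recursion exactly as written, each step is linear in v_temp, so summing the rows of M_in first and running a single recursion on that sum should give the same M_out; that shortcut no longer applies if the real computation contains nonlinear steps.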
