How do I use arrayfun on the GPU when the size of the output array doesn't equal the size of (some of) the input arrays? (Error using gpuArray/arrayfun)

I'm trying to evaluate a parameterized function as quickly as possible, so I would like to use my GPU. In my original problem I have four parameters, A, B, C and D, and a function of the form y = Ax+Bx+C+D, where x is a fixed vector of length 900. I want to evaluate the function for different combinations of the parameters A, B, C and D.
Simplified example:
x = 1:10;
a = rand(1,100);
% Using anonymous function (CPU)
F = @(a) a*x;
y = arrayfun(F,a,'UniformOutput',false);
This works fine and gives the output y as a 1x100 cell array where every cell is a 1x10 double array. This is what I want :), but it is slow.
To do this on the GPU I must use a nested function, otherwise I can't pass x into the function:
---------FUNCTION IN FOLDER-------
function [ y ] = fun2(a, x)
y = arrayfun(@nestedFcn,a);
function out = nestedFcn(in)
out = in*x;
end
end
-----------------------------------
and I call this from MATLAB:
x = 1:10;
a = rand(1,100);
gpu = gpuDevice();
y = fun2(gpuArray(a),gpuArray(x));
And I get this error:
Error using gpuArray/arrayfun
The size and shape for all variables must be the same. Variable 'x' differs from 'out'.
Error in 'fun2' (line: 4)
This error could be solved by setting 'UniformOutput' to false, but that doesn't work with gpuArrays. How can I fix this, or is there a workaround?
  1 Comment
Jan Orwat on 2 May 2016
Yes, arrayfun requires input matrices to be the same size. It seems pagefun will do the job. You can use functions like shiftdim to manage multiple dimensions of inputs.
Please note I don't have much experience with GPU computing in MATLAB, so I may be wrong about this.


Answers (1)

Joss Knight on 3 May 2016
Edited: Joss Knight on 3 May 2016
gpuArray arrayfun is different from the CPU version, which is effectively just a syntactic convenience to avoid writing for loops. For gpuArray, the function you call with arrayfun is treated as a GPU kernel to be executed on each element of your input. This imposes two restrictions:
  1. The function can only carry out scalar operations. That rules out mtimes, which you are using here to multiply a and x.
  2. The function can only output a scalar. This rules out returning a vector out and it rules out non-uniform output.
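As a minimal sketch of a function that satisfies both restrictions (scalar operations only, scalar output), consider the following. The names f, a and b are made up for illustration. The snippet runs on the CPU as written; the assumption is that wrapping both inputs in gpuArray would let the same call run as a GPU kernel, because f takes same-sized inputs and does not read any workspace variables.

```matlab
% Hypothetical same-sized inputs; wrap them in gpuArray(...) to try this on a GPU
a = rand(1, 100);
b = rand(1, 100);

% Scalar in, scalar out: only element-wise arithmetic on ai and bi
f = @(ai, bi) ai*bi + 1;

% One scalar result per element, so y has the same size as a and b
y = arrayfun(f, a, b);

% Reference result using ordinary element-wise operators
yRef = a.*b + 1;
maxErr = max(abs(y - yRef));
```

By contrast, a function that returns a*x for a vector x produces a vector per element, which is exactly what the second restriction forbids.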
If you want to implement the CPU-like version of arrayfun then you need to use for-loops - this would be no less optimal than arrayfun is in that case, since it is just a wrapper for a loop.
The only way to implement vector operations inside gpuArray arrayfun is to implement them manually. So for instance, if you knew that x was always 10 elements long, and you wanted to output the sum of a*x(i) over all elements of x (in effect a dot product of x with a vector filled with a, which is a scalar and so is allowed), you could write the function
function out = nestedFcn(a)
out = 0;
for i = 1:10
out = out + a*x(i);
end
end
Of course, this would be a little odd since you can implement this using ordinary matrix maths without recourse to loops or arrayfun. In your case, you seem to want to scale x by each element of a. That's just an outer product, so forget arrayfun and just do some maths:
y = a' * x;
If you really want the output as a cell array for some reason then you could break it up again using num2cell:
y = num2cell(y, 2);
This would typically not be advisable though, since arrays with only 10 elements are not efficiently processed on the GPU.
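To check that the outer product really reproduces the original cell-array result, here is a small CPU-only sketch. The variable names yCellOld, yMat, yCellNew and maxErr are illustrative; on a GPU the assumption is that a and x would first be wrapped in gpuArray.

```matlab
x = 1:10;
a = rand(1, 100);

% Original approach: one 1x10 row per element of a, stored in a 1x100 cell
F = @(ai) ai*x;
yCellOld = arrayfun(F, a, 'UniformOutput', false);

% Outer product: row k of yMat equals a(k)*x, giving a 100x10 matrix
yMat = a' * x;

% Optional: split the matrix back into a 100x1 cell of 1x10 rows
yCellNew = num2cell(yMat, 2);

% Compare against the cell-array result stacked into a matrix
yRef = vertcat(yCellOld{:});
maxErr = max(abs(yMat(:) - yRef(:)));
```

Keeping the result as a single matrix avoids creating many tiny arrays, which is the point made above about efficiency.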
As Jan Orwat points out, you can in fact do batch matrix multiply operations using pagefun, as long as your pages are distributed along the 3rd dimension or higher:
y = pagefun(@mtimes, reshape(a, 1, 1, []), x);
where one of your inputs must be a gpuArray. Like I say, this would still be a bit odd since ordinary matrix algebra covers your use case; but if you had a more complicated case (like multiplying many different a's by many different x's), that's where pagefun becomes essential.
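Since pagefun itself needs a GPU, the following CPU-only sketch just illustrates what the page-wise multiply computes: each 1x1 page of the reshaped a is multiplied by x. The loop and the names aPages, yPages and yMat are illustrative stand-ins, not how pagefun is implemented.

```matlab
x = 1:10;
a = rand(1, 100);

% Put each element of a on its own page along the 3rd dimension: 1x1x100
aPages = reshape(a, 1, 1, []);

% Page-wise multiply, written as an explicit loop for illustration
yPages = zeros(1, numel(x), numel(a));
for k = 1:numel(a)
    yPages(:, :, k) = aPages(:, :, k) * x;   % 1x1 times 1x10 gives 1x10
end

% Move the page dimension to the rows to compare with the outer product
yMat = permute(yPages, [3 2 1]);             % 100x10
maxErr = max(max(abs(yMat - a' * x)));
```

The result is 1x10x100, one page per element of a, and it holds the same numbers as the outer product a' * x arranged along a different dimension.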
  4 Comments
Joss Knight on 4 May 2016
Okay, after reading your code carefully, I think I can make some educated guesses. combvec is 4 x PS^4, and you meant to write C(1,i) rather than C(1) etc.
Still, you just don't need to use pagefun, just a couple of uses of bsxfun to handle dimension expansion:
p1v1 = C(1,:)'*v1; % p1*v1 -> 65536 x 900
p2mv2 = bsxfun(@minus, C(2,:)', v2); % p2-v2 -> 65536 x 900
p3v1 = C(3,:)'*v1; % p3*v1 -> 65536 x 900
p4mv3 = bsxfun(@minus, C(4,:)', v3); % p4-v3 -> 65536 x 900
y = (p1v1 + p2mv2) ./ (p3v1 + p4mv3);
When I used this code instead of yours, my CPU got a 2x speedup and my GPU gave a further 5x speedup.
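For anyone who wants to try the bsxfun approach without the original data, here is a small self-contained sketch. Everything about the inputs is an assumption: PS, p and the values of v1, v2 and v3 are made up, and C is built with ndgrid as a stand-in for the combvec output (same 4 x PS^4 shape, though the column order may differ). The per-combination formula used for the check is inferred from the comments in the code above.

```matlab
% Hypothetical small problem: PS values per parameter, vectors of length 5
PS = 3;
p = linspace(1, 2, PS);
[g1, g2, g3, g4] = ndgrid(p, p, p, p);
C = [g1(:), g2(:), g3(:), g4(:)]';        % 4 x PS^4, one combination per column
v1 = linspace(0.1, 1.0, 5);
v2 = linspace(0.0, 0.4, 5);
v3 = linspace(0.0, 0.5, 5);

% Vectorized evaluation: every intermediate is PS^4 x numel(v1)
p1v1  = C(1,:)' * v1;                     % p1*v1
p2mv2 = bsxfun(@minus, C(2,:)', v2);      % p2-v2
p3v1  = C(3,:)' * v1;                     % p3*v1
p4mv3 = bsxfun(@minus, C(4,:)', v3);      % p4-v3
y = (p1v1 + p2mv2) ./ (p3v1 + p4mv3);

% Spot check one combination against the scalar-parameter formula
k = 17;
yk = (C(1,k)*v1 + C(2,k) - v2) ./ (C(3,k)*v1 + C(4,k) - v3);
maxErr = max(abs(y(k,:) - yk));
```

To run this on the GPU, the assumption is that converting C, v1, v2 and v3 with gpuArray before the vectorized block is enough, since the block only uses matrix multiplication, bsxfun and element-wise operators.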

